
Developer Mailing List 3

Another temporary page for my writings to the Dev ML on the same thread for #2.

(I use "Image" for something graphical that is seen, and "Graphic" or "Graphic File" for something that is created and manipulated. My brain is much too anal-ytical. Most people use them interchangeably, and prefer using Image for both of my definitions.)

Asset does not mean "all graphics". An "Asset" is any content information that is not XML. Images are one type of Asset.

The General Usage definition of Resource is "something usable". Lenya 1.2 defined "Resources" as "Anything usable that does not fit another term." With Modules getting the functional and configuration parts out of the way, all that is left is Content. The most usable units of Content are Documents and Assets. A generic term for both types of data seems good for conversation and programming class naming. Obeying my Rules "Use an existing word if one exists" and "Use one word instead of two", "Resource" is a good choice. If anyone finds a better word (within the Rules), I will not be upset.

"Sitetree" is defined in Lenya 1.2 as "XML maintaining the relationships of Documents in a hierarchy."
"Structure" is not defined in Lenya yet. General Usage defines "structure" as "something composed of multiple parts, or their arrangement", which fits how you have been using it. "Sitetree" is the better (more specific) choice when referring to the XML representation of the arrangement of Content in a Publication.
"Index" is the accepted term in many related technologies (computer databases) for "a data structure external to the primary data storage used to organize it." Some platforms use "View" to mean "the resulting data when looking at data sorted and filtered by an Index", but Cocoon redefined "View" to mean a "Breakpoint".

I like keeping structure as a generic term. Content has structure, but so does a Document, Publication, and even the Lenya Server.

"Document" is already well-defined.
"Asset" was defined in previous versions of Lenya.

Using either term for the generic will confuse people. I think having terms that imply the XML vs. Non-XML distinction is important because Lenya is XML-based and must treat XML differently than non-XML. The following snippet does not make sense to me, and should break Cocoon:
<map:generate src="image.gif"/>
<map:transform src="mytransform.xsl"/>
<map:serialize type="binary"/>
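The objection can be shown outside Cocoon. This is a minimal sketch in Python (not Lenya or Cocoon code; the byte string is a made-up GIF header and `is_xml` is a hypothetical helper): an XML parser rejects GIF bytes, so there is nothing for an XSL transform to work on.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the first bytes of a GIF file, which is not XML.
gif_bytes = b"GIF89a\x01\x00\x01\x00\x80\x00\x00"


def is_xml(data: bytes) -> bool:
    """Return True if the data parses as well-formed XML."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


print(is_xml(b"<page><title>Home</title></page>"))  # True
print(is_xml(gif_bytes))                            # False
```

The same test is a reasonable way to think about the Document vs. Asset split: if it parses as XML it can be a Document, otherwise it is an Asset.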

We are moving Assets to the top-level of Content as siblings to Documents. (If we are not, this discussion is almost pointless.) A generic term for all children of Content, both Documents and Assets, would be very useful. "Resource" and "ContentItem" are the popular candidates.

A "Document" is a single piece of textual information, and a "Page" is the single unit displayed to a visitor. If you use "Document" for both definitions, things get very confused. Lenya adds navigation and presentation to Documents to create a Page.

SVG is XML. A unit of SVG data is a Document. The resulting image could be saved as an Asset (losing its dynamic abilities) or saved in a cache by the "svg2image" Module.

"Resource" is "valuable information". All the units under Content are "valuable information". Yes, it is more generic than "Document" and "Asset", which is why "Resource" is a good term as the generic for both.

For Lenya people, there is already an entry in the brain for "Resource". The definition is rather blurry, but the entry exists. Reusing that entry requires changing the definition. Adding a new term requires creating a new entry, which increases the memory requirements.

Yes, I skipped the alternate definitions of "something that regularly publishes {definition #1}", and "the act of publishing {definition #1}". A Lenya Publication fits those definitions too, which is why "Publication" is so easily understood.

If we define "Resource" as the parent object that maintains the Security, Translations (languages), and Revisions of all objects under Content, then the same object will be used to maintain the Security, Translations, and Revisions of functional resources.

"Areas" are being replaced by "Modules". It will be easy to add a Module to edit anything in the Lenya fileSystem/repository. Security will be very important, and the Resource class will already have proven code from handling Content.

"Document" is the internal term for "a unit of information formatted as XML", or "XML Document", an internal unit of storage. Visitors never see "Documents".

A "Page" is a "response to a request by a visitor formatted as HTML", or "that which is presented to the consumer of the site." Go ask anybody (a parent, a child, the village idiot) what they see on a Website, and they will answer "a Page".

Lenya has never displayed Documents; it has always used Documents as input to create Pages.

Websites use HTML. It is the primary format for web browsing. Most pipelines finish with <map:serialize type="html"/>. It is so common that type="html" can be omitted because it is the default. I sometimes serialize as XML for testing, but I do not want visitors to see it.

The "rss" Module would serialize as XML, but RSS uses "Feeds" containing "Channels" containing "Items", as well as referring to the response as a "Document" using the pure XML definition.

Documents are internal units of XML data. They are separated by subject, type, and security.

Subject: one Document contains the list of Contributors, another how to install Lenya. Different subject, different document.

Type: Most documents are XHTML, basic word processed content. A "product" document contains different (and more rigid) fields. Different DTD, different document.

Security: One document can be edited by any editor. "Product" Documents can only be edited by the inventory maintainer. Different security requirements, different document.

When creating a Page, Lenya can aggregate introduction text from one Document, a list of products from other Documents, navigation based on the Publication, and other presentation information. A Document is a possible input for creating a Page. A Page can be created from many Resources, including Documents, Assets, and Modules. (I have a few "You cannot do that" Pages that are just HTML files piped to the visitor. Responding to the request does not use any Documents or special processing.)

- when a unit of information formatted as XML contains other units of information formatted as XML, the result is a unit of information formatted as XML.
or phrase it:
- when a Document contains other Documents, the result is a Document.

For example, a NavigationModule would aggregate many Documents, filter the data, and serialize as XML. The Live Module handles the result from a NavigationModule just like a Document retrieved from the datastore. The aggregation of a Document from the datastore with the results of several NavigationModules creates a new Document. That is passed to a Transformer to create an XHTML Document, which is passed to a Serializer to create the HTML Page (which may not be valid XML and so should not be called a Document) which is returned to the visitor.
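The flow above can be sketched in a few lines. This is only an illustration in Python with the standard library, not Cocoon code: `aggregate`, `transform`, and `serialize` are hypothetical stand-ins for the aggregation, the XSL Transformer, and the HTML Serializer, and the sample XML is invented.

```python
import xml.etree.ElementTree as ET


def aggregate(root_name, *documents):
    """Combine several XML Documents into one new Document."""
    root = ET.Element(root_name)
    for doc in documents:
        root.append(doc)
    return root


def transform(document):
    """Stand-in for an XSL transform: build an XHTML Document."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    nav = ET.SubElement(body, "ul")
    for item in document.iter("item"):
        ET.SubElement(nav, "li").text = item.text
    for para in document.iter("p"):
        ET.SubElement(body, "p").text = para.text
    ET.SubElement(body, "br")
    return html


def serialize(document):
    """Serialize as HTML: the result is a Page, not necessarily XML."""
    return ET.tostring(document, encoding="unicode", method="html")


# One Document from the datastore, one produced by a navigation step.
content = ET.fromstring("<content><p>Welcome to the site.</p></content>")
navigation = ET.fromstring("<nav><item>Home</item><item>Products</item></nav>")

page = serialize(transform(aggregate("page", content, navigation)))
print(page)
```

Every intermediate value is an XML Document and can be handled by the same functions. Only the final string is a Page: the HTML serialization writes `<br>` without a closing tag, so it no longer parses as XML.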

For programming purposes, "Document" always refers to an "XML Document". Some are stored in Content. Others are created by Modules, and only exist in memory. But all can be handled with the same functions.

I have never referred to an "HTML Document". HTML has Pages. XML has Documents. RSS and OOo are XML-based software, so they use "Documents". PDF and MSWord refer to "Word Processing Document", which is not necessarily XML, although both formats can be converted to XML.

Documents are limited to XML because XML has "XML Documents", Lenya and Cocoon are XML-based, and that is how "Document" was defined for Lenya 1.2.

How could Lenya change to work with non-XML data? I am not certain it is possible. Much of Lenya's functionality is the merging and filtering of data. There must be a common format for the data. XML is the current (and probably best) format. Let us pretend there is a Module that converts an Image to XML. What could we do with it? Converting a PDF or MSWord DOC to XML produces something usable, but I doubt Lenya should do anything before they are converted to XML. Everything must be converted to XML, or treated as an uploadable/downloadable/unchangeable-within-Lenya Asset.

A Resource (Document, Asset, and even Code) should be able to configure its own security. Yes, security must be handled by the platform, but each Resource should be able to configure who can read it, who can edit it, and sometimes who can or is required to approve or publish it. Most of the time that list is inherited from the Parent or a default, but each Resource should be able to change it.

An Image, such as the company logo, can be read by everybody, edited by the graphics designer, and approved/published by senior management. A Document, such as the draft of a financial report, may be seen by accounting and senior management, edited by accounting, must be approved by 2 senior managers, and published by the CFO. Workflow handles most of it, but Security verifies what is allowed. The janitor cannot read the financial report until publishing (Workflow) changes the security of the Document so everybody can read it.
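A rough sketch of the inheritance rule, in Python for brevity. The class name `SecuredResource`, the role names, and the "everybody" wildcard are all assumptions for illustration; nothing here is Lenya's actual security API. Each Resource may set its own list per action, and otherwise the list comes from its Parent.

```python
class SecuredResource:
    """Hypothetical Resource whose permissions fall back to its parent."""

    def __init__(self, name, parent=None, **permissions):
        self.name = name
        self.parent = parent
        # Maps an action ("read", "edit", ...) to a set of role names.
        self.permissions = dict(permissions)

    def allowed(self, action):
        """Roles allowed to perform the action, inherited if not set here."""
        node = self
        while node is not None:
            if action in node.permissions:
                return node.permissions[action]
            node = node.parent
        return set()

    def can(self, role, action):
        roles = self.allowed(action)
        return "everybody" in roles or role in roles


content = SecuredResource("content", read={"everybody"}, edit={"editor"})
logo = SecuredResource("logo.gif", parent=content, edit={"graphics-designer"})
report = SecuredResource(
    "financial-report",
    parent=content,
    read={"accounting", "senior-management"},
    edit={"accounting"},
)

print(logo.can("janitor", "read"))    # True: inherited from content
print(report.can("janitor", "read"))  # False: the draft is restricted

# Publishing (a Workflow step) changes the security of the Document.
report.permissions["read"] = {"everybody"}
print(report.can("janitor", "read"))  # True
```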

In the print industry, Pages are sections of a Document. And in one of my posts, I suggested a Module that returns a section of a Document.

Documents (as XML) are units of data. Pages are units of display. Whether a Page is built from zero, one, or more Documents depends on the requirements. Whether a Page uses the entire Document depends on the requirements. The "menu" and "rss" Modules use a very small portion of many Resources to create one Document. The "page" Module returns a section of a Document. The "live" Module aggregates them and returns an HTML Page.

In one of today's posts, I implied that I do not want it named "Content" because I want the class to be usable for development Resources, such as CSS, XMAPs, XSLTs, and XSPs. The Security layer will be very important to restrict access to the developers. The Translation layer is easily disabled (just have only one Translation and set it as the default), but might be useful for CSS and sometimes XSLTs (different XSL used for LTR and RTL languages). The Revision layer would be useful for anything editable. (Would it be good to be able to roll back an XMAP?)
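To make the Translation and Revision layers concrete, here is a small sketch in Python. The class `LayeredResource` and its methods are hypothetical, not an existing Lenya class. A Resource with a single default Translation behaves as if the Translation layer were disabled, and each Translation keeps its own list of Revisions.

```python
class LayeredResource:
    """Hypothetical Resource with Translation and Revision layers."""

    def __init__(self, default_language, content):
        self.default_language = default_language
        # Each language maps to its list of revisions, oldest first.
        self.translations = {default_language: [content]}

    def save(self, content, language=None):
        """Store a new revision for the given (or default) language."""
        language = language or self.default_language
        self.translations.setdefault(language, []).append(content)

    def current(self, language=None):
        """Latest revision; unknown languages fall back to the default."""
        revisions = (self.translations.get(language)
                     or self.translations[self.default_language])
        return revisions[-1]

    def rollback(self, language=None):
        """Drop the latest revision, always keeping the first one."""
        language = language or self.default_language
        revisions = self.translations[language]
        if len(revisions) > 1:
            revisions.pop()
        return revisions[-1]


# An XMAP with only one Translation: the Translation layer is invisible.
sitemap = LayeredResource("en", "<map:sitemap><!-- v1 --></map:sitemap>")
sitemap.save("<map:sitemap><!-- v2, broken --></map:sitemap>")
sitemap.rollback()
print(sitemap.current())      # the v1 sitemap again
print(sitemap.current("de"))  # no German version, so the default is used
```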

I am uncertain how my suggested terminology limits Lenya when used in a "multi-format environment". Information in any XML format (XHTML, SVG) is called a "Document", can be edited, and can be manipulated by XSL. Information not in XML format is called an "Asset", is not editable within Lenya, and can be uploaded and downloaded. Both are "Resources" and have Security, Translations, and Revisions: the primary functions of a CMS.

Accepting that Lenya will be used to store content in many formats, how do you propose to add value to non-XML data besides storage and retrieval?

A customer of mine uses Lenya to store PDFs. They complain accessing the information in those PDFs loses the Website's navigation, they cannot provide links to anchors in the PDFs, and (because of the really limited software they use to create PDFs) they cannot add links in the PDFs. We could provide a PDF editor within Lenya that would solve most of the issues, but there is no good reason why the information in those PDFs is not stored in Lenya's standard "xhtml" Documents.

How will Lenya add value to MSWord DOC files? MSExcel XLS files? MSPowerPoint PPTs? GIFs? JPEGs? PNGs? MPEGs? AVIs? WMVs?
Can any of these be edited in Lenya? Or are they just Assets, with upload, download, Security, Translations, and Revisions?

"A piece of information, regardless of its nature, which is handled as a single unit by Lenya"
I vote for Resource as the parent class which is subclassed:
- Document (XML stored in Content and Modules)
- Asset (uploaded file stored in Content and Modules)
- Program (CSS, XMAP, XSLT, and XSP stored in Modules)

We have not discussed a name for "programming resource" yet, but it should be unnecessary because each type will have its own class/Module. XMAP, XSLT, and XSP will inherit from Document since they are XML. CSS will need its own, but Thorsten suggested Forrest has something usable.
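As a sketch of how that hierarchy might look (in Python for brevity; Lenya itself is Java, and every class and attribute name below is a placeholder rather than existing Lenya code):

```python
class Resource:
    """A unit of information handled as a single unit by Lenya."""

    editable_in_lenya = False

    def __init__(self, path):
        self.path = path


class Document(Resource):
    """XML: can be edited in Lenya and manipulated by XSL."""

    editable_in_lenya = True


class Asset(Resource):
    """Non-XML uploaded file: upload and download only."""


# The XML-based programming resources inherit from Document.
class Xmap(Document):
    pass


class Xslt(Document):
    pass


class Xsp(Document):
    pass


class Css(Resource):
    """Not XML, so it needs its own class."""


for resource in (Document("index.xml"), Asset("logo.gif"),
                 Xslt("page2xhtml.xsl"), Css("site.css")):
    print(type(resource).__name__, resource.path, resource.editable_in_lenya)
```

With this shape a separate "Program" class is not needed: anything written for Resource (Security, Translations, Revisions) applies to all of them, and anything written for Document applies to XMAP, XSLT, and XSP as well.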


Contact Solprovider
Paul Ercolino