Xemeiah : A Short Introduction

Francois Barre

www.xemeiah.org

20100116-1 (Xemeiah 0.5.1)


Table of Contents

Quick startup guide
How to get Xemeiah
Processing XSL Stylesheets
Starting Xemeiah WebServer
Using Xemeiah WebServer as Media Player
Xemeiah Roadmap
Previous releases
Future releases :
Overall Architecture
Introduction
Xemeiah's Kernel : Xem::Store and Xem::Document
The DOM : Document Object Model
XML Processing : Xem::XProcessor
Xem::XProcessor modules : XProcessorModuleForge and XProcessorModule
Extending XML Processing : Libraries and Xem::XProcessorLib
The Persistence Layer
Glossary : XML Terminology

Quick startup guide

This section explains how to get started with Xemeiah, either as an XSLT Processor or as a complete XML Database & WebServer.

See http://sourceforge.net/projects/xemeiah/ for more information on Xemeiah.

How to get Xemeiah

Debian & Ubuntu Packages

Since 0.5.1, a repository is available for both Debian unstable and Ubuntu karmic :

Add the following line to your /etc/apt/sources.list for Debian :

deb http://xemeiah.sf.net/debian unstable main

or for Ubuntu :

deb http://xemeiah.sf.net/debian karmic main

Then, install one of the packages (using apt-get install or your preferred dpkg frontend) :

  • xemeiah-xsl for the XSL-only Xemeiah package

  • xemeiah-webserver for the Xemeiah WebServer suite

  • xemeiah-media-player for the Web-based Xemeiah Media Player

Packages for other distributions or Operating Systems

No other Operating System or Linux Distribution is supported yet, but help is welcome !

RedHat, Windows, *BSD, ...

Compile from the sources

The latest sources are always available from the SVN repository :

http://xemeiah.svn.sourceforge.net/svnroot/xemeiah

Refer to the INSTALL file for further information on the compile/install procedure.

Processing XSL Stylesheets

To process an XSL stylesheet, just run :

xem xsl 'stylesheet.xsl' 'file.xml'

xem writes the result to stdout, unless an xsl:result-document has been specified in the stylesheet.

One can redirect output to a file using :

xem xsl 'stylesheet.xsl' 'file.xml' > result.html

To use EXSLT extensions, add the 'exslt' module to the parameters :

xem xsl --module exslt 'stylesheet.xsl' 'file.xml'

Starting Xemeiah WebServer

Xemeiah WebServer uses the 'persistence' library as its command-line handler. As a result, the first argument provided to xem must be pers or persistence.

Xemeiah WebServer also uses XProcessor libraries : webserver, xemfs, xemprocessor, ...

Format the Persistence File

Run :

xem pers --store='path-to-store' format

If no --store argument is provided, then the default store file used is xem-main.xem.

Configure Services

The services to run are set in the procedure file defined in procedure-aliases.xml, for the alias webserver.

You can write your own version of the startup procedure and declare it in procedure-aliases.xml.

Run WebServer

Just call :

xem pers --store='path-to-store' webserver

By default, webserver binds to localhost:1789 (bound ports are defined using xem-web:listen instructions in the startup procedure).

Using Xemeiah WebServer as Media Player

First connect to http://localhost:1789/browse.

As no collection is set yet, the collection configuration page is shown by default. Set the path to the Media Collection and start scanning files.

Xemeiah Roadmap

Previous releases

Since the beginning of the project, each minor version series had the following focus :

0.1.x : Document Object Model

Introducing Xemeiah's Document memory model (page-based segmentation, ease of binary serialization, COW mechanism).

DOM implemented using stack-optimized references into these large page-based documents.

0.2.x : XPath Expressions & XML Processor

Introducing XPath's binary format (called Xem::XPathSegment & Xem::XpathStep).

Introducing generic XML-based recursive processing, with stacked garbage collection.

0.3.x : Technology preview

Demonstrating the capabilities of Xem::XProcessor, the generic XML processor.

Demonstrating large documents capabilities (> 4Gbytes)

0.4.x : Core XSLT Processing

Future releases :

As of 0.5.1, the following development roadmap is planned :

0.5.x : Bindings to other languages (finished Q2 2010)

Bindings to other languages are planned :

  • Java : using JNI (Java Native Interface), provide an org.w3c.dom.* implementation named org.xemeiah.dom.*. The main advantage is that Java code handling large documents benefits from the memory optimizations of the C++ implementation.

    org.xemeiah.dom.* may include :

    • DOM Node, Element, Attribute, ...

    • XPath implementation (standardized as org.w3c.dom.xpath.*)

    • XSL implementation

  • Python bindings : if someone is interested...

0.5.x will also include a large cleanup of the WebServer configuration and bootstrap procedure, including the xem-standard directory, to give a more flexible and modular structure to the Xem XML code that the WebServer needs in order to work.

0.6.x : Implement a Security model

A security model must be introduced for two different aspects :

Security in XProcessor
Security on DOM

0.7.x : Enhance Persistence Layer (finished Q3 2010)

This version series will focus on extending the Persistence Layer design & functionalities, including :

  • Crash recovery :

  • Branch Commit & Merge :

  • Xem::NetStore and distributed computing : the Xem::BranchManager model can be extended to a distributed network of Xemeiah servers, with automatic synchronization between server nodes. A technology preview is planned for these releases.

Overall Architecture

Introduction

This section details Xemeiah's main concepts and overall internal architecture. Reading it is recommended before starting to develop with Xemeiah's framework.

Detailed and up-to-date Xemeiah API documentation may be found here : http://xemeiah.sourceforge.net/doc/html.

Xemeiah's Kernel : Xem::Store and Xem::Document

Xemeiah's foundation class is Xem::Store, which provides access to most of Xemeiah's core functionalities.

Xem::Store is in charge of :

  • providing and bookkeeping lower-level resources, such as low-level memory management,

  • referencing all XML QNames (markup names and namespaces) in use, using the Xem::KeyCache class,

  • referencing and garbage-collecting all instantiated documents, directly or using its dedicated Xem::BranchManager class.

Xem::Document is the class instantiated each time an XML document is opened. This class provides access to the optimized in-memory XML document structure, and provides the document-wide functionalities necessary to build up the DOM layer.
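
The QName referencing mentioned in the list above can be pictured as string interning : each distinct name is registered once and then handled as a small integer key. The following is only a minimal sketch under that assumption. ToyKeyCache, ToyKeyId and their methods are hypothetical names made up for this illustration, they are not the Xem::KeyCache API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for an interned-name key (not Xemeiah's KeyId).
using ToyKeyId = std::uint32_t;

// Toy name table: maps each distinct name to a stable integer key.
class ToyKeyCache {
public:
    // Return the key for 'name', allocating a new key on first use.
    ToyKeyId getKeyId(const std::string& name) {
        auto it = ids_.find(name);
        if (it != ids_.end()) return it->second;
        // Keys start at 1 so that 0 can mean "no key" (an assumption of this sketch).
        ToyKeyId id = static_cast<ToyKeyId>(names_.size()) + 1;
        names_.push_back(name);
        ids_.emplace(name, id);
        return id;
    }

    // Reverse lookup; 'id' must have been returned by getKeyId().
    const std::string& getName(ToyKeyId id) const {
        assert(id >= 1 && id <= names_.size());
        return names_[id - 1];
    }

private:
    std::unordered_map<std::string, ToyKeyId> ids_;  // name -> key
    std::vector<std::string> names_;                 // key - 1 -> name
};
```

With such a table, comparing two element or attribute names reduces to comparing two integers, which is one plausible reason for centralizing names in the store.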

The Xem::Document class uses a specialized Xem::DocumentAllocator class for in-document memory management.

The DOM : Document Object Model

Xem::NodeRef : a reference to a DOM node

Xemeiah's overall design is based on the idea that references shall be instantiated on the stack, with a short lifetime, and that access to the contents pointed to by these references shall be handled by a fast low-level layer.

As a result, a Xem::NodeRef contains no information except an internal pointer (the Xem::SegmentPtr), which is used by the Xem::DocumentAllocator layer to fetch real information concerning this node (the ElementSegment structure for an element, the Xem::AttributeSegment structure for an attribute).

This implies that each Xem::NodeRef must be instantiated with a proper reference to a Xem::Document, and this reference shall remain unchanged for the whole Xem::NodeRef lifetime.

Note : Because attributes are always retrieved from elements, a Xem::AttributeRef reference stores two pointers : one providing access to the attribute contents, and one referencing the element which holds this attribute.
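
The reference design described in this section can be sketched as follows : the reference object carries only a document and a position inside it, and every read goes through the document. This is an illustration under simplifying assumptions, not Xemeiah code. ToyDocument, ToyNodeRef and ToySegmentPtr are hypothetical names, and a plain vector index stands in for the real Xem::SegmentPtr and Xem::DocumentAllocator machinery.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical position inside a document's storage (stands in for a segment pointer).
using ToySegmentPtr = std::size_t;

// Toy document: owns all node data in one flat array.
struct ToyDocument {
    struct ElementSegment { std::string name; };
    std::vector<ElementSegment> segments;

    // Append an element and return its position in the storage.
    ToySegmentPtr addElement(const std::string& name) {
        segments.push_back(ElementSegment{name});
        return segments.size() - 1;
    }
};

// Toy node reference: holds no node data itself, only where to find it.
class ToyNodeRef {
public:
    ToyNodeRef(ToyDocument& document, ToySegmentPtr ptr)
        : document_(document), ptr_(ptr) {}

    // Every access is resolved through the owning document.
    const std::string& getName() const {
        assert(ptr_ < document_.segments.size());
        return document_.segments[ptr_].name;
    }

private:
    ToyDocument& document_;  // bound at construction, cannot be re-seated afterwards
    ToySegmentPtr ptr_;      // position of the node inside the document
};
```

Because the reference is only two words wide in this sketch, it is cheap to create on the stack and to discard, and the C++ reference member enforces that the document binding never changes.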

Attribute types and Xem::AttributeRef subclasses

Attributes stored inside of Xemeiah are not restricted to base types (String, Number, Integer, QNames, …).

Xemeiah offers the opportunity to build feature-rich typed attributes which can store a large amount of data.

As a result, an element can contain several attributes which share the same name but have different types (see the section on XPath processing below for further details).

For example, the Xem::SKMapRef class and its subclasses provide access to a Skip-List based implementation of various associative containers, each of which is stored as an attribute.

This includes :

  • hash-based maps to other elements (see Xem::ElementMapRef and Xem::ElementMultiMapRef)

  • lists of parsed QNames (Xem::QNameListRef)

  • Binary Large Object (BLOB) with fast seeking algorithm (Xem::BlobRef)
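
To picture how several attributes can share one name while differing in type, the sketch below keys attributes by the pair (name, type). It is a deliberately simplified assumption about the lookup rule. ToyTypedElement and ToyAttributeType are hypothetical names, and plain strings stand in for the rich attribute contents listed above.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical attribute types, chosen for the example only.
enum class ToyAttributeType { String, XPath };

// Toy element: attributes are keyed by (name, type), so two attributes
// may share a name as long as their types differ.
class ToyTypedElement {
public:
    void setAttribute(const std::string& name, ToyAttributeType type,
                      const std::string& value) {
        assert(!name.empty());
        attributes_[std::make_pair(name, type)] = value;
    }

    // Returns nullptr when no attribute with this name AND this type exists.
    const std::string* findAttribute(const std::string& name,
                                     ToyAttributeType type) const {
        auto it = attributes_.find(std::make_pair(name, type));
        return it == attributes_.end() ? nullptr : &it->second;
    }

private:
    std::map<std::pair<std::string, ToyAttributeType>, std::string> attributes_;
};
```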

XPath processing : Xem::XPathParser and Xem::XPath

Xem::XPathParser is the class responsible for parsing, validating and optimizing XPath expressions. Each parsed XPath expression is stored in memory using Xem::XPathSegment structures.

A parsed XPath expression can (and generally will) be stored as an attribute, on the same element which provided the XPath expression as a string-format attribute.

Let's consider the following example :

<xsl:if test="count(*) > 3">
 ...
</xsl:if>

The 'xsl:if' element has a string-typed attribute 'test', which contains the XPath expression. The Xem::XPath class is instantiated for this attribute as follows :

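// Resolve the key of the attribute's local name
KeyId testKeyId = getKeyCache().getLocalKeyId("test");
// Bind an XPath to the 'test' attribute of the xsl:if element
Xem::XPath xpathExpression ( xslIfElement, testKeyId );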

The Xem::XPath constructor searches the xslIfElement for an attribute named 'test' of type AttributeType_XPath, and calls Xem::XPathParser if it finds none. The Xem::XPathParser in turn stores the parsed format back on the xslIfElement element for future reuse.
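
The lookup-then-parse behaviour described above is a form of parse-once caching. The sketch below illustrates only that control flow. ToyCachingElement, ToyXPathParser and getParsedXPath are hypothetical names, and the "parser" merely wraps the string where a real one would validate and compile the expression.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy element with two attribute slots per name: the source string and the parsed form.
struct ToyCachingElement {
    std::map<std::string, std::string> stringAttributes;
    std::map<std::string, std::string> parsedAttributes;
};

// Placeholder parser that counts its runs, to make the caching visible.
struct ToyXPathParser {
    int runs = 0;
    std::string parse(const std::string& expression) {
        ++runs;
        return "parsed(" + expression + ")";
    }
};

// Return the parsed form of attribute 'name', parsing at most once per element.
const std::string& getParsedXPath(ToyCachingElement& element,
                                  const std::string& name,
                                  ToyXPathParser& parser) {
    // 1. Reuse the parsed attribute when it is already present.
    auto cached = element.parsedAttributes.find(name);
    if (cached != element.parsedAttributes.end()) return cached->second;

    // 2. Otherwise parse the string attribute (it must exist in this sketch)...
    auto source = element.stringAttributes.find(name);
    assert(source != element.stringAttributes.end());

    // 3. ...and store the parsed form back on the element for future reuse.
    return element.parsedAttributes
        .emplace(name, parser.parse(source->second))
        .first->second;
}
```

A stylesheet element that is evaluated many times, such as a test inside a loop, then pays the parsing cost only on its first evaluation.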

XPath evaluation is built upon the following mechanisms :

  • XPath expressions are split into several atomic steps. These steps represent both steps through the document (such as 'grandfather/father/child/grandchild') and all other kinds of expression components, such as comparators or XPath functions,

  • all environment variables (the $variable resolution) are delegated to the Xem::XProcessor class (see below)
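
The step-by-step evaluation of a location path can be sketched as below. The sketch covers child steps only, on an in-memory toy tree. Comparators, functions and variable resolution are left out, and none of the names (ToyNode, splitSteps, evaluate) come from Xemeiah. It also assumes a C++17 compiler, since ToyNode holds a vector of itself.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Toy tree node: a name and an ordered list of children.
struct ToyNode {
    std::string name;
    std::vector<ToyNode> children;
};

// Split "a/b/c" into the atomic steps {"a", "b", "c"}.
std::vector<std::string> splitSteps(const std::string& path) {
    assert(!path.empty());  // this sketch handles non-empty relative paths only
    std::vector<std::string> steps;
    std::istringstream in(path);
    std::string step;
    while (std::getline(in, step, '/')) steps.push_back(step);
    return steps;
}

// Apply the steps one by one: each step maps the current node set
// to the children whose name matches the step.
std::vector<const ToyNode*> evaluate(const ToyNode& context,
                                     const std::vector<std::string>& steps) {
    std::vector<const ToyNode*> current{&context};
    for (const std::string& step : steps) {
        std::vector<const ToyNode*> next;
        for (const ToyNode* node : current)
            for (const ToyNode& child : node->children)
                if (child.name == step) next.push_back(&child);
        current.swap(next);
    }
    return current;
}
```

Each step consumes the node set produced by the previous one, so an expression such as 'grandfather/father/child/grandchild' becomes four successive filtering passes in this model.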

XML Processing : Xem::XProcessor

Xem::XProcessor modules : XProcessorModuleForge and XProcessorModule

Extending XML Processing : Libraries and Xem::XProcessorLib

The Persistence Layer

Glossary : XML Terminology

We assume the audience is familiar with the following XML terminology :

  • DOM : Document Object Model

  • Document :

  • Node :

    • Element :

    • Attribute :

  • Namespaces :

    • Local names :

    • Namespace Prefixes :

    • Fully-qualified names :

  • XPath expression :

  • XPath result :

    • Item, Sequence, NodeSet, ...