perl status update

Jesse Tie-Ten-Quee highos at linuxfromscratch.org
Thu Dec 20 21:53:39 PST 2001


Yo,

[NOTE: Please check out the reference URLs i have at the bottom]

It's great to see so many ppl are still interested in ALFS; I've been
spying on you guys every chance i could get, and some of the things that
have been done are awesome. (*patpats Neven* great work!)

Although i've noticed a number of technical problems with some of the
choices that have been made, you guys are doing a good job either way
and working on fixing some of those issues. I'll not get into that
tonight, though; i don't have the time to explain everything i mean by
that. I wish i did, but there's nothing i can really do until January
comes around =/

I've just written this down fairly quickly; it's a bunch of cutting and
pasting from my head, so if anything doesn't make sense or is incorrect
please point it out.  I figured i should post this though, considering
some of the things i've been hearing on IRC and reading on this list.
Either way.. enjoy =)

Anyways...

On Wed, Dec 19, 2001 at 11:48:50PM +0000, Mark Ellis wrote:
> I've glanced at SAX, got intrigued, decided i didn't have time to learn 
> it. Might be worth another look later on though.

In general terms, SAX is the best solution for writing ALFS
implementations.  There are basically two ways to parse an XML file:
events and trees.

[stolen from="http://www.saxproject.org/?selected=event"]

Tree-based APIs
    These map an XML document into an internal tree structure, then
allow an application to navigate that tree. The Document Object Model
(DOM) working group at the World-Wide Web Consortium (W3C) maintains a
recommended tree-based API for XML and HTML documents, and there are
many such APIs from other sources.

Event-based APIs
    An event-based API, on the other hand, reports parsing events (such
as the start and end of elements) directly to the application through
callbacks, and does not usually build an internal tree. The application
implements handlers to deal with the different events, much like
handling events in a graphical user interface. SAX is the best known
example of such an API.

[/stolen]

DOM, which is the official W3C recommendation, is the easiest for
interacting with your document (interaction as in changing and
editing it, being able to navigate the tree multiple times, etc).
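
To make the "interacting" part concrete, here is a minimal sketch using
Python's standard xml.dom.minidom (not one of the C parsers discussed in
this post). The profile and its element names are made up for
illustration and are not the real ALFS syntax. The whole document is
loaded into memory, read once, then edited in place.

```python
from xml.dom import minidom

# Hypothetical ALFS-style profile; element names are illustrative only.
PROFILE = '<alfs><package name="bash"><configure/><make/></package></alfs>'

# The whole document is parsed into an in-memory tree up front.
doc = minidom.parseString(PROFILE)

# First pass: walk the tree and read the build steps.
package = doc.getElementsByTagName("package")[0]
steps = [n.tagName for n in package.childNodes
         if n.nodeType == n.ELEMENT_NODE]

# Second pass over the same tree: edit it by appending a new step.
install = doc.createElement("make")
install.setAttribute("target", "install")
package.appendChild(install)

print(steps)
print(doc.documentElement.toxml())
```

The price for being able to revisit and change the tree is that all of
it has to sit in memory first.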

SAX was developed on the xml-dev mailing list by the community as a
'simpler' approach to parsing a document.  If all you want to do is
respond to what your document contains, as we have been doing with
ALFS, then SAX is perfect.
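
As a rough illustration of "responding to what your document contains",
here is a small sketch using Python's standard xml.sax module. The
profile, its element names and the StepCollector class are all
hypothetical; a real ALFS implementation would run commands instead of
collecting names.

```python
import xml.sax

# Hypothetical ALFS-style profile; element names are illustrative only.
PROFILE = b"""<alfs>
  <package name="bash">
    <unpack archive="bash-2.05a.tar.gz"/>
    <configure/>
    <make/>
  </package>
</alfs>"""


class StepCollector(xml.sax.ContentHandler):
    """Records each element as the parser reports it; no tree is built."""

    def __init__(self):
        super().__init__()
        self.steps = []

    def startElement(self, name, attrs):
        # Called once per opening tag, in document order.
        self.steps.append((name, dict(attrs.items())))


handler = StepCollector()
xml.sax.parseString(PROFILE, handler)

for name, attrs in handler.steps:
    print(name, attrs)
```

The handler only ever sees one event at a time, so any state it needs
(here, the list of steps) has to be kept by the application itself.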

Don't discredit either API; both are used and needed for different
things.  SAX, being the simplest way to write a parser, is the most
efficient and fastest, although the downside is that it generally
requires more work from the application developer.  DOM, on the other
hand, is slow and a memory hog, but in certain cases far easier to use
in an application.

"From a certain point of view", anyways =)

-

I've spent a fair amount of time looking at and experimenting with the
different XML parsers in use today, especially those for the C
language:

Expat, which is the most prominent parser, has a number of advantages,
the primary one being that it's so small.  It uses an event-based,
SAX-style API and is licensed under the MIT license.  It was originally
developed by James Clark, then handed over to Clark Cooper (and other
hackers).  It supports the XML and namespace standards and is a
non-validating parser (it only checks that a document is well-formed).
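
Here is a small sketch of what that looks like in practice, using
Python's standard xml.parsers.expat binding rather than the C API
directly. The element_names helper, the documents and the
"urn:example:alfs" namespace are made up for illustration. It shows the
callback style, namespace processing, and the fact that a document that
isn't well-formed is rejected.

```python
from xml.parsers import expat


def element_names(document):
    """Return element names in document order, as reported by callbacks."""
    names = []
    # Passing a separator turns on namespace processing: each name is
    # reported as "<namespace URI><separator><local name>".
    parser = expat.ParserCreate(namespace_separator=" ")
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(document, True)
    return names


# Hypothetical documents; the namespace URI is illustrative only.
GOOD = '<a:alfs xmlns:a="urn:example:alfs"><a:make/></a:alfs>'
BAD = "<alfs><make></alfs>"  # <make> is never closed

print(element_names(GOOD))

try:
    element_names(BAD)
except expat.ExpatError as err:
    print("not well-formed:", err)
```

Nothing here checks the document against a DTD; expat only cares that
the tags nest and close properly.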

[For those that don't know, nALFS is based on expat, as are the perl
implementations.  However, the perl implementations didn't use a
straight SAX API; they used more of a tree one with XML::Twig (even
though under the hood it was based on expat as well:
XML::Twig->XML::Parser->expat).]

Libxml2, which is the other most prominent parser, is a much nicer
package than expat in a few ways, although it doesn't have some of
the advantages expat does.  Originally developed by Daniel Veillard in
his spare time while he was working at the W3C, it has now grown into a
huge project with multiple subprojects.  It is dual-licensed under the
W3C IPR and LGPL (although i do believe it started under the Apache
license, way back when).  It has support for SAX and DOM out of the
box, although the DOM support is kinda slack, which is why there's a
separate project, gdome2, which is a full-on DOM implementation.  There
is also a separate project for full XSLT support, libxslt.  It features
support for XML, namespaces, XML Base, XPath, XPointer, XInclude,
catalogs, threading, etc (a bunch of other standards) and is a
validating parser.

[There are a bunch of other C XML parsers i've looked at, although only
one really deserves noting.]

RXP doesn't use any of the "standard" ways of parsing a doc and
instead uses its own "xBit" implementation.  It was developed at the
University of Edinburgh for the LT XML toolkit project, is licensed
under the GPL and is a validating parser.

Out of these three, i much prefer libxml2 under C.  I've used it before
and really like how it supports a lot of the XML extensions (XInclude
and XSLT are two extensions we could end up using a lot in the
future).
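
To give an idea of why XInclude could be handy for profiles, here is a
minimal sketch. It uses Python's standard xml.etree.ElementInclude, not
libxml2, and the file name, element names and in-memory FILES table are
all made up for illustration. A main profile pulls in a per-package
fragment by reference.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical per-package fragments, kept in memory instead of on disk.
FILES = {
    "bash.xml": '<package name="bash"><make/></package>',
}

# Hypothetical main profile that includes the fragment by reference.
MAIN = """<alfs xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="bash.xml"/>
</alfs>"""


def load(href, parse, encoding=None):
    # Loader hook: resolve the href against the in-memory table above.
    if parse == "xml":
        return ET.fromstring(FILES[href])
    return FILES[href]


root = ET.fromstring(MAIN)
ElementInclude.include(root, loader=load)  # expands xi:include in place

print(ET.tostring(root, encoding="unicode"))
```

After the include step the xi:include element is gone and the package
fragment sits in its place, so the rest of the tool never needs to know
the profile was split across files.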

This is what my research and experience is telling me, at least when it
comes to the C language.

-

Here's some extra timbits...

Jason, may i make a suggestion? You keep mentioning IBM's XML4C; ditch
it and switch to its successor from The Apache XML Project.  They took
the old code and turned it into a beauty with Xerces (XML) and Xalan
(XSLT) for C++ and Java.

For those hacking under Perl.. I would seriously suggest you ditch
XML::Twig and find a much nicer option.  At the time we used it, it was
the best we could find, but it's been a long time since then, and there
are much nicer options (or so my perl buddies tell me :) under Perl.

Not to mention the fact that all the Perl implementations were done more
as an experiment than as something to be used in a production
environment (although that hasn't stopped most of us!).


 (( Resources ))

XML: http://www.w3.org/TR/REC-xml
XSLT: http://www.w3.org/TR/xslt20/
XPath: http://www.w3.org/TR/xpath20/
XPointer: http://www.w3.org/TR/xptr/
XInclude: http://www.w3.org/TR/xinclude/
etc..

SAX: http://www.saxproject.org/
DOM: http://www.w3.org/DOM/
Expat: http://expat.sourceforge.net/
libxml2: http://xmlsoft.org/
libxslt: http://xmlsoft.org/XSLT/
gdome2: http://www.cs.unibo.it/~casarini/gdome2/
RXP: http://www.cogsci.ed.ac.uk/~richard/rxp.html
Apache XML: http://xml.apache.org

[I could post a pile of URLs, but these are the ones that i've mostly
been touching on. Wait until my next post! ;P]

-- 
Jesse Tie-Ten-Quee - highos at linuxfromscratch dot org