XML Parsing

Matthew Burgess matthew at linuxfromscratch.org
Sat Feb 19 13:18:56 PST 2005


Folks,

The SRS
(http://linuxfromscratch.org/~matthew/alfs-srs/alfs-srs.html#ch-functions-validation)
differentiates between profile validation and profile processing.
Here's what I think we require for each from an XML library:

1) Validator:

    * SAX based (we don't need to keep an in-memory copy of the
      profiles).
    * Supports XIncludes
    * Validating parser (duh!)

2) Processor:

    * DOM based (we want to keep an in-memory copy of the profiles so we
      can process them efficiently)
    * Supports XIncludes
    * Non-validating parser (validation has already been performed by
      the time the processor runs)
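
To make the SAX/DOM split above concrete, here is a minimal sketch using only the xml.sax and xml.dom.minidom modules from the Python standard library. The profile fragment and its element names (profile, stage, package) are made up for illustration and are not taken from the SRS. Neither parser here validates or resolves XIncludes; the sketch only shows the difference between streaming events and holding a tree in memory.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical profile fragment; the element names are illustrative only.
PROFILE = (
    b"<profile><stage name='toolchain'>"
    b"<package name='binutils'/><package name='gcc'/>"
    b"</stage></profile>"
)


class PackageCounter(xml.sax.ContentHandler):
    """SAX handler: reacts to events as they stream past, builds no tree."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "package":
            self.count += 1


def count_packages_sax(data):
    """Validator-style pass: one streaming read, nothing kept in memory."""
    handler = PackageCounter()
    xml.sax.parseString(data, handler)
    return handler.count


def package_names_dom(data):
    """Processor-style pass: build the whole tree, then query it freely."""
    doc = minidom.parseString(data)
    return [el.getAttribute("name") for el in doc.getElementsByTagName("package")]
```

The SAX version would suit a one-shot check over a large profile, while the DOM version lets later steps revisit any node without reparsing.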

If we're to use Python, this causes us a minor inconvenience :)  It 
appears as if Python only comes with support for the expat parser out of 
the box, and expat is a non-validating parser.
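
A quick way to see what "non-validating" means in practice is the sketch below, which uses the xml.parsers.expat module from the standard library. The sample documents are hypothetical. The first one breaks the rules of its own internal DTD but is still well-formed, so expat accepts it; only the second, which is not well-formed, is rejected.

```python
import xml.parsers.expat

# Hypothetical profile that violates its own DTD: the DTD allows only
# <stage> inside <profile>, but the body contains <bogus>.
INVALID_BUT_WELL_FORMED = """<?xml version="1.0"?>
<!DOCTYPE profile [
  <!ELEMENT profile (stage)>
  <!ELEMENT stage EMPTY>
]>
<profile><bogus/></profile>"""

# Mismatched tags: this one is not even well-formed XML.
NOT_WELL_FORMED = "<profile><stage></profile>"


def expat_accepts(text):
    """Return True if expat parses the text without raising an error."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(text, True)  # True marks this as the final chunk
    except xml.parsers.expat.ExpatError:
        return False
    return True
```

So a profile that a validating parser would reject on DTD grounds passes straight through expat, which is why it cannot serve as the validator on its own.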

http://www.xml.com/pub/a/2004/10/13/py-xml.html contains a list of
Python XML parsers, of which only two (cDomlette and libxml2) appear to
provide the features we need.  Of those, I'd be inclined to choose
libxml2 because a) it's in BLFS, b) I've heard of it :), c) we already
use it for processing the various LFS books, and d) I've had good
experiences reporting bugs and having them resolved quickly.

So, does anyone have any information contrary to what I've stated above 
(i.e. there is a validating XML processor which supports XIncludes, DOM 
and SAX provided by the stock Python tarball)?  Or does anyone mind if 
we retain libxml2 as a dependency for alfs?

Regards,

Matt.


