Parsing XML files using plain bash (only builtins used) - the x2sh approach

George Makrydakis gmakmail at gmail.com
Mon Feb 13 14:49:35 PST 2006


It _can_ be done. Thank you for posting the times. XPath/XQuery is stack work
for me; the problem is that I am trying to find a way to bypass O(n) and O(n^2)
phenomena. Also, when reading the times listed below, take into consideration
that a significant O(n) phenomenon takes place only once (i.e. entity
dereferencing). After entity dereferencing (done by resolving the
cross-references between the patches.ent and general.ent entities), all other
xml files can be parsed faster. This is a single hardcoded example after all.
The fact that bash does this by itself amazes me (lol).
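
To illustrate the "dereference once, reuse afterwards" idea, here is a minimal
sketch using only builtins. It is not the x2sh code: the function names
load_entities and expand_entities and the ENT table are made up for this
example, it assumes bash 4 or later (associative arrays), it assumes one
double-quoted <!ENTITY name "value"> declaration per line, and it does not
resolve entities that refer to other entities.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not the x2sh implementation.
declare -A ENT=()   # entity name -> replacement text, filled once

# load_entities < file.ent
# Reads <!ENTITY name "value"> declarations (one per line) into ENT.
load_entities() {
    local line
    local re='<!ENTITY[[:space:]]+([^[:space:]]+)[[:space:]]+"([^"]*)"'
    while IFS= read -r line; do
        if [[ $line =~ $re ]]; then
            ENT["${BASH_REMATCH[1]}"]=${BASH_REMATCH[2]}
        fi
    done
}

# expand_entities STRING
# Replaces every known &name; in STRING; unknown references are kept as-is.
expand_entities() {
    local s=$1 out='' name
    while [[ $s == *'&'*';'* ]]; do
        out+=${s%%'&'*}       # text before the first '&'
        s=${s#*'&'}           # drop everything through that '&'
        name=${s%%';'*}       # candidate entity name
        if [[ -n $name && -n ${ENT[$name]+set} ]]; then
            out+=${ENT[$name]}
        else
            out+="&$name;"    # not declared: leave the reference untouched
        fi
        s=${s#*';'}           # continue after the ';'
    done
    printf '%s\n' "$out$s"
}
```

Feed the declarations with a redirection (load_entities < general.ent) rather
than a pipe, otherwise the table is filled in a subshell and lost. After that
single pass, every later lookup is a table access instead of another scan of
the entity files.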

Anyway, regarding XPath/XQuery "emulation", I am currently implementing it with
a "stack" method that answers the requests on the fly, on a large scale, without
iterating through the LFS book more than once. The problem is to find a hacky
"heuristic" way to bypass those nasty big-O phenomena that make it impractical.
It seems I am getting there, but I need more time.
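
A rough sketch of what such a single-pass "stack" walk could look like with
builtins only. This is an illustration under simplifying assumptions, not the
x2sh code: match_path is a made-up name, the request is a plain absolute path
such as /book/sect1/title (no predicates, no axes), and the input has no CDATA
sections, no comments containing '>' and no '>' inside attribute values.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not the x2sh implementation.

# match_path WANTED_PATH < file.xml
# Prints the text found directly inside every element whose path from the
# root equals WANTED_PATH. The input is read exactly once.
match_path() {
    local want=$1 path='' chunk text tag name
    # Reading up to each '>' yields chunks of the form "text<tag".
    while IFS= read -r -d '>' chunk; do
        [[ $chunk == *'<'* ]] || continue
        text=${chunk%%'<'*}     # character data before the tag
        tag=${chunk#*'<'}       # tag body without the angle brackets
        if [[ $path == "$want" ]]; then
            # Trim leading and trailing whitespace from the text.
            text=${text#"${text%%[![:space:]]*}"}
            text=${text%"${text##*[![:space:]]}"}
            if [[ -n $text ]]; then
                printf '%s\n' "$text"
            fi
        fi
        # $path doubles as the stack: appending pushes, cutting the tail pops.
        case $tag in
            '?'*|'!'*) ;;                          # declarations, comments
            /*)        path=${path%/*} ;;          # closing tag: pop
            */)        ;;                          # self-closing: no change
            *)         name=${tag%%[[:space:]]*}   # strip attributes
                       path+=/$name ;;             # opening tag: push
        esac
    done
}
```

For example, match_path /book/sect1/title < book.xml (book.xml being a
placeholder file name) would print the title of every sect1 that sits directly
under book. Several requests could be checked inside the same loop, so the
cost stays at one pass over the input however many paths are asked for.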

My _main_ aim in writing this was to find a way for bash to read and write xml
on its own, so that build scripts and the like could use it to dump and read
logs in xml format. Then I thought of pushing it a bit further for jhalfs (in
either case, the original goal I wanted it for has been reached).
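
As an illustration of the "write" half, here is a small sketch of how a build
script might emit a log record as xml using builtins only. The element and
attribute names (entry, package, status) and the function names are invented
for this example and are not a format used by x2sh or jhalfs. The escaping is
done character by character on purpose: in recent bash versions an unquoted
'&' in the replacement part of ${var//pattern/replacement} can stand for the
matched text, so the loop avoids that construct altogether.

```bash
#!/usr/bin/env bash
# Hypothetical sketch; the log format shown here is made up.

# xml_escape STRING
# Prints STRING with the characters & < > " replaced by their xml entities.
xml_escape() {
    local s=$1 out='' c i
    for ((i = 0; i < ${#s}; i++)); do
        c=${s:i:1}
        case $c in
            '&') out+='&amp;' ;;
            '<') out+='&lt;' ;;
            '>') out+='&gt;' ;;
            '"') out+='&quot;' ;;
            *)   out+=$c ;;
        esac
    done
    printf '%s' "$out"
}

# log_entry PACKAGE STATUS MESSAGE
# Prints one <entry> element on a single line.
log_entry() {
    printf '<entry package="%s" status="%s">%s</entry>\n' \
        "$(xml_escape "$1")" "$(xml_escape "$2")" "$(xml_escape "$3")"
}
```

A call such as log_entry glibc ok 'make check passed' prints one self-contained
line, which keeps the log easy to read back with a line-oriented loop later.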

Do take note that with bash the times are extremely load-dependent. I find it
acceptable if bash spends, say, less than 15-16 mins in total to parse the
entire book and dump it to one or more all-in-one scripts. Do take note that
redirecting the output to a file is another cost, so limiting the number of
"dumps", either by dumping everything at the end or by writing to a single
file, is much more efficient. But after all, the book only needs to be parsed
once: once you have the script(s) built, you do not need the parser anymore :).
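
The point about redirection can be sketched as follows. Appending with
">> file" inside a loop makes bash open and close the file on every iteration,
while a single redirection around the whole loop opens it once. dump_all is a
made-up helper for this example, not part of x2sh, and the "#!/bin/bash" header
line is only an assumption about what the generated script would start with.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not the x2sh implementation.

# dump_all OUTFILE LINE...
# Writes a script header plus every LINE to OUTFILE, opening the file once.
dump_all() {
    local out=$1 line
    shift
    {
        printf '#!/bin/bash\n'
        for line in "$@"; do
            printf '%s\n' "$line"
        done
    } > "$out"    # one redirection for the whole group of commands
}
```

The same effect can be had for longer scripts by opening a descriptor once
with "exec 3> file", writing with ">&3" wherever needed, and closing it at the
end with "exec 3>&-".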

To conclude, bash is a _very_ inefficient medium to work with in either case,
but pushing it gives more insight :)

Thanks for the time spent responding. Keep up the work with the excellent system 
you guys provide!

George Makrydakis
gmak

M.Canales.es wrote:
> On Monday, 13 February 2006 11:44, George Makrydakis wrote:
>> Some of you may remember a previous post I made regarding this idea: Do not
>> rely on third-party binaries for parsing the XML files, but only on plain
>> bash. Although this may be inefficient because it increases script
>> complexity in a non-optimal language for "programming", a bash script
>> fully self-hosting a jhalfs building method is a nice challenge.
> 
> A very nice work :-)
> 
> That could be a nice feature for jhalfs if the next issues can be solved:
> 
>  - To drop the libxml/libxslt dependency, the parser should be able to 
> handle all *LFS books. That means profile support for HLFS and XInclude/XPath 
> support for CLFS. Maybe the profile one could be done, but IMHO the 
> XInclude/XPath one is beyond Bash capabilities.
> 
>  - The parser should create the output subdirs and numbered scripts. 
> Subdirs are needed to keep each build phase separate, and numbered scripts 
> to build the packages in their proper order.
> 
>  - Parsing time:
> 
> $ time { ./prototype.sh > glibc.sh; }
> 
> real    0m5.238s
> user    0m4.930s
> sys     0m0.291s
> 
> $ time { xsltproc --nonet --xinclude dump-lfs-scripts.xsl glibc.xml; }
> 
> real    0m0.106s
> user    0m0.096s
> sys     0m0.010s
> 
> 



