x2sh booster script - works only under bash 3.x editions! - interim announcement post

George Makrydakis gmakmail at gmail.com
Fri Feb 17 14:46:27 PST 2006

Dear Friends,

An interim post on what is happening under the hood here...

One operator to be added to the "parsing script" is the =~ (regular expression
match) operator, which is available only from bash 3.0 onwards. The following
script uses it to pick up text from the xml files in raw mode wherever
<userinput> and </userinput> are met as plain strings in a given xml file.
Written this way, the script could, if extended, even produce a single valid
XML file dumped from ALL the xml files of the LFS book, which could then be fed
to the x2sh parser for regular processing, thus boosting the performance of the
x2sh parser _several_ times.
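
A minimal sketch of how =~ can tell the three <userinput> situations apart on
a single line. The helper name classify_line and the sample lines are
hypothetical and not part of x2sh; the patterns are kept in variables so that
they should behave the same on bash 3.0 and on later releases.

```shell
#!/bin/bash
# classify_line is a hypothetical helper for illustration, not part of x2sh.
open_re='<userinput>'
close_re='</userinput>'

classify_line() {
	local line=$1
	if [[ $line =~ $open_re && $line =~ $close_re ]]; then
		echo "inline"	# opening and closing tag on the same line
	elif [[ $line =~ $open_re ]]; then
		echo "start"	# a multi-line block begins here
	elif [[ $line =~ $close_re ]]; then
		echo "end"	# a multi-line block ends here
	else
		echo "other"	# no tag on this line
	fi
}

classify_line '<screen><userinput>make install</userinput></screen>'	# inline
classify_line '<screen><userinput>mkdir -v build'			# start
classify_line 'cd build</userinput></screen>'				# end
classify_line '<para>Some explanatory text</para>'			# other
```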

Of note:

1. I have restructured the x2sh parser script and debugged it, cutting down any
possible lockups to zero (hopefully). There was a problem with some characters
like '*': once met in an xml file, they led to a corrupt result (_ch5_ glibc,
for example). This has been taken care of and it now works correctly.

2. I made a script based on the x2sh approach and used it to parse the entire
xml source for the LFS book. I checked the output thoroughly. Globally parsing
the XML sources without optimization takes about _30 - 40_ min under normal
working load on my boxes. That is parsing only, with the resulting bash array
exceeding 40000 entries. The resulting array dump-to-disk is a file of nearly
750 KB (big).

3. When using the script below to parse the entire chapter collection of the
book in xml, the total duration is nearly 40 _sec_ and the resulting dump to
disk is nearly 60 KB, _UNPARSED_. After parsing, this "dump" is even smaller,
while the total time for extracting, dereferencing and redumping to a pure
script file should be about the time x2sh would take to parse a 50 - 70 KB
complex, valid and well-formed XML file (and probably even less; this is a
rough calculation, of course!). Also note that the hardcoded
<userinput> </userinput> part can be made "uninformed" as well.

4. All of the above eventually leads to a structure that is capable of handling
the totality of the books in a reasonable amount of time for a pure bash-based
script: lowering the number of array entries and the processing time demanded
significantly lowers the effect of O(n)-like phenomena where those are
inevitably met. Note that going from 40 min to 40 sec for parsing is far more
than a 10x decrease in execution time; it is (60 x 40) / 40 = _60x!_
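
On the '*' problem mentioned in point 1: the post does not say what caused it,
so the following is only an assumption about one common way such corruption
arises in bash, namely an unquoted expansion. The scratch directory, file names
and sample line are hypothetical.

```shell
#!/bin/bash
# Hypothetical demonstration; it works inside a throwaway scratch directory.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
touch alpha beta

line='ls -l *'

# Unquoted: the shell splits the string and expands * against the directory.
unquoted=$(echo $line)
# Quoted: the string is passed through untouched.
quoted=$(printf '%s\n' "$line")

echo "unquoted: $unquoted"	# unquoted: ls -l alpha beta
echo "quoted:   $quoted"	# quoted:   ls -l *

cd / && rm -rf "$workdir"
```

Quoting every expansion of a stored line (as in "${filearray[linecounter]}")
keeps characters such as '*' from being rewritten by the shell.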

The current debugged "uninformed" version of x2sh I have is fused with a script 
just like the one below, for the version to be released this weekend! Run the 
script below in the root of your LFS book sources and see what happens :)


George Makrydakis

PS: feedback / testing results are _always_ appreciated. Thank you for hosting 
my posts.

#--------------------------------CUT FROM HERE---------------------------------


# script: XML pseudoparsing booster

declare -a filearray
declare -a filestore		# declared for later use; not used in this script
declare -i linecounter=0
declare -i collectSTART=0
declare -i collectSTOP=0

# chapter directories to scan; extend the list to match your book sources
declare -a chapterlist=(chapter01 \
			chapter02 \
			chapter03 \
			chapter04 \
			chapter05 \
			chapter06 \
			chapter07)

for selectchapter in "${chapterlist[@]}"
do
	cd "$selectchapter" || continue
	echo "$selectchapter"
	for filenameinput in *.xml
	do
		[ -f "$filenameinput" ] || continue	# no xml files in this chapter
		echo "-------------------------------------------------"
		echo "x2sh:parsing file: $filenameinput"
		echo "-------------------------------------------------"

		# start from an empty array and clean counters for every file
		filearray=()
		linecounter=0
		collectSTART=0
		collectSTOP=0

		# IFS= and -r keep leading whitespace and backslashes intact
		while IFS= read -r "filearray[linecounter]"
		do
			let "linecounter++"
		done < "$filenameinput"

		# case scenarios:
		# 1. <userinput> ... </userinput> inline definition
		# 2. <userinput> opens a block spanning several lines
		# 3. </userinput> closes such a block

		for ((linecounter=0; linecounter < ${#filearray[@]}; linecounter++))
		do
			if [[ "${filearray[linecounter]}" =~ '<userinput>' ]] && \
			   [[ "${filearray[linecounter]}" =~ '</userinput>' ]] ; then	# scenario 1
				printf "%s\n" "${filearray[linecounter]}"
			elif [[ "${filearray[linecounter]}" =~ '<userinput>' ]] && \
			     [[ ! "${filearray[linecounter]}" =~ '</userinput>' ]] ; then	# scenario 2
				printf "%s\n" "${filearray[linecounter]}"
				let "collectSTART=linecounter + 1"
			elif [[ ! "${filearray[linecounter]}" =~ '<userinput>' ]] && \
			     [[ "${filearray[linecounter]}" =~ '</userinput>' ]] ; then	# scenario 3
				let "collectSTOP=linecounter"
				# print the lines between the opening and the closing line
				if [ $((collectSTOP - collectSTART)) -ge 1 ] ; then
					for ((addthis=collectSTART; addthis < collectSTOP; addthis++))
					do
						printf "%s\n" "${filearray[addthis]}"
					done
				fi
				# print the closing line itself
				printf "%s\n" "${filearray[collectSTOP]}"
				collectSTOP=0
				collectSTART=0
			fi
		done
	done
	cd ..
done

#--------------------------------ENDS HERE------------------------------------
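
The output of the script is still raw XML, so a further pass is needed before
it can serve as a plain script. A minimal sketch of such a pass follows,
assuming that stripping every tag and decoding the five predefined XML
entities is enough. The helper name strip_userinput is hypothetical, sed is
swapped in for brevity in place of a pure bash approach, and book-specific
entities (version entities and the like) are deliberately left untouched.

```shell
#!/bin/bash
# strip_userinput is a hypothetical helper for illustration, not part of x2sh.
# It reads lines on stdin, removes every XML tag and decodes the five
# predefined XML entities. &amp; is decoded last so that an escaped entity
# such as &amp;lt; is not decoded twice.
strip_userinput() {
	sed -e 's/<[^>]*>//g' \
	    -e 's/&lt;/</g' \
	    -e 's/&gt;/>/g' \
	    -e 's/&quot;/"/g' \
	    -e "s/&apos;/'/g" \
	    -e 's/&amp;/\&/g'
}

printf '%s\n' '<screen><userinput>make &amp;&amp; make install</userinput></screen>' \
	| strip_userinput	# make && make install
```

Assuming the booster script was saved under a name such as booster.sh, its
output could be piped through the helper in the same way.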

More information about the alfs-discuss mailing list