Using a pure bash XML "parser" for jhlfs. A very humble suggestion awaiting comments.

George Makrydakis gmakmail at gmail.com
Fri Jan 27 03:48:56 PST 2006


Hello Everyone!

I have been using various operating systems on different hardware platforms for various reasons.
I have been using the LFS and BLFS instructions for some years now, but this is my first post.
First of all, I would like to congratulate the LFS developers on their excellent work on this project.
To my eyes it provides the single best way for people to learn not only GNU/Linux itself, but also how to deploy GNU/Linux
  in a production environment. I have never had an unresolvable issue with LFS / BLFS or my own mix of the two.
The thanks could turn into evangelism here, so I will stop (for now).

Automation is a must if you are to deploy an LFS-based platform.
I have been using some of the methods posted by your developer group, and some others of my own.
jhlfs is a nice project, and I hope that there is always going to be a bash-based method of automation.
Why? Even if bash is slow compared to a compiled binary, a well-structured script can be equally useful when building a small core from scratch.

I really like the idea of using the XML sources of the book, and wondered if there was a way to get rid of a third-party dependency,
i.e. the XML parser binary (whatever that may be). After searching the internet, I found that whenever a bash "script"
parses an XML document, it always does so through a third-party application / utility.
The following is an attempt to get rid of that dependency, since only pure bash is used.
I have some other scripts that I could merge with it. It does not support comments or preprocessing constructs, but
it will work with anything of the following form:

####### starts here ########
<xmlroot>
	<hello attribute="323"> LFS "owns"! </hello>
<another stuff="34243"
	hellomate="somestuffheretoo"
/>
</xmlroot>

####### ends here ##########

As long as your document looks like the one above, it will work. The result goes into two different arrays:
one for keeping status, the other for keeping data. There is no validation (this is experimental...), so do provide valid documents.
Also remember that each attribute and its value must be fully declared on a single line (i.e. do NOT let a quoted value span lines).
For testing I have been using rather nasty XML syntax mixes, and so far it has worked. I am waiting for your comments if you care to
give this a try.
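
Since a quoted value that spans lines is not handled, a small pure bash pre-check could flag such lines before the file is parsed. This is only a sketch: count_quotes and check_quotes are hypothetical helper names that are not part of bparser, and it assumes that double quotes in text content come in pairs on the same line.

```bash
#!/bin/bash
# Hypothetical pre-check, not part of bparser: report every line whose
# double quotes are unbalanced (a hint that a quoted value spans lines).

# Print the number of '"' characters in the string passed as $1.
count_quotes() {
	local only_quotes
	only_quotes=${1//[^\"]/}	# delete every character except "
	echo "${#only_quotes}"
}

# Print "line N: unbalanced quotes" for each offending line of file $1.
# Returns 0 when every line is balanced, 1 otherwise.
check_quotes() {
	local line
	local -i lineno=0 bad=0
	while IFS= read -r line || [ -n "$line" ]; do
		lineno=$((lineno + 1))
		if (( $(count_quotes "$line") % 2 != 0 )); then
			echo "line $lineno: unbalanced quotes"
			bad=1
		fi
	done < "$1"
	return $bad
}
```

It could be run as check_quotes "recent.xml" before starting the parser, going ahead only if it returns 0.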

Thank you.

-----------------------------------------------------CUT STARTS HERE--------------------------------------------------
#!/bin/bash
	# project	:	a possible add-in for jhlfs?
	# name		:	bparser
	# version	:	0.7e (JAN2606)
	# author	:	George Makrydakis <gmakmail at gmail.com>
	# license	:	GPL v 2.x or up
	# info		:	bash - based XML pseudoparser for the jhLFS Project
	# status	:	parses simple valid XML (no comments / preprocessing constructs), just substitute the myfile variable
	
	declare -a xmlSTAT				# contains the status informer
	declare -a xmlDATA				# contains any relevant data (element/attributes name/values and unparsed data)
	declare -i xINDEX=0				# global xINDEX pointer for xml**** arrays
	declare -i checkpoint=0				# a character counter & pointer for parserLINE
	declare -i doall=0				# a simple counter variable
	declare -r myfile="recent.xml"	# the filename / path of the file to "parse"

	declare -r startTAG='<'			# literal <
	declare -r closeTAG='>'			# literal >
	declare -r slashTAG='/'			# literal /
	declare -r equalTAG='='			# literal =
	declare -r quoteTAG='"'			# literal "
	declare -r whiteTAG=' '			# literal space
	declare elmentATTN=""			# element attribute name
	declare elmentATTV=""			# element attribute value
	declare elmentNAME=""			# element name value
	declare elmentDUMP=""			# element name dump
	
	declare parserLINE=""			# contains a single line read from the XML document sent to the parser
	declare parserFLAG="ENABLED"		# can have two mutually exclusive values: ENABLED / DISABLD
	declare parserFILE=""			# file sent to the parser
	declare parserBUFF=""			# parser buffer variable
	
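	# -r keeps backslashes literal; the extra test also handles a last line without a trailing newline
	while read -r parserLINE || [ -n "$parserLINE" ]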
	do
		parserBUFF=""
		for ((checkpoint=0; checkpoint < ${#parserLINE}; checkpoint++)) ;
		do
			case ${parserLINE:$checkpoint:1} in
			$startTAG)
				let "checkpoint++"; elmentNAME=""; parserFLAG="ENABLED"
				if [ "$parserBUFF" != '' ] ; then
						xmlSTAT[xINDEX]="#"
						xmlDATA[xINDEX]="$parserBUFF"
						let "xINDEX++"
						parserBUFF=""
				fi
				until [ "${parserLINE:$checkpoint:1}" = ' ' ] || \
					  [ "${parserLINE:$checkpoint:1}" = '>' ] || \
					  [ $checkpoint = ${#parserLINE} ];
				do
					elmentNAME=$elmentNAME${parserLINE:$checkpoint:1}
					let "checkpoint++"
				done
				if [ "${elmentNAME:0:1}" = "$slashTAG" ] ; then
							xmlDATA[xINDEX]="${elmentNAME#*/}"
							xmlSTAT[xINDEX]="$closeTAG"
							let "xINDEX++"
							parserBUFF=""
				else
					xmlDATA[xINDEX]="$elmentNAME"
					xmlSTAT[xINDEX]="$startTAG"
					let "xINDEX++"; parserBUFF=""
					case ${parserLINE:$checkpoint:1} in
						$whiteTAG)	elmentDUMP="$elmentNAME" ;;
						'')			elmentDUMP="$elmentNAME" ;;
					esac
				fi
			;;
			$slashTAG)
				case $parserFLAG in
				ENABLED)
					if [ "${parserLINE:$((checkpoint + 1)):1}" = "$closeTAG" ] ; then
						xmlSTAT[xINDEX]="$closeTAG"
						xmlDATA[xINDEX]="$elmentDUMP"
						let "xINDEX++"
						elmentDUMP=""; parserBUFF=""; let "checkpoint+=2"
					fi
				;;
				DISABLD)
				;;
				esac
			;;
			$quoteTAG)
				case $parserFLAG in
					ENABLED)
						let "checkpoint++"; elmentATTV=""; parserBUFF=""
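						# stop at the closing quote, or at end of line so an unterminated value cannot loop forever
						until [ "${parserLINE:$checkpoint:1}" = '"' ] || \
							  [ $checkpoint -ge ${#parserLINE} ] ;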
							do
								elmentATTV=$elmentATTV${parserLINE:$checkpoint:1}
								let "checkpoint++"
							done
						let "checkpoint++"; xmlDATA[xINDEX]="$elmentATTV"; let "xINDEX++"
						case ${parserLINE:$checkpoint:1} in
							$closeTAG) if [ "${parserLINE:$((checkpoint + 1)):1}" = '<' ] ; then let "checkpoint-=1"; fi ;;
							$slashTAG) let "checkpoint-=1" ;;
						esac
					;;
					DISABLD)
					;;
				esac		
			;;
			$equalTAG)
				case $parserFLAG in
					ENABLED)
						elmentATTN=""
						if [ "${parserLINE:$((checkpoint + 1)):1}" = '"' ] ; then
							elmentATTN=${parserBUFF##$equalTAG}
							elmentATTN=${elmentATTN##*$whiteTAG}
							xmlSTAT[xINDEX]="$elmentATTN"
						fi
					;;
					DISABLD)
					;;
				esac
			;;
			esac
			
			case $((checkpoint + 1)) in
				${#parserLINE})
					parserBUFF=$parserBUFF${parserLINE:$checkpoint:1}
					if [ "${parserLINE:$checkpoint:1}" != "$closeTAG" ] ; then
						xmlSTAT[xINDEX]="#"
						xmlDATA[xINDEX]="$parserBUFF"
						let "xINDEX++"
					fi
					;;
					*)
						case ${parserLINE:$checkpoint:1} in
							$closeTAG)	parserBUFF=""; parserFLAG="DISABLD";;
							*)			parserBUFF=$parserBUFF${parserLINE:$checkpoint:1};;
						esac
					;;
			esac
		done
		
	done < "$myfile"
	
	#########################################################################################
	
	function helpmeout ()
	{
		clear
		for (( doall=0; doall < $xINDEX; doall++ )) ;
		do
			printf '%s:%s\n' "${xmlSTAT[doall]}" "${xmlDATA[doall]}"
		done
	}
	
	helpmeout
--------------------------------------------------------------------------- CUT ENDS HERE----------------------------------------------------
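
To show how the two arrays could be used afterwards, here is a minimal sketch of an attribute lookup. It relies on the conventions visible in the script: xmlSTAT holds '<' for an opening element, '>' for a closing one, '#' for unparsed text, and otherwise an attribute name whose value sits at the same index in xmlDATA. The sample arrays are written by hand for illustration and are not captured output of the script, and get_attribute is a hypothetical helper name.

```bash
#!/bin/bash
# Hand-written sample data following the script's conventions:
#   xmlSTAT '<'  -> xmlDATA holds an opening element name
#   xmlSTAT '>'  -> xmlDATA holds a closing element name
#   xmlSTAT '#'  -> xmlDATA holds unparsed text
#   any other xmlSTAT value is an attribute name, xmlDATA holds its value
declare -a xmlSTAT=('<'       '<'     'attribute' '#'         '>'     '>')
declare -a xmlDATA=('xmlroot' 'hello' '323'       'some text' 'hello' 'xmlroot')

# get_attribute ELEMENT ATTRIBUTE (hypothetical helper)
# Print the value of the first ATTRIBUTE that directly follows an opening
# ELEMENT entry; return 1 if there is no such attribute.
get_attribute() {
	local element=$1 attribute=$2 current=""
	local -i i
	for (( i = 0; i < ${#xmlDATA[@]}; i++ )); do
		case ${xmlSTAT[i]} in
			'<')		current=${xmlDATA[i]} ;;	# entering an element
			'>'|'#')	current="" ;;			# attribute list is over
			*)
				if [ "$current" = "$element" ] && \
				   [ "${xmlSTAT[i]}" = "$attribute" ]; then
					printf '%s\n' "${xmlDATA[i]}"
					return 0
				fi
			;;
		esac
	done
	return 1
}
```

With the sample data, get_attribute hello attribute prints 323, while a lookup for a name that is not there prints nothing and returns 1.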




More information about the alfs-discuss mailing list