[RFC] SRS Section 2

Hui Zhou zhouhui at wam.umd.edu
Thu Feb 3 19:22:38 PST 2005


On Thu, Feb 03, 2005 at 10:33:48PM +0000, Matthew Burgess wrote:
>Going back to your previous example of a C compiler, how 
>useful would it be if you gave the compiler invalid C code and it 
>silently ignored those lines of code?  I'm not talking about code that 
>is syntactically valid but contains logic-errors or other classes of 
>bugs, I'm talking about stuff like a missing ';'.

You are talking about missing end tags or a missing '>', right?

That is not what XML validation is about. That is syntactic parsing, 
which verifies whether the document is well-formed XML. XML validation 
verifies whether a well-formed XML document is a valid document of a 
certain type according to a DTD.
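To make the distinction concrete, here is a minimal sketch using 
Python's standard xml.etree.ElementTree, which only checks 
well-formedness; it does not validate against a DTD. The <download> 
and <url> snippets are made-up examples. Like a yacc-generated parser, 
it stops at the first syntax error and reports where it is.

```python
import xml.etree.ElementTree as ET


def check_well_formed(text):
    """Return None if `text` is well-formed XML, otherwise the
    (line, column) of the first syntax error. Parsing stops at the
    first error, like a parser generated by yacc."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return err.position  # (line, column) of the first error
    return None


good = "<download><url>http://example.org/foo.tar.bz2</url></download>"
bad = "<download><url>http://example.org/foo.tar.bz2</download>"  # no </url>

print(check_well_formed(good))           # None
print(check_well_formed(bad))            # a (line, column) tuple
print(check_well_formed("<download/>"))  # None
```

The last case shows the gap between the two checks: <download/> is 
perfectly well-formed, so no syntax check will complain about it, even 
if a DTD requires a <url> child. Catching that is the job of validation.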

Is syntactic parsing useful?

Let's look at what syntactic parsing is. With 
	http://www.lysator.liu.se/c/ANSI-C-grammar-y.html
	http://www.lysator.liu.se/c/ANSI-C-grammar-l.html
and lex and yacc, one gets a C syntactic parser. All it does is spit 
out OK, or NOT at the first syntax error.

A syntactic parser verifies that a stream is syntactically correct, 
which by itself is meaningless. The effort of correcting syntax errors 
is pure wrestling with the language; it contributes nothing to your 
actual objective, which is conveying meaning.

A seasoned C programmer still makes quite a few syntax errors. 
Routinely, 20-40% of the errors I make in C are still syntax errors, 
although I can correct most of them once they are detected. That still 
indicates that C syntax is not very intuitive or efficient. In Python, 
with less than a year's experience, fewer than 10% (I estimate much 
fewer) of my errors are syntax errors. I would consider you (Matt) a 
seasoned XML speaker. If a large percentage of your errors are still 
XML syntax errors, that indicates a poor syntax, at least for humans. 
When I edit an nALFS profile, 99% of my errors are syntax errors, so I 
abandoned nALFS altogether to avoid XML. All that syntactic effort is 
just a meaningless struggle with poor language design. (Well, XML does 
make a good language for machine-to-machine communication.)

As I see it, XML validation is the semantic layer of XML. The DTD 
defines a document type, which carries some meaning. However, because 
XML is so general, this semantic layer can't be very thorough, so it is 
at most a partial semantic layer. It verifies, say, that a document is 
valid DocBook, but it does nothing beyond that; it doesn't even tell 
you how many chapters the document has or how deep its structure is. 
How useful is that!

To parse at the semantic level, it is necessary to parse at the 
syntactic level first. So during validation, xmllint spits out syntax 
errors first; at that point it has not even reached the validation 
part yet. With an nALFS profile, at least in my case, I almost never 
make DTD-level errors (how would I forget to add a <url> element under 
a <download> element!?), so it is 99% pure meaningless syntax errors 
plus 1% semantic errors that DTD validation won't help with. That's why 
I say DTD validation for alfs is useless.
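A rough sketch of the two layers, in Python with only the standard 
library. The element names <package>, <download> and <url>, and the 
single rule "every <download> needs a <url>", are assumptions standing 
in for a real DTD content model; this is a hand-written check, not the 
actual nALFS DTD and not a DTD validator. The structural layer is only 
reached after the syntax layer passes, the same order in which xmllint 
reports problems.

```python
import xml.etree.ElementTree as ET


def check_profile(text):
    """Two-layer check of a hypothetical nALFS-like profile.

    Layer 1 (syntax): is the text well-formed XML at all?
    Layer 2 (DTD-like structure): does every <download> contain a <url>?
    Layer 2 never runs if layer 1 fails.
    """
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        # Stop here: there is no tree to check structurally.
        return ["syntax error at line %d, column %d" % err.position]

    errors = []
    for download in root.iter("download"):
        if download.find("url") is None:
            errors.append("<download> without <url>")
    return errors


print(check_profile("<package><download></package>"))  # syntax error only
print(check_profile("<package><download/></package>"))  # structural error
```

Neither layer says anything about whether the URL or package name is 
right; that logic still has to live in the program that runs the 
profile.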

After validation with xmllint, does that mean alfs doesn't need to do 
syntax parsing again? NO. In fact, alfs has to parse the full semantic 
layer, which means parsing the syntax layer again first, then the 
DTD-level semantics, then the actual logic (semantics) of the profile. 
DTD validation won't save alfs anything!

By the way, have you ever hit a C parser that tells you your program is 
missing a ';' somewhere? I know some good parsers do that, which shows 
that the ';' at the end of a statement is simply not necessary. It is 
there to bait you into a meaningless struggle. Yes, if one writes 
obfuscated C code, almost all those syntax elements are necessary. But 
most programmers naturally use newlines and indentation, which the C 
syntax ignores, and that shows its inefficiency. Python uses them; 
that's why it feels so intuitive and less error-prone. Of course, the 
semantic level of C is also quite cumbersome.

XML also doesn't make use of indentation and newlines, and it relies 
on doubly matched syntax: '<' and '>', and open tag and close tag. 
Humans are not good at matching, and those matching requirements in 
the syntax are my #1 error source. That is why OGDL (google it if you 
don't know what it is) feels so human-friendly. When I edit my profile 
in OGDL, 100% of my errors are at the logical level (like typos in 
package names).
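For contrast, a sketch of why indentation-based formats avoid the 
matching problem. It parses a simplified, OGDL-like outline into a 
tree; this is an illustration only, not the actual OGDL grammar 
(assumptions: spaces only, no tabs, no quoting, no escapes). There are 
no closing delimiters anywhere: a line's indentation alone decides its 
parent, so there is nothing to mismatch. The profile text is a made-up 
example.

```python
def parse_indented(text):
    """Parse an indentation-based outline into nested (name, children)
    pairs. Each non-blank line becomes a node; a line indented deeper
    than the previous open node becomes its child."""
    root = ("root", [])
    stack = [(-1, root)]  # (indent, node) for every still-open ancestor
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines carry no structure
        indent = len(line) - len(line.lstrip(" "))
        node = (line.strip(), [])
        # Ancestors indented at least as deep as this line are finished.
        while stack[-1][0] >= indent:
            stack.pop()
        stack[-1][1][1].append(node)  # attach to nearest shallower node
        stack.append((indent, node))
    return root


profile = """
package bash
  download
    url http://example.org/bash.tar.gz
  build
    ./configure
    make
"""

tree = parse_indented(profile)
print(tree)
```

The only thing that can go wrong structurally is indentation itself, 
which is exactly what people already get right when they lay out text 
for reading.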

Let's use the C compiler example again (it seems everyone loves it :). 
There is never a notion of validation. After writing the source, you 
go ahead, feed it into the compiler, and correct the errors it reports. 
What I am saying for alfs is: forget all those DTDs and validations, 
just feed the profile into alfs and let alfs tell you the errors. alfs 
needs to do that whether or not the profile has been validated before.

Since I am in the mood and am talking about useless DTDs: Jeremy, I 
think your effort on the SRS is useless. Leaving aside the fact that 
Neocool takes me as a joke, in many respects I am in total agreement 
with him. (He seldom speaks outside IRC, so I have few chances to say 
it.) Writing an SRS in the hope that different coders will implement 
it according to the SRS is very similar to making a DTD and profiles 
and waiting for an actual program to use them. It will work out like 
the DTD, providing more hassle than help. The DTD demands writing 
multiple elements for a simple bash command, which doubles the 
open/close tag pairs and again doubles the < > pairs. Any writer of 
automated build scripts or programs who is not constrained by this DTD 
will not use a profile that does that. As for the SRS, specifying this 
detail and that detail without actually writing the program is absurd. 
It was different when Kevin was in charge: he had a code base to build 
on, and the SRS was essentially a to-do list based on the existing 
implementation. Now the program is being written from scratch, and the 
SRS has little connection with Neocool's code (if that counts). I just 
can't help thinking what a useless effort you are making.

Well, I love discussion and don't care much about the progress of 
alfs, and most of all, I really appreciate your energy and intentions, 
so I hadn't spoken up about this yet. Now I have, just as a friend 
expressing my honest opinion along with my reasons; I hope you don't 
get too annoyed. :-;

Again on the DTD. The DTD has had conditionals for a long time, but 
they never went into code; how useful is that? The download tag also 
lagged for quite some time, IIRC. To the profile writers: do you really 
enjoy reading a DTD compared to reading man smb.conf or httpd.conf?

Well, I thought I had more rants, but I can't remember them now. 
Someone must be sighing or going "uhhh..." quite a few times already, 
sorry :) Hopefully it still makes for less reading than the XML spec 
or even the DTD spec.

Sincerely,

-- 
Hui Zhou



More information about the alfs-discuss mailing list