XML

Hui Zhou zhouhui at wam.umd.edu
Wed Feb 2 12:15:27 PST 2005


On Wed, Feb 02, 2005 at 07:44:51PM +0000, Matthew Burgess wrote:
>Hui Zhou wrote:
>>On Wed, Feb 02, 2005 at 08:21:40AM -0700, Kevin P. Fleming wrote:
>>
>>>Gerard Beekmans wrote:
>>>
>>>>Maybe you can look at it this way: The profile is source code. The
>>>>validation process is the compiler. Then it is accepted and now a proper
>>>>executable piece of code.
>>>
>>>
>>>Very good example Gerard! And it follows that while the compiler may 
>>>very well compile your source code and produce an executable binary, 
>>>that binary may not at all do what you expected, due to semantic 
>>>errors in the source code that the compiler could not find for you. 
>>>This is the identical situation that we are talking about with 
>>>validating XML, but there's no reason to stop doing it just because it 
>>>can't find _all_ errors. 
>>
>>
>>We are talking about a validating process before actual parsing.
>>The compiler parses without using another process to validate.
>
>And how do you think the validator (xmllint) manages to validate the XML 
>if it doesn't also parse it?

It is one extra parse that is not useful at all. After running xmllint,
do you think the alfs tool will spend any less time parsing the profile
when it runs?
>
>Again, this is a case of us not wanting to do any more work than is 
>absolutely necessary.  If the server doesn't receive a completely valid 
>profile then it has to bail out (much like the compiler has to bail out 
>if you give it any invalid code) - it can't possibly start processing 
>any commands it may have successfully parsed if the whole profile hasn't 
>been inspected.
>
>So it comes down to these 2 situations:
>
>1) Using SAX we could validate and process the XML in one pass. 
>However, if we're handed an invalid profile, we've just wasted time 
>processing[1] whatever sensible XML we did find.
>2) Using 2 passes, we could firstly validate the XML before doing any 
>other processing on it.  We have to parse the profile twice, but won't 
>end up needlessly processing any invalid XML.
>
>Note that 'XML' in the above 2 situations can be replaced with any data 
>format.  The advantages of XML are that it saves us having to define our 
>own data format and thus write parsers/validators for it.
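
The two situations quoted above can be sketched roughly as follows. This
is only an illustration, not the alfs implementation: the <profile> and
<command> element names and the function names are assumptions made up
for the example, and the first pass here only checks well-formedness
with Python's xml.sax rather than doing DTD validation the way xmllint
can.

```python
import xml.sax


class CommandHandler(xml.sax.ContentHandler):
    """Records each (hypothetical) <command> element as the parser reports it."""

    def __init__(self):
        super().__init__()
        self.processed = []

    def startElement(self, name, attrs):
        if name == "command":
            self.processed.append(attrs.get("name"))


def one_pass(profile: bytes):
    """Situation 1: process while parsing. Work done before an error is wasted."""
    handler = CommandHandler()
    try:
        xml.sax.parseString(profile, handler)
        return True, handler.processed
    except xml.sax.SAXParseException:
        # Commands seen before the error were already handled.
        return False, handler.processed


def two_pass(profile: bytes):
    """Situation 2: check the whole profile first, then parse again to process."""
    try:
        # First pass: a do-nothing handler, so only well-formedness is checked.
        xml.sax.parseString(profile, xml.sax.ContentHandler())
    except xml.sax.SAXParseException:
        return False, []
    handler = CommandHandler()
    xml.sax.parseString(profile, handler)  # second pass does the real work
    return True, handler.processed


good = b'<profile><command name="unpack"/><command name="make"/></profile>'
bad = b'<profile><command name="unpack"/><command name="make"></profile>'
```

With the `bad` profile, `one_pass` has already handled some commands by
the time the parser reports the error, while `two_pass` rejects the
profile before handling any, at the cost of parsing a good profile
twice.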

Why do you worry about feeding the alfs tool an invalid profile and 
making it bail out? If alfs bails out, the profile is bad, so you edit 
it and try again. 

The whole alfs parser, written on the KISS principle, will probably 
have a smaller footprint than xmllint. 

Think of compiling C code: do you use an external parser to validate 
the source at the syntax level before feeding it to gcc?

A robust parser should be able to accept any random stream without 
crashing or affecting the system. That is a basic security requirement 
of any software. With a simple parser, this can easily be audited and 
verified. With libxml2, let's hope Daniel Veillard did his work right. 
(Well, he used the same method (random call) to detect the 
vulnerability.) DV doesn't have a choice; we do.
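
As a rough sketch of what "accept any random stream without crashing"
could mean in practice, here is a toy check. Everything in it is
hypothetical: the line-based "name argument" format, parse_profile,
ProfileError and fuzz are invented for illustration and are not the
alfs parser. The idea is only that random input may be rejected, but
only ever through the one documented error.

```python
import random


class ProfileError(Exception):
    """The only failure a caller should ever see from parse_profile."""


def parse_profile(data: bytes):
    """Parse a hypothetical line format: one 'name argument' pair per line."""
    try:
        text = data.decode("ascii")
    except UnicodeDecodeError as exc:
        raise ProfileError("profile is not ASCII") from exc
    commands = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        name, _, argument = line.strip().partition(" ")
        if not name.isalnum():
            raise ProfileError(f"line {number}: bad command name")
        commands.append((name, argument))
    return commands


def fuzz(rounds: int = 1000, seed: int = 0) -> int:
    """Feed random byte streams to the parser; return how many were rejected."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(rounds):
        blob = bytes(rng.randrange(256) for _ in range(rng.randrange(64)))
        try:
            parse_profile(blob)
        except ProfileError:
            rejected += 1
        # Any other exception propagates and fails the run.
    return rejected
```

A run of `fuzz()` that returns at all means no input produced anything
other than a clean rejection. That is far weaker than a real audit, but
it is cheap to do for a small parser.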

-- 
Hui Zhou



More information about the alfs-discuss mailing list