Thank you for making it meaningless...

Bruce Dubbs bdubbs at swbell.net
Thu Mar 9 10:25:06 PST 2006


Jean Charles Passard wrote:
> I'm trying to go deeper into XML parsing, but I'm really annoyed by
> what I read in the W3C specification.
> Especially about your point 7.
>    I have noted these delimiters:
>       1. < >
>       2. <!-- -->
>       3. <? ?>
>       4. <![CDATA[  ]]>
>       5. <!DOCTYPE  >
>       6. <! >
> 
>    They all give problems if I try to parse only on <> :
>       1. it's ok ;)
>       2. can have < and/or > inside
>       3. it's ok too.
>       4. can have < and/or > inside
>       5. can have <!-- --> <! > and []
>       6. it's ok
> 
>    I can't see any good way to parse this without doing it char by
> char.

Of course you have to parse the input character by character.  The way
to do this is with a state machine.  When you get a '<' character you go
into an intermediate state.  You then read the next character to decide
what state to go into next.
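
A minimal sketch of that idea, for illustration only (this is not the
attached program).  All names here (MarkupCounts, State, scan_markup)
are made up for the example.  It assumes the whole input is one
NUL-terminated string, peeks ahead with strncmp() after "<!" instead of
adding more intermediate states, and only tracks '[' ... ']' nesting
inside a declaration, so a ']' or '>' inside a comment or quoted string
within a DOCTYPE internal subset will confuse it.

```c
#include <stddef.h>
#include <string.h>

/* How many of each construct were seen (illustrative names). */
typedef struct {
    int tags;      /* <name ...> and </name>           */
    int comments;  /* <!-- ... -->                     */
    int pis;       /* <? ... ?>                        */
    int cdata;     /* <![CDATA[ ... ]]>                */
    int decls;     /* <!DOCTYPE ...> and other <! ...> */
} MarkupCounts;

typedef enum {
    ST_TEXT, ST_LT, ST_BANG, ST_TAG, ST_COMMENT, ST_PI, ST_CDATA, ST_DECL
} State;

MarkupCounts scan_markup(const char *s)
{
    MarkupCounts n = {0, 0, 0, 0, 0};
    State st = ST_TEXT;
    size_t body = 0;  /* index where the current construct's body starts */
    int depth = 0;    /* '[' nesting inside a declaration */
    char quote = 0;   /* open quote character inside a tag, or 0 */

    for (size_t i = 0; s[i] != '\0'; i++) {
        char c = s[i];
        switch (st) {
        case ST_TEXT:
            if (c == '<') st = ST_LT;
            break;
        case ST_LT:      /* intermediate state: one character after '<' */
            if (c == '?')      { st = ST_PI; body = i + 1; }
            else if (c == '!') { st = ST_BANG; }
            else               { st = ST_TAG; quote = 0; i--; } /* re-read c as tag text */
            break;
        case ST_BANG:    /* just after "<!": decide which kind it is */
            if (strncmp(s + i, "--", 2) == 0) {
                st = ST_COMMENT; i += 1; body = i + 1;
            } else if (strncmp(s + i, "[CDATA[", 7) == 0) {
                st = ST_CDATA; i += 6; body = i + 1;
            } else {
                st = ST_DECL; depth = 0; i--;   /* re-read c as declaration text */
            }
            break;
        case ST_TAG:     /* '>' inside a quoted attribute value does not end the tag */
            if (quote != 0) {
                if (c == quote) quote = 0;
            } else if (c == '"' || c == '\'') {
                quote = c;
            } else if (c == '>') {
                n.tags++; st = ST_TEXT;
            }
            break;
        case ST_COMMENT: /* ends only at "-->" */
            if (c == '>' && i >= body + 2 && s[i - 1] == '-' && s[i - 2] == '-') {
                n.comments++; st = ST_TEXT;
            }
            break;
        case ST_PI:      /* ends only at "?>" */
            if (c == '>' && i >= body + 1 && s[i - 1] == '?') {
                n.pis++; st = ST_TEXT;
            }
            break;
        case ST_CDATA:   /* ends only at "]]>" */
            if (c == '>' && i >= body + 2 && s[i - 1] == ']' && s[i - 2] == ']') {
                n.cdata++; st = ST_TEXT;
            }
            break;
        case ST_DECL:    /* ends at '>' outside any [ ... ] */
            if (c == '[') depth++;
            else if (c == ']' && depth > 0) depth--;
            else if (c == '>' && depth == 0) { n.decls++; st = ST_TEXT; }
            break;
        }
    }
    return n;
}
```

Calling scan_markup() on a string and reading the counters is all there
is to using it.  The point is that a '<' or '>' inside a comment, a
CDATA section or a quoted attribute never reaches the code that ends a
tag, because the machine is in a different state when it sees them.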

I once wrote a program to count lines of C/C++ code, which is not unlike
this problem.  Once you account for comments in C/C++ as well as
directives and tokens, it's quite similar to the XML problem.  I am
attaching the code as an example.
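
For readers without the attachment, here is a rough sketch of the same
kind of machine for C comments.  It is not the attached
count-methods.c; the names (CState, count_code_lines) are hypothetical,
and it makes simplifying assumptions: the source is one NUL-terminated
string, preprocessor directives count as ordinary code, and
backslash-newline continuations are not handled specially.

```c
#include <stddef.h>

typedef enum { C_CODE, C_SLASH, C_LINE, C_BLOCK, C_STAR, C_STR, C_CHR } CState;

/* Count lines holding at least one non-blank character outside comments. */
int count_code_lines(const char *s)
{
    CState st = C_CODE;
    int lines = 0;
    int has_code = 0;   /* current line has seen code so far */

    for (size_t i = 0; s[i] != '\0'; i++) {
        char c = s[i];
        switch (st) {
        case C_CODE:
            if (c == '/') st = C_SLASH;
            else if (c == '"')  { st = C_STR; has_code = 1; }
            else if (c == '\'') { st = C_CHR; has_code = 1; }
            else if (c != ' ' && c != '\t' && c != '\r' && c != '\n') has_code = 1;
            break;
        case C_SLASH:   /* intermediate state: one character after '/' */
            if (c == '/') st = C_LINE;
            else if (c == '*') st = C_BLOCK;
            else { has_code = 1; st = C_CODE; i--; continue; } /* it was a divide; re-read c */
            break;
        case C_LINE:    /* "//" comment runs to end of line */
            if (c == '\n') st = C_CODE;
            break;
        case C_BLOCK:
            if (c == '*') st = C_STAR;
            break;
        case C_STAR:    /* saw '*' inside a block comment */
            if (c == '/') st = C_CODE;
            else if (c != '*') st = C_BLOCK;
            break;
        case C_STR:
        case C_CHR:     /* comment markers inside literals are just text */
            if (c == '\\' && s[i + 1] != '\0') i++;  /* skip the escaped character */
            else if (c == (st == C_STR ? '"' : '\'')) st = C_CODE;
            break;
        }
        if (c == '\n') { lines += has_code; has_code = 0; }
    }
    if (st == C_SLASH) has_code = 1;   /* input ended right after a lone '/' */
    return lines + has_code;           /* count a last line with no newline */
}
```

The C_SLASH state plays the same role as the state after '<' in the XML
case: one character is not enough to know what you are looking at, so
you wait for the next one before committing.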

You can build the program with a simple: gcc -o count count-methods.c
To test, use: ./count -m count-meth*{c,h}.
The code is reasonably well commented.  :)

I also wrote a more sophisticated program to count the number of comment
words, variables, etc and the frequency of use, but I can't find it
right now.

  -- Bruce
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: count-methods.c
URL: <http://lists.linuxfromscratch.org/pipermail/alfs-discuss/attachments/20060309/4c52e8ac/attachment.c>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: count-methods.h
URL: <http://lists.linuxfromscratch.org/pipermail/alfs-discuss/attachments/20060309/4c52e8ac/attachment.h>