Thank you for making it meaningless...

Jean Charles Passard Jean-Charles.Passard at provectio.org
Thu Mar 9 09:37:27 PST 2006


I'm trying to go deeper into XML parsing, but I'm really bothered by 
what I read in the W3C specification,
especially regarding your point 7.
    I have noted these delimiters:
       1. < >
       2. <!-- -->
       3. <? ?>
       4. <![CDATA[  ]]>
       5. <!DOCTYPE  >
       6. <! >

    Several of them cause problems if I try to parse only on < and >:
       1. it's ok ;)
       2. can contain < and/or > inside
       3. it's ok too.
       4. can contain < and/or > inside
       5. can contain <!-- -->, <! > and [ ] inside
       6. it's ok

    I can't see how to parse this properly without going character by 
character.

JC

George Makrydakis wrote:
> Nice, some comments:
> 1. Character-by-character parsing makes processing more 
> time-consuming.
> 2. I have fixed the current work to parse the DTD elements too 
> (something most small C++ XML parsers do not do; they can 
> crash on them).
> 3. If it always expects to find < or >, it can crash or loop forever, 
> even on valid documents.
> 4. I do not like loading the entire file into memory either. In fact I want 
> a SAX-like parser; loading it for now is just for testing purposes.
> 5. Right now I am concentrating on minimizing loop demands.
> 6. There is the possibility that we end up having a validator part 
> within the parser too :)
> 7. Remember that the syntactically important elements within XML are 
> ONLY the <,> characters; slashes come as second priority AFTER you 
> have created a formatted XML string.
> 8. When handling memory in C++, since it offers some "garbage" 
> collection (of some sort), it is very useful to use 
> constructors and destructors.
> 9. I very much like the simplicity of your code; fewer lines mean less 
> debugging, and mostly that things are done right.
>
> You will have my version ready for prime time soon.
>
> More to come soon...
>
> Thank you all
>
> George Makrydakis
>
> gmak
>
> Jean Charles Passard wrote:
>> Here is an attempt at the same idea (<> and ><),
>> but I have preferred not to load the complete file into memory.
>>
>> It's really my first try at C++, so I surely have not used 
>> objects very well.
>>
>> Of course, I assume there are no syntax errors in the XML, so I do not 
>> make any checks.
>>
>> JC Passard
>>
>> --------------------------------------------------------------------------- 
>>
>>
>> #include <cctype>
>> #include <iostream>
>> #include <istream>
>> #include <fstream>
>> #include <string>
>>
>> using namespace std;
>>
>> int decode_stream (istream& is, string& cdata) {
>>        char c;
>>        static int is_open = 0;
>>
>>        is.get(c);
>>
>>        while (is.good()) {
>>                if (c == '<')  {
>>                        if (!is_open) break;
>>                        is_open ++;
>>                }
>>                if (c == '>') {
>>                        is_open --;
>>                        if (!is_open) break;
>>                }
>>                cdata += c;
>>                is.get(c);
>>        }
>>        if (!is.good()) return 0;
>>        if (c == '<') {
>>                is_open ++;
>>                return 1;
>>        }
>>        return 2;
>> }
>>
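>> // anything that does not start like an element name or a closing tag
>> int ismisc (string& cdata) {
>>        unsigned char first = cdata[0]; // isalpha needs a non-negative value
>>        if (!isalpha (first) && first != '/') return 1;
>>        return 0;
>> }
>>
>> // opening tag (starts with a letter) or closing tag (starts with '/')
>> int istag (string &cdata) {
>>        unsigned char first = cdata[0];
>>        if (isalpha (first) || first == '/') return 1;
>>        return 0;
>> }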
>>
>> int analyze_outside (string& cdata) {
>>        cout << "Outside : " << cdata << endl;
>>        return 0;
>> }
>>
>> int analyze_inside (string& cdata) {
>>        if (ismisc (cdata)) {
>>
>>                cout << "Misc Data : " << endl;
>>                cout << cdata << endl << endl;
>>                return 0;
>>        }
>>        if (istag (cdata)) {
>>                cout << "Tag data : " << endl;
>>                cout << cdata << endl << endl;
>>                return 0;
>>        }
>>        return 0; // not reached: ismisc and istag are complementary
>> }
>>
>> int analyze_stream (istream& is) {
>>        string cdata;
>>        int find_it;
>>
>>        while ((find_it = decode_stream (is, cdata)) != 0) {
>>                if (find_it == 1) analyze_outside (cdata); // find_it <
>>                if (find_it == 2) analyze_inside (cdata); // find_it >
>>                cdata.clear();
>>        }
>>        return 0;
>> }
>>
>>
>> int main () {
>>        fstream file;
>>
>>        file.open ("test.xml");
>>
>>        analyze_stream (file);
>>
>>        file.close ();
>>
>>        return 0;
>> }
>> --------------------------------------------------------------------------- 
>>
>> George Makrydakis wrote:
>>>     No misunderstandings please... This is what I was working on.
>>>     The only bug left to fix has to do with DTDs (a minor one, but it
>>>     crashes it...)
>>>     Does working together mean that I must run a marathon?
>>>     Geez..., do not mix premature constructive criticism with the
>>>     need to not release buggy stuff.
>>>     The code below works on any VALID XML file once you take the DTD
>>>     elements out of it.
>>>     It handles the <,> and >,< pairs correctly no matter how weird the
>>>     syntax is...
>>>
>>>     IT IS BUGGY BUT IT IS UNINFORMED, and most of all *SMALL*
>>>     Thank you for making my trouble worth nothing; you could not
>>>     wait a couple more days, could you...
>>>    
>>> ----------------------------------------CUT--------------------------------------------- 
>>>
>>>
>>>
>>>
>>>     #include <fstream>
>>>     #include <iostream>
>>>     #include <string>
>>>     #include <vector>
>>>     #include <cstdio>
>>>
>>>     using namespace std;
>>>     int main ()
>>>     {
>>>     string linebuffer;
>>>     vector<string>::size_type lnct = 0; // unsigned, to match myvector.size()
>>>     vector<string> myvector;
>>>     vector<string> processing;
>>>     string testing;
>>>     string grabITEM;
>>>     myvector.clear();
>>>     ifstream myfile("coreutils.xml"); // take out DTD stuff please...
>>>
>>>
>>>     if ( myfile.is_open() )
>>>     {
>>>         while (getline(myfile,linebuffer,'\n'))
>>>         {
>>>             myvector.push_back(linebuffer);
>>>         }
>>>         myfile.close();
>>>         linebuffer.clear();
>>>     }
>>>     else
>>>     {
>>>         cout << "file not found!" << endl;
>>>     }
>>>         for (lnct = 0; lnct < myvector.size(); lnct++)
>>>         {
>>>             testing = myvector.at(lnct);
>>>             while ( !testing.empty() )
>>>             {
>>>                 if (!linebuffer.empty()) { testing = linebuffer + " " + testing; }
>>>                 linebuffer.clear();
>>>                 string::size_type stopTAG = testing.find_first_of(">");
>>>                 string::size_type openTAG = testing.find_first_of("<");
>>>
>>>                 if ( ( openTAG == string::npos ) || ( stopTAG == string::npos ) )
>>>                 {
>>>                     if (( openTAG == string::npos ) && ( stopTAG == string::npos ))
>>>                     {
>>>                         cout << testing << endl;
>>>                         testing.clear();
>>>                         break;
>>>                     }
>>>                     else if (( openTAG != string::npos ) && ( stopTAG == string::npos ))
>>>                     {
>>>                         linebuffer = testing.substr(openTAG);
>>>                         cout << testing.substr(0, openTAG) << endl;
>>>                         testing.clear();
>>>                         break;
>>>                     }
>>>                     else
>>>                     {
>>>                         // a lone '>' with no '<': treat the line as text,
>>>                         // otherwise substr(openTAG, ...) below would throw
>>>                         cout << testing << endl;
>>>                         testing.clear();
>>>                         break;
>>>                     }
>>>                 }
>>>                 cout << testing.substr(0, openTAG) << endl;
>>>                 grabITEM = testing.substr(openTAG, stopTAG + 1 - openTAG);
>>>                 cout << grabITEM << endl;
>>>                 testing = testing.substr(stopTAG + 1);
>>>             }
>>>         }
>>>     myvector.clear();
>>>     return 0;
>>>     }
>>> ----------------------------------------CUT---------------------------------------------
>>>
>>
>




More information about the alfs-discuss mailing list