XML + C++ = pain/headache/distasteful/unenjoyable/annoying

beyonddc

Senior member
May 17, 2001
XML + C++ = pain/headache/distasteful/unenjoyable/annoying and so on...

I'm using Xerces-C++ as my XML parser in C++, and using DOM (Document Object Model) to parse my XML file.

It takes about 10 to 20 SLOC (source lines of code), even with my custom reusable functions, just to get one piece of data out of the DOM element tree.

It's painful!!

Also, the XML schema that I'm working on is fairly complicated. There are about 500 fields defined: some are strings, some are longs, some are sequences of strings, some are booleans, and some are enumerations.

All right, done with my complaint. If anyone has any information on how to parse the DOM tree efficiently, please let me know.

I don't think I'm doing it incorrectly, but maybe there's a better way.
 

itachi

Senior member
Aug 17, 2004
is DOM really necessary? if you don't need to modify the document or revisit previous nodes, SAX might be a better choice.

here's a basic shell of what your handler would look like.

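#include <xercesc/sax2/DefaultHandler.hpp>
#include <xercesc/sax2/Attributes.hpp>

XERCES_CPP_NAMESPACE_USE

typedef const XMLCh *cxstr_t;

class MyHandler : public DefaultHandler {
public:
    // text inside an element; Xerces-C++ 3.x declares length as XMLSize_t (2.x used unsigned int)
    void characters(cxstr_t chars, const XMLSize_t length);
    // opening tag, with its attributes
    void startElement(cxstr_t uri, cxstr_t localname, cxstr_t qname, const Attributes & attributes);
    // closing tag
    void endElement(cxstr_t uri, cxstr_t localname, cxstr_t qname);
    void startDocument();
    void endDocument();
};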

SAX parses in a serial manner using events rather than building a tree.. when it reads an element's opening tag, it calls startElement.. when it reads the closing tag, it calls endElement. and if it reads text inside an element, it calls characters (possibly more than once for the same run of text).

you can't magically unmarshall (or in microsoft's world, serialize/deserialize) xml data into an object in c++ (even in java and c# you can't without limitations). you still have to parse the document. the only time you wouldn't need to write a parser is if someone already wrote a class that suits your needs and a method for marshalling/unmarshalling that class.

to unmarshall data in SAX, use a stack inside your handler class. on startElement, push an entry for the new element and do whatever you need with it.. on endElement, pop the entry from the top and store its value wherever it belongs.
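here's a rough sketch of that stack idea, just to show the shape of it. it deliberately doesn't use xerces: the callbacks take std::string instead of XMLCh so it compiles on its own, and the Person struct, PersonHandler class and demo() function are all made-up names. the events are replayed by hand, where a real parser would fire them for you.

```cpp
#include <stack>
#include <string>

// Hypothetical target object; the field names are invented for illustration.
struct Person {
    std::string name;
    long age = 0;
};

// Mirrors the shape of a SAX handler, but with std::string instead of XMLCh
// so the sketch builds without Xerces-C++.
class PersonHandler {
public:
    void startElement(const std::string& name) {
        // Push a fresh text buffer for the element that just opened.
        text_.push(std::string());
        if (name == "person") person_ = Person();
    }

    void characters(const std::string& chunk) {
        // Text can arrive in several chunks, so append rather than assign.
        if (!text_.empty()) text_.top() += chunk;
    }

    void endElement(const std::string& name) {
        if (text_.empty()) return;  // unbalanced call; nothing to pop
        // Pop the finished element's text and store it on the object.
        std::string value = text_.top();
        text_.pop();
        if (name == "name") person_.name = value;
        else if (name == "age") person_.age = std::stol(value);
    }

    const Person& person() const { return person_; }

private:
    std::stack<std::string> text_;  // one buffer per currently open element
    Person person_;
};

// Replays by hand the events a SAX parser would fire for
// <person><name>Ann</name><age>30</age></person>
Person demo() {
    PersonHandler h;
    h.startElement("person");
    h.startElement("name");
    h.characters("An");  // split on purpose to show chunked delivery
    h.characters("n");
    h.endElement("name");
    h.startElement("age");
    h.characters("30");
    h.endElement("age");
    h.endElement("person");
    return h.person();
}
```

with 500 fields you'd probably replace the if/else chain in endElement with a lookup table keyed on element name, but the push/append/pop pattern stays the same.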
in DOM.. it's pretty straightforward. as you traverse the tree, create an object and set its properties.
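for the DOM side, a couple of small typed helpers can get the per-field code down to roughly one line each. this is only a sketch under assumptions: Node is a stand-in tree type, not the xerces DOMElement API, and child(), childText(), Contact and toContact() are hypothetical names. with xerces you'd write the same helpers on top of its own node classes.

```cpp
#include <string>
#include <vector>

// Stand-in for a DOM element; NOT the Xerces-C++ DOMElement class.
struct Node {
    std::string name;
    std::string text;
    std::vector<Node> children;
};

// Hypothetical helper: first direct child with the given name, or nullptr.
const Node* child(const Node& parent, const std::string& name) {
    for (const Node& c : parent.children)
        if (c.name == name) return &c;
    return nullptr;
}

// Hypothetical helper: text of a named child, or a fallback if it is absent.
std::string childText(const Node& parent, const std::string& name,
                      const std::string& fallback = "") {
    const Node* c = child(parent, name);
    return c ? c->text : fallback;
}

// Hypothetical target object covering a string, a long and a sequence.
struct Contact {
    std::string name;
    long age = 0;
    std::vector<std::string> phones;
};

// Walk the element once, creating the object and setting its properties.
Contact toContact(const Node& n) {
    Contact c;
    c.name = childText(n, "name");
    c.age = std::stol(childText(n, "age", "0"));
    for (const Node& k : n.children)
        if (k.name == "phone") c.phones.push_back(k.text);  // repeated element
    return c;
}

// Builds by hand the tree a DOM parser would produce for
// <contact><name>Ann</name><age>30</age><phone>555-0100</phone><phone>555-0101</phone></contact>
Node sampleTree() {
    Node root{"contact", "", {}};
    root.children.push_back(Node{"name", "Ann", {}});
    root.children.push_back(Node{"age", "30", {}});
    root.children.push_back(Node{"phone", "555-0100", {}});
    root.children.push_back(Node{"phone", "555-0101", {}});
    return root;
}
```

the point is that the null checks and conversions live in the helpers once, instead of being repeated for every field.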

xerces is an all out complete xml library.. it's not an easy one to learn. there are a lot of simpler libraries out there that don't offer all the same capabilities, but are a hell of a lot easier to learn.