XML coders - need help doing pretty XML output

Kyteland

Diamond Member
Dec 30, 2002
5,747
1
81
I am rewriting a tool that outputs XML files. Unfortunately, due to the nature of the industry I'm in, the output needs to be human readable in programs like Notepad. This is being written in Visual C++ 6. Does anyone have some tips about outputting "pretty XML"?

More specifically, I want the ability to specify formatting structures. For example, I want to be able to output a list in either of these ways:

<DataList> A B C D E </DataList>

- or -

<DataList>
A
B
C
D
E
</DataList>

or the ability to specify that a certain type of element gets written in a specific manner. For example I want to be able to specify that all "DataNode" elements get formatted like this:

<DataNode>
<Name>Blah</Name>
<Value>2</Value> <Index>12</Index> <Age>52</Age>
<Value>43</Value> <Index>93</Index> <Age>36</Age>
<Value>2</Value> <Index>4</Index> <Age>18</Age>
<Value>-1</Value> <Index>50</Index> <Age>32</Age>
</DataNode>

Instead of this:
<DataNode>
<Name>Blah</Name>
<Value>2</Value>
<Index>12</Index>
<Age>52</Age>
<Value>43</Value>
<Index>93</Index>
<Age>36</Age>
<Value>2</Value>
<Index>4</Index>
<Age>18</Age>
<Value>-1</Value>
<Index>50</Index>
<Age>32</Age>
</DataNode>

(With proper tabbing of course.)

The old way we did this was basically a hack job, as the code below shows. The output came out nicely formatted, but we are starting to run up against the limits of this approach.
Code edited so I don't get fired. ;)

-- snip --
fout << "\t\t\t\t\t<tag1>\x0A";
fout << "\t\t\t\t\t\t<tag2>\x0A";
fout << "\t\t\t\t\t\t\t";
for (i = 0; i < count; i++)
{
    fout << var1[k][i] << "\t";
}
fout << "\x0A";
fout << "\t\t\t\t\t\t</tag2>\x0A\x0A";
fout << "\t\t\t\t\t\t<tag3>\t" << k + var2 + var3 + 1 << "\t</tag3>\x0A";
fout << "\t\t\t\t\t\t<tag4>\tvar4\t</tag4>\x0A";
-- snip --

What I've written so far is something that builds an XML tree. The current operator<< function does one of two things. If the element contains only text it is written like this:
<element>text</element>
And if the element contains more elements it is written like this:
<element>
--more elements--
</element>
This doesn't give me much control over the formatting.
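
For readers following along, here is a minimal sketch of the two-case behaviour just described. The real tree class is not shown in this thread, so the `Element` type, the `write` helper and `toString` are hypothetical names, and the code is plain standard C++ rather than anything specific to Visual C++ 6.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical node type; the tool's actual tree class is not shown.
struct Element {
    std::string name;
    std::string text;               // used only when there are no children
    std::vector<Element> children;
    explicit Element(const std::string& n, const std::string& t = "")
        : name(n), text(t) {}
};

// Writes one element at the given depth, one tab per nesting level.
void write(std::ostream& out, const Element& e, int depth) {
    assert(!e.name.empty());
    const std::string indent(depth, '\t');
    if (e.children.empty()) {
        // Text-only element: open tag, text and close tag share one line.
        out << indent << '<' << e.name << '>' << e.text
            << "</" << e.name << ">\n";
        return;
    }
    // Element with children: tags on their own lines, children one level deeper.
    out << indent << '<' << e.name << ">\n";
    for (std::size_t i = 0; i < e.children.size(); ++i)
        write(out, e.children[i], depth + 1);
    out << indent << "</" << e.name << ">\n";
}

std::ostream& operator<<(std::ostream& out, const Element& e) {
    write(out, e, 0);
    return out;
}

// Convenience helper for inspecting the output as a string.
std::string toString(const Element& e) {
    std::ostringstream out;
    out << e;
    return out.str();
}
```

The layout decision is hard-wired to "has children or not", which is exactly why there is no way to ask for several sibling elements on one line.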



I've already got the indentation problem figured out (i.e., things are always tabbed properly with respect to their parent elements), but the formatting I'm not so sure about. Really all I need is to be able to specify the whitespace around tags: spaces, tabs and newlines.

So I think I've gotten the problem fully outlined. Any suggestions? :)
 

MrChad

Lifer
Aug 22, 2001
13,507
3
81
Is XML a requirement for the output? Honestly, if you're concerned about how the raw output is formatted, you're better off using a comma- or tab-delimited file and viewing it in Excel than XML. If you must use XML, why not use IE as the default viewer? IE's default XML stylesheet is very readable.
 

Kyteland

Originally posted by: MrChad
Is XML a requirement for the output? Honestly, if you're concerned about how the raw output is formatted, you're better off using a comma- or tab-delimited file and viewing it in Excel than XML. If you must use XML, why not use IE as the default viewer? IE's default XML stylesheet is very readable.
MrChad,

I work in the gaming industry doing math for slot machines. It is very heavily regulated and everything we do gets submitted to agencies outside of our company for review. I automatically generate files for our system that are basically XML (but not quite, so IE/stylesheets wouldn't work). These files must be reviewed by the gaming regulators, so they must be human readable as plain text files. My job isn't to question that, just to make it work right.

So in short, the files aren't quite XML, but are close enough that I can describe them that way. The file format is determined by somebody else.
 

kamper

Diamond Member
Mar 18, 2003
5,513
0
0
I don't think there's much call for this sort of thing, so you might end up writing a lot of your own code. The first thing I'd suggest is separating the formatting code entirely from the code that produces the XML. If you have some sort of intermediary state that both sides can understand (preferably a Document sort of object that comes from an XML library) then everything stays very reusable. Your formatter then receives this Document and a set of instructions on how to do output and goes nuts (but you probably have to write that yourself).

You might be able to do something quickly using XSL, although I'm not sure how easy it would be to specify precise formatting and to configure it with different output patterns. For that you'd also need definite, real XML. Is that something you have control over?
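
As a rough illustration of the separation suggested above, the sketch below keeps the tree and the formatter apart and hands the formatter a set of per-element rules. Everything here is an assumption made for the example: `Node`, `Formatter`, `LineStarters` and `render` are hypothetical names, not part of any XML library, and the rule model (a set of child names that begin a new line) is just one possible choice.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <ostream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Intermediate tree shared by the producer and the formatter (hypothetical).
struct Node {
    std::string name;
    std::string text;
    std::vector<Node> children;
    explicit Node(const std::string& n, const std::string& t = "")
        : name(n), text(t) {}
};

// Per-element rule: the child names that begin a new line. Any other child
// follows its predecessor on the same line, separated by one space.
typedef std::set<std::string> LineStarters;

class Formatter {
public:
    void setRule(const std::string& element, const LineStarters& starters) {
        rules_[element] = starters;
    }

    void write(std::ostream& out, const Node& n, int depth = 0) const {
        assert(depth >= 0);
        out << std::string(depth, '\t');
        writeAtCursor(out, n, depth);
        out << '\n';
    }

private:
    // Writes the element starting at the current output position.
    void writeAtCursor(std::ostream& out, const Node& n, int depth) const {
        if (n.children.empty()) {
            out << '<' << n.name << '>' << n.text << "</" << n.name << '>';
            return;
        }
        std::map<std::string, LineStarters>::const_iterator rule =
            rules_.find(n.name);
        out << '<' << n.name << '>';
        for (std::size_t i = 0; i < n.children.size(); ++i) {
            const Node& child = n.children[i];
            // With no rule for this element, every child starts its own line.
            const bool newLine = i == 0 || rule == rules_.end()
                || rule->second.count(child.name) > 0;
            if (newLine) out << '\n' << std::string(depth + 1, '\t');
            else out << ' ';
            writeAtCursor(out, child, depth + 1);
        }
        out << '\n' << std::string(depth, '\t') << "</" << n.name << '>';
    }

    std::map<std::string, LineStarters> rules_;
};

// Convenience helper: formats a tree into a string.
std::string render(const Formatter& f, const Node& n) {
    std::ostringstream out;
    f.write(out, n);
    return out.str();
}
```

Registering a rule for "DataNode" whose line starters are "Name" and "Value" gives the grouped Value/Index/Age rows from the first post, while a formatter with no rules falls back to one child per line. The code that builds the tree never needs to know which layout is in effect.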
 

Barnaby W. Füi

Elite Member
Aug 14, 2001
12,343
0
0
TinyXml can do it, with the stupid limitation that it only pretty prints to FILE*'s. Actually considering the license, you should be able to easily rip out the formatting code and use it however you want.

see TiXmlBase::print here
 

Kyteland

Originally posted by: BingBongWongFooey
TinyXml can do it, with the stupid limitation that it only pretty prints to FILE*'s. Actually considering the license, you should be able to easily rip out the formatting code and use it however you want.

see TiXmlBase::print here
Thanks, I'll take a look at that today.

I think what I'll end up doing is this. Our not-quite-but-almost-XML format has something similar to a DTD file. I'll use that to define what elements should exist in the output, so that the file is validated as it is being written. I'll have another file that defines the formatting for a given element. If no formatting exists for an element defined in the DTD, default formatting is used.
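
A minimal sketch of the formatting-file idea with its default fallback, under loudly stated assumptions: the file syntax (one "element-name style" pair per line, with "inline" or "block" as the style and '#' starting a comment line) is invented for this example, as are the names `FormatTable`, `Style` and `formatList`. The validation against the DTD-like file is not covered.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

enum Style { Block, Inline };

// Hypothetical rule table loaded from a formatting file.
class FormatTable {
public:
    explicit FormatTable(Style fallback) : fallback_(fallback) {}

    // Reads "name style" lines; returns false on the first malformed line.
    bool load(std::istream& in) {
        std::string line;
        while (std::getline(in, line)) {
            std::istringstream fields(line);
            std::string name, style;
            if (!(fields >> name) || name[0] == '#') continue;  // blank or comment
            if (!(fields >> style)) return false;
            if (style == "inline") rules_[name] = Inline;
            else if (style == "block") rules_[name] = Block;
            else return false;
        }
        return true;
    }

    // Elements without an explicit rule get the default formatting.
    Style styleFor(const std::string& element) const {
        std::map<std::string, Style>::const_iterator it = rules_.find(element);
        return it == rules_.end() ? fallback_ : it->second;
    }

private:
    Style fallback_;
    std::map<std::string, Style> rules_;
};

// Formats a list of text items on one line (Inline) or one item per line (Block).
std::string formatList(const std::string& name,
                       const std::vector<std::string>& items,
                       Style style, int depth) {
    assert(depth >= 0);
    const std::string indent(depth, '\t');
    std::ostringstream out;
    out << indent << '<' << name << '>';
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (style == Inline) out << ' ' << items[i];
        else out << '\n' << indent << '\t' << items[i];
    }
    out << (style == Inline ? " " : "\n" + indent) << "</" << name << ">\n";
    return out.str();
}
```

Loading from a `std::istream` means the same code works with a `std::ifstream` for the real file and a `std::istringstream` for quick checks. A rule line such as "DataList inline" selects the single-line list form, and any element missing from the file simply gets whatever style was passed to the `FormatTable` constructor.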

If I come up with anything really useful I'll post it here, but I doubt it will be.