JSON versus XML: Dealing with large amounts of data

Schadenfroh

Elite Member
Mar 8, 2003
38,416
4
0
Greetings,

Was fiddling with methods to communicate between a Java client and a PHP server.

Let's say that I am dealing with a very large array of structs containing plain text. Each element is independent of the others.

Currently, that array of structs is a single large XML file.

On the server, I am using PHP's XML pull parser to read the data as a stream from the client and operate on each element as it is read, rather than waiting for the entire file to arrive and loading the whole thing into memory at once. Once the data is processed, I send the results down to the client in a similar XML file. The task takes a little while to run, so I write the resulting elements to the XML stream as they are finished.

On the client side, I am doing something similar with SJSXP (the Sun Java Streaming XML Parser): I read one element of the array from the stream and operate on it without waiting for the entire file or loading it all into memory. Another concern is that the client has far less RAM than the server, so it makes no sense to load a 100 MB XML file into memory on the client.
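For reference, pulling elements one at a time with SJSXP goes through the standard `javax.xml.stream` (StAX) API that it implements. A minimal sketch, assuming an illustrative `<item>` element name and document shape (not the actual schema from this thread):

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class StreamingXmlDemo {
    // Pulls <item> text content one element at a time; only the current
    // element is ever held in memory, so a 100 MB document is no problem.
    static List<String> readItems(Reader source) {
        List<String> results = new ArrayList<>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(source);
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT
                        && "item".equals(r.getLocalName())) {
                    // getElementText() consumes up to the matching end tag
                    results.add(r.getElementText());
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new RuntimeException(e);
        }
        return results;
    }

    public static void main(String[] args) {
        String xml = "<results><item>alpha</item><item>beta</item></results>";
        System.out.println(readItems(new StringReader(xml)));
    }
}
```

In a real client the `Reader` would wrap the open HTTP response stream instead of a `StringReader`, and each element would be processed inside the loop rather than collected into a list.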

I am not sure how many results will be needed at the start, since I am processing requests on the fly. I know it is more efficient to send everything at once, but I am buffering at both ends, even though writes to the buffer happen one element at a time.

I have not dealt with JSON very much, but it seems able to do what my custom XML files do in a tiny fraction of the code. The only problem is that I am not sure how efficient it is for this kind of task.


So, my question would be, in addition to any other insight that could be provided:

Would it be more efficient to keep my current XML stream parsing or switch to JSON? It seems like I would need to establish a new connection for each transaction whenever I wanted to send an element as it completes, unless I wait for everything to finish and send it down as one JSON array, which would add lag for the client. But there could be something similar to SJSXP that I have not discovered yet that would let me keep everything open as results are returned and treat it as a stream.
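One possible answer to the last point, sketched under assumptions not in the thread: you can keep a single connection open and write one JSON object per line, so the client processes each element as the line arrives (libraries such as Jackson also offer a pull-style streaming parser comparable to SJSXP). The JDK itself has no JSON parser, so `extractText()` below is a deliberately naive stand-in; a real client would hand each line to a JSON library:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class NdjsonDemo {
    // Reads one JSON object per line from an open stream and handles it
    // immediately -- the whole response body is never buffered.
    static List<String> process(Reader body) {
        List<String> out = new ArrayList<>();
        BufferedReader in = new BufferedReader(body);
        try {
            String line;
            while ((line = in.readLine()) != null) {
                out.add(extractText(line)); // act on the element as soon as it arrives
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return out;
    }

    // Naive illustration only: pulls the value out of a {"text":"..."} object.
    // A real client would use a proper JSON parser here.
    static String extractText(String json) {
        int start = json.indexOf(':') + 2; // skip past "text":"
        return json.substring(start, json.lastIndexOf('"'));
    }

    public static void main(String[] args) {
        String stream = "{\"text\":\"first\"}\n{\"text\":\"second\"}\n";
        System.out.println(process(new StringReader(stream)));
    }
}
```

The framing trick (one object per line) is what matters: each line is a complete, independently parseable JSON value, which matches the "independent elements" property of the data described above.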

Thanks
 

Cogman

Lifer
Sep 19, 2000
10,286
145
106
JSON should be smaller than XML.

Could you possibly run it through a compressor? Something like zlib will, for the cost of a few extra CPU cycles, significantly reduce the size of the data you are sending. XML in particular is highly compressible.
 

DaveSimmons

Elite Member
Aug 12, 2001
40,730
670
126
Is this using HTTP requests / responses? If the request header is set to accept zlib compression and it is enabled for PHP, you might already be compressing the XML. If it is HTTP but uncompressed, you might still be able to enable it on both ends.
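On the client side, whether the body came back compressed is signalled by the `Content-Encoding` response header. The header name and the `gzip` token are standard HTTP; everything else in this sketch (method names, the in-memory simulation of a response body) is illustrative:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ContentEncodingDemo {
    // Wraps the raw response stream in a decompressor only when the server
    // signalled compression via "Content-Encoding: gzip".
    static InputStream decodeBody(String contentEncoding, InputStream raw) {
        try {
            if ("gzip".equalsIgnoreCase(contentEncoding)) {
                return new GZIPInputStream(raw); // inflates incrementally as bytes arrive
            }
            return raw;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // Helper to fake a compressed response body in memory for the demo.
    static byte[] gzip(byte[] data) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            GZIPOutputStream out = new GZIPOutputStream(buf);
            out.write(data);
            out.close();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    static String readAll(InputStream in) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[256];
            int n;
            while ((n = in.read(chunk)) != -1) buf.write(chunk, 0, n);
            return buf.toString("UTF-8");
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        byte[] body = gzip("<result>ok</result>".getBytes());
        System.out.println(readAll(decodeBody("gzip", new ByteArrayInputStream(body))));
    }
}
```

Because `GZIPInputStream` is just another `InputStream`, the XML parser can sit directly on top of `decodeBody()`'s return value either way.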
 

Schadenfroh

Elite Member
Mar 8, 2003
38,416
4
0
JSON should be smaller than XML.
Yep, that is one thing I like about it: it seems to have less fluff (so the transmission would be shorter), and the high-level code can be much smaller. I am just worried there is no way to read it as a stream from a single transmission, like with SJSXP. I want the user to be able to use the information as it comes in without waiting for all of it to be sent, but at the same time I want to handle it all in one connection.
Is this using HTTP request / responses?
Yep
Could you possibly run it through a compressor?
I am not familiar with zlib, can I uncompress it as it comes in as a stream so I can process the elements inside before the entire file is sent / received?
for the cost of a few extra cycles
Should be fine for the server, I expect the client to not have much in the way of system resources, but should still be able to handle it.
 

Cogman

Lifer
Sep 19, 2000
10,286
145
106
Yep

I am not familiar with zlib, can I uncompress it as it comes in as a stream so I can process the elements inside before the entire file is sent / received?

Should be fine for the server, I expect the client to not have much in the way of system resources, but should still be able to handle it.

zlib is a very lightweight compression algorithm (as far as resource usage goes). It uses the same algorithm that .zip archives use, if that gives you a frame of reference (the algorithm is called DEFLATE).

As for stream support, yes, I believe it does support it, but I haven't played around with it enough to know for sure how it works.
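To the streaming question: in Java, at least, `java.util.zip` in the standard library does decompress incrementally. `InflaterInputStream` (and `GZIPInputStream`) inflate on demand as you read, so a parser can sit directly on top of one without waiting for the full transfer. A small round-trip sketch, with illustrative method names:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class ZlibStreamDemo {
    // Compresses data with zlib/DEFLATE (the same algorithm .zip archives use).
    static byte[] deflate(byte[] data) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DeflaterOutputStream out = new DeflaterOutputStream(buf);
            out.write(data);
            out.close();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // Decompresses a few bytes at a time -- anything downstream sees plain
    // text as soon as the first compressed chunk is available.
    static String inflateStreaming(byte[] compressed) {
        try {
            InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8]; // deliberately tiny to show incremental reads
            int n;
            while ((n = in.read(chunk)) != -1) out.write(chunk, 0, n);
            return out.toString("UTF-8");
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<results><item>alpha</item><item>beta</item></results>";
        byte[] packed = deflate(xml.getBytes());
        System.out.println(packed.length + " compressed bytes -> " + inflateStreaming(packed));
    }
}
```

In the real client, the `ByteArrayInputStream` would be the open HTTP response stream, and `inflateStreaming()`'s output would feed the XML or JSON parser element by element.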
 

DaveSimmons

Elite Member
Aug 12, 2001
40,730
670
126
zlib might wait for the full response from PHP before it sends anything, I don't know either way. You'd want to read up on it (or just ignore my suggestion :) )
 

Emulex

Diamond Member
Jan 28, 2001
9,759
1
71
If it's not atomic, go use nsoftware's free AS2 transport engine. Non-repudiation FTW.