How much space do I save using binary files?

Carlis

Senior member
May 19, 2006
Hi

I'm running some simulations (C++) on a cluster at my university. I write some vector fields to disk, and during long simulations I produce a lot of files containing these fields, which means I'm running out of storage space. Right now, I just use

std::cout << someData;

which results in text files that humans can read. How much space would I save if I stored the data in binary format? The data is just floats and doubles.

Best
Carlis
 

degibson

Golden Member
Mar 21, 2008
Carlis wrote: "... which results in text files that humans can read. How much space would I save if I stored the data in binary format? The data is just floats and doubles."

It depends on the precision you are currently using for cout, but probably not much.

Floats and doubles in binary form take four and eight bytes, respectively, assuming IEEE 754 types. By default, cout prints six significant figures (the standard stream precision is 6). If you account for a decimal point and a whitespace separator, that's about eight bytes of printed text per value (more for negative signs, leading zeros, or exponents), which is roughly the same as writing a double in binary mode.

If you're using mostly floats, you might save ~50% of the space. Again, it depends on the current precision of cout: if you've increased cout's precision, you will save more with binary-mode I/O.
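To check this arithmetic on real output, one can print a sample into a std::ostringstream (which starts with the same default precision of 6 as std::cout) and compare its length with the raw binary size. A minimal sketch; the helper names `textBytes` and `binaryBytes` are hypothetical, and a single space is assumed as the separator.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Bytes that `values` occupy when printed with operator<< at the stream's
// default precision (6 significant digits), with one space after each value.
template <typename T>
std::size_t textBytes(const std::vector<T>& values) {
    std::ostringstream out;  // default precision() is 6, like std::cout
    for (const T& v : values) {
        out << v << ' ';
    }
    return out.str().size();
}

// Bytes the same values occupy as raw binary: sizeof(T) each.
template <typename T>
std::size_t binaryBytes(const std::vector<T>& values) {
    return values.size() * sizeof(T);
}
```

For example, 1.23456789 prints as `1.23457` (8 bytes with its separator), while values like `0.0123457`, `-1.23457`, or `1.23457e-05` take more, so text usually costs about as much as a binary double and roughly twice a binary float.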

(Binary I/O will not lose precision; ASCII printing at the default precision will.)
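The precision point can be checked directly: a double printed with the default 6 significant digits and parsed back generally differs from the original, while std::numeric_limits<double>::max_digits10 digits (17 for IEEE 754 doubles) are enough to recover the exact value. A minimal sketch; `textRoundTrip` and `kLosslessDigits` are hypothetical names.

```cpp
#include <limits>
#include <sstream>

// Print `x` as text with `digits` significant digits, then parse it back,
// mimicking a write-to-ASCII / read-from-ASCII round trip.
double textRoundTrip(double x, int digits) {
    std::ostringstream out;
    out.precision(digits);
    out << x;
    std::istringstream in(out.str());
    double y = 0.0;
    in >> y;
    return y;
}

// Significant digits guaranteed to round-trip a double exactly.
constexpr int kLosslessDigits = std::numeric_limits<double>::max_digits10;
```

Lossless text output can need up to 17 significant digits plus sign, decimal point, and exponent, well over the 8 bytes of a binary double, so binary becomes the clear winner once precision matters.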
 

Carlis

Senior member
May 19, 2006
I did not change the precision, so I suppose I'm looking at 50% savings in the best case. That is still not a terrific improvement. As for the loss of precision associated with ASCII, it is not really a problem: I use the data to make animations, so that's no issue.

Thank you!
 

esun

Platinum Member
Nov 12, 2001
I would still move to binary I/O if you're writing lots of data (e.g., many megabytes). It's much, much faster than ASCII if you're doing tons of writes, since nothing has to be converted to or from text. Furthermore, if you later want to read that data into MATLAB and plot it, that will also be much, much faster (and since you're running on a cluster, I assume speed is important).
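A minimal sketch of dumping a whole field with one bulk write, assuming a hypothetical layout of three floats per grid point preceded by a 64-bit point count; `Vec3`, `writeField`, and `readField` are illustrative names. The file uses the machine's native byte order, so it is only portable between machines with the same endianness.

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical layout: one 3-component float vector per grid point.
struct Vec3 {
    float x, y, z;
};
static_assert(sizeof(Vec3) == 3 * sizeof(float), "Vec3 must have no padding");

// Write an 8-byte point count, then all components in a single write() call.
bool writeField(const std::string& path, const std::vector<Vec3>& field) {
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    const std::uint64_t n = field.size();
    out.write(reinterpret_cast<const char*>(&n), sizeof n);
    out.write(reinterpret_cast<const char*>(field.data()),
              static_cast<std::streamsize>(field.size() * sizeof(Vec3)));
    return static_cast<bool>(out);
}

// Read a file produced by writeField back into memory (empty on failure).
std::vector<Vec3> readField(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::uint64_t n = 0;
    if (!in.read(reinterpret_cast<char*>(&n), sizeof n)) return {};
    std::vector<Vec3> field(n);
    in.read(reinterpret_cast<char*>(field.data()),
            static_cast<std::streamsize>(n * sizeof(Vec3)));
    if (!in) return {};
    return field;
}
```

On the MATLAB side, something like `n = fread(fid, 1, 'uint64'); v = fread(fid, [3 n], 'single');` should read this layout back, one column per grid point.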
 

DaveSimmons

Elite Member
Aug 12, 2001
What about compressing the output? Depending on how much repetition there is in the values, using a compression library could save much more than 50%.

To see the best case, try archiving some of the output files and look at the compression ratio.

(You could do this either instead of or in addition to switching to binary.)
 

eLiu

Diamond Member
Jun 4, 2001
Binary writes might not save you a ton of space, but they'll be a lot faster than writing ASCII. On the flip side, you can't open the file and read it visually, if that's ever useful.

That said, if you want to save space, you could try a polynomial approximation of your vector field. You could start with a least-squares fit using a polynomial basis of some order over the entire domain (you probably don't want the order too high, since the problem becomes ill-conditioned; also use orthogonal polynomials).

If that's not accurate enough, subdivide the domain (i.e., mesh it) and perform a polynomial fit over each region. This way you only store a handful of polynomial coefficients per region instead of the values at every point.
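The global fit described above is the one-element case of the meshed version. A minimal 1D sketch, assuming scalar samples on an interval (a real vector field would be fit per component, on a 2D or 3D mesh); the function names are hypothetical, Legendre polynomials serve as the orthogonal basis, and the normal equations are solved directly for brevity.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Legendre polynomials P_0..P_degree at t in [-1, 1], via the recurrence
// (k + 1) P_{k+1}(t) = (2k + 1) t P_k(t) - k P_{k-1}(t).
std::vector<double> legendre(double t, int degree) {
    std::vector<double> p(degree + 1, 1.0);
    if (degree >= 1) p[1] = t;
    for (int k = 1; k < degree; ++k)
        p[k + 1] = ((2 * k + 1) * t * p[k] - k * p[k - 1]) / (k + 1);
    return p;
}

// Map x in [a, b] linearly onto t in [-1, 1].
double toUnit(double x, double a, double b) { return 2.0 * (x - a) / (b - a) - 1.0; }

// Least-squares fit of samples (x[i], f[i]), a <= x[i] <= b, to sum_k c[k] P_k(t).
// Normal equations + Gaussian elimination with partial pivoting: fine for low
// degrees; a QR-based solve would be more robust.
std::vector<double> fitElement(const std::vector<double>& x, const std::vector<double>& f,
                               double a, double b, int degree) {
    const int m = degree + 1;
    assert(x.size() == f.size() && x.size() >= static_cast<std::size_t>(m));
    std::vector<std::vector<double>> M(m, std::vector<double>(m + 1, 0.0));  // [A^T A | A^T f]
    for (std::size_t i = 0; i < x.size(); ++i) {
        const std::vector<double> p = legendre(toUnit(x[i], a, b), degree);
        for (int r = 0; r < m; ++r) {
            for (int c = 0; c < m; ++c) M[r][c] += p[r] * p[c];
            M[r][m] += p[r] * f[i];
        }
    }
    for (int col = 0; col < m; ++col) {  // forward elimination
        int piv = col;
        for (int r = col + 1; r < m; ++r)
            if (std::fabs(M[r][col]) > std::fabs(M[piv][col])) piv = r;
        std::swap(M[col], M[piv]);
        for (int r = col + 1; r < m; ++r) {
            const double factor = M[r][col] / M[col][col];
            for (int c = col; c <= m; ++c) M[r][c] -= factor * M[col][c];
        }
    }
    std::vector<double> coef(m);
    for (int r = m - 1; r >= 0; --r) {  // back substitution
        double s = M[r][m];
        for (int c = r + 1; c < m; ++c) s -= M[r][c] * coef[c];
        coef[r] = s / M[r][r];
    }
    return coef;
}

// Evaluate a fitted element at x in [a, b].
double evalElement(const std::vector<double>& coef, double a, double b, double x) {
    const std::vector<double> p = legendre(toUnit(x, a, b), static_cast<int>(coef.size()) - 1);
    double s = 0.0;
    for (std::size_t k = 0; k < coef.size(); ++k) s += coef[k] * p[k];
    return s;
}

// Split [a, b] into n equal elements and fit each one separately, so only
// n * (degree + 1) coefficients are stored instead of every sample.
std::vector<std::vector<double>> fitPiecewise(const std::vector<double>& x, const std::vector<double>& f,
                                              double a, double b, int n, int degree) {
    const double h = (b - a) / n;
    std::vector<std::vector<double>> elements;
    for (int e = 0; e < n; ++e) {
        const double lo = a + e * h;
        const double hi = (e + 1 == n) ? b : a + (e + 1) * h;
        std::vector<double> xe, fe;
        for (std::size_t i = 0; i < x.size(); ++i)
            if (x[i] >= lo && x[i] <= hi) { xe.push_back(x[i]); fe.push_back(f[i]); }
        elements.push_back(fitElement(xe, fe, lo, hi, degree));
    }
    return elements;
}
```

With degree 3 and 1000 samples per element, for instance, each element stores 4 coefficients instead of 1000 values; whether that is accurate enough has to be checked against your own fields.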

Caveat: this won't work well if your function is discontinuous or has (small) regions where some derivatives are very large. In that case you'll have to implement some kind of "regularity indicator" that finds these "tough" regions and either subdivides the elements there or increases the interpolation order. This won't be too hard to implement because it's really just post-processing, but it will be a fair amount of effort... not sure if you care that much.
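One simple stand-in for such a "regularity indicator" is the error of a straight line across each element: where the function bends sharply or jumps, that error is large and the element gets bisected. A minimal 1D sketch; this particular indicator, the tolerance, the depth limit, and the names `indicator` and `refine` are assumptions for illustration, not a specific method from the post.

```cpp
#include <algorithm>
#include <cmath>
#include <functional>
#include <vector>

// Crude indicator: the largest gap between f and the straight line through
// f(a) and f(b), checked at `checks` evenly spaced interior points of [a, b].
double indicator(const std::function<double(double)>& f, double a, double b, int checks = 8) {
    const double fa = f(a), fb = f(b);
    double worst = 0.0;
    for (int i = 1; i <= checks; ++i) {
        const double x = a + (b - a) * i / (checks + 1);
        const double line = fa + (fb - fa) * (x - a) / (b - a);
        worst = std::max(worst, std::fabs(f(x) - line));
    }
    return worst;
}

// Recursively bisect [a, b] until the indicator is at most tol (or maxDepth
// is reached), appending right-hand element edges. Start with edges = {a}.
void refine(const std::function<double(double)>& f, double a, double b,
            double tol, int maxDepth, std::vector<double>& edges) {
    if (maxDepth == 0 || indicator(f, a, b) <= tol) {
        edges.push_back(b);
        return;
    }
    const double mid = 0.5 * (a + b);
    refine(f, a, mid, tol, maxDepth - 1, edges);
    refine(f, mid, b, tol, maxDepth - 1, edges);
}
```

For instance, refining `tanh(50 (x - 0.3))` over [0, 1] with tol 0.01 concentrates the element edges around x = 0.3, while a linear function stays a single element; the resulting edges could then feed a per-element polynomial fit.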
 

Cogman

Lifer
Sep 19, 2000
If space is a real concern, might I suggest using zlib to compress your files? It has a VERY permissive license (pretty near public domain) and a fairly easy-to-use interface. It is also fast; you'll probably even save time storing information, since there will be less of it to write.

The data you have should be highly compressible, since ASCII output contains only a small set of characters.

(I see that DaveSimmons has the same recommendation :D)