Compressing Arrays On-The-Fly?

Armitage

Banned
Feb 23, 2001
8,086
0
0
I'm working on a PVM program that I'm afraid is going to run into some network bandwidth issues.
In the initialization phase, the program has to generate about 1GB of data via a relatively expensive operation. So I want to split this up over the cluster. But all of the nodes need all of the data.

So, if I have 16 nodes, each node will generate 1000MB/16 = 62.5MB (data generated by each node)
It then has to send that data to the other 15 nodes: 62.5MB * 15 = 937.5MB (network traffic generated by each node)
Each node generates this much traffic, so: 937.5MB * 16 = 15000MB = 15GB (total traffic on the network)

Looking at it another way ... each node has to receive 937.5MB of data. This would ideally take at least 75 seconds on a 100baseT net (yea I know ... but the Myrinet is down and not likely to get fixed soon). In practice, with this huge multicast going on, I think it will be considerably slower.

Ok, so finally to the question ...
Does anybody know of a C library that I could use to compress the data array before packing it into the PVM message? I think it may be worth trading the CPU time for the network bandwidth if I could get maybe 2-to-1 compression. Would take some benchmarking to know for sure.
 

m0ti

Senior member
Jul 6, 2001
975
0
0
It depends on what kind of data you've got.

Since you're transferring large amounts of data, you should probably take a look at bzip2 or similar methods.

Here's a link for bzip2 (change the source so that it works as a library that accepts and outputs arrays).

Check out this link for compression material (it has some libraries too).

There's also zlib, of course.

This may be useful for you if you've got lots of communications (or want to transfer your data in blocks): here

My recommendation: start off with zlib. Its use is very widespread, and it's already a library (how convenient). See if it gives you the compression you need.
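To make it concrete, here's roughly what I mean (untested sketch; the function name, the message tag, and the error handling are placeholders, and I'm assuming you can pack the compressed bytes with pvm_pkbyte):

#include <stdlib.h>
#include <zlib.h>
#include <pvm3.h>

/* Compress 'src' (srcLen bytes) with zlib, then pack it and multicast it
   to the tasks in 'tids'.  Function name, tag, and buffer handling are
   just placeholders for the idea. */
int send_compressed(unsigned char *src, unsigned long srcLen,
                    int *tids, int ntids, int msgtag)
{
    uLongf destLen = srcLen + srcLen / 1000 + 12;   /* zlib's documented worst case */
    Bytef *dest = malloc(destLen);
    if (dest == NULL)
        return -1;

    /* Z_BEST_SPEED trades ratio for CPU; worth benchmarking against the default level */
    if (compress2(dest, &destLen, src, srcLen, Z_BEST_SPEED) != Z_OK) {
        free(dest);
        return -1;
    }

    pvm_initsend(PvmDataRaw);
    pvm_pkulong(&srcLen, 1, 1);                     /* original size, for uncompress() */
    pvm_pkbyte((char *)dest, (int)destLen, 1);      /* compressed payload */
    pvm_mcast(tids, ntids, msgtag);

    free(dest);
    return 0;
}

On the receiving end you'd unpack the original size, pvm_upkbyte the payload into a scratch buffer, and call uncompress() to get the array back.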
 

EagleKeeper

Discussion Club Moderator / Elite Member
Staff member
Oct 30, 2000
42,589
5
0
Can you just broadcast the data once over your network instead of shipping it to each node separately?
 

Armitage

Banned
Feb 23, 2001
8,086
0
0
Originally posted by: EagleKeeper
Can you just broadcast the data once over your network instead of shipping it to each node separately?

Every node needs all of the data. I'm hoping the multicast works correctly so that the sender only sends the data once (as my calculations show). But every node needs to receive every other node's data.
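In case it helps to see the pattern, this is roughly the exchange I'm planning per node (just a sketch; the tag, the 'others' array, and the slice_for() helper are placeholders):

#include <pvm3.h>

/* hypothetical helper, assumed to exist elsewhere */
char *slice_for(int sender_tid);

/* Each node multicasts its own slice once, then collects the slices of
   the other nodes.  'others' holds the other nodes' task ids. */
void exchange_slices(char *my_slice, int slice_len, int *others, int nothers)
{
    int i;

    pvm_initsend(PvmDataRaw);
    pvm_pkbyte(my_slice, slice_len, 1);
    pvm_mcast(others, nothers, 42);            /* 42 = arbitrary message tag */

    for (i = 0; i < nothers; i++) {
        int bufid, nbytes, tag, sender;
        bufid = pvm_recv(-1, 42);              /* -1 = accept from any task */
        pvm_bufinfo(bufid, &nbytes, &tag, &sender);
        pvm_upkbyte(slice_for(sender), slice_len, 1);   /* all slices are the same size */
    }
}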

Nice links m0ti! Looks like exactly what I need :)
Thanks
 

RSMemphis

Golden Member
Oct 6, 2001
1,521
0
0
Every node needs all of the data. I'm hoping the multicast works correctly so that the sender only sends the data once (as my calculations show). But every node needs to receive every other node's data.


No, EagleKeeper has a point! Your calculations:

So, if I have 16 nodes, each node will generate 1000MB/16 = 62.5MB (data generated by each node)
It then has to send that data to the other 15 nodes: 62.5MB * 15 = 937.5MB (network traffic generated by each node)
Each node generates this much traffic, so: 937.5MB * 16 = 15000MB = 15GB (total traffic on the network)


show both the 16 times sending and the 15 times receiving or whatever.
If you broadcast (send to xxx.xxx.xxx.255), you will only have 62.5MB * 16 = 1000 MB of data on the network.
With a 100Mbit connection, this will still take more than 80 seconds though - some of your calculations are off badly.
 

Armitage

Banned
Feb 23, 2001
8,086
0
0
Originally posted by: RSMemphis
Every node needs all of the data. I'm hoping the multicast works correctly so that the sender only sends the data once (as my calculations show). But every node needs to receive every other node's data.


No, EagleKeeper has a point! Your calculations:

So, if I have 16 nodes, each node will generate 1000MB/16 = 62.5MB (data generated by each node)
It then has to send that data to the other 15 nodes: 62.5MB * 15 = 937.5MB (network traffic generated by each node)
Each node generates this much traffic, so: 937.5MB * 16 = 15000MB = 15GB (total traffic on the network)


show both the 16 times sending and the 15 times receiving or whatever.

Yes, that's correct. Every node sends its data to every other node.

If you broadcast (send to xxx.xxx.xxx.255), you will only have 62.5MB * 16 = 1000 MB of data on the network.

Maybe I'm not understanding how it works inside the switch.
If node1 multicasts 62.5MB to 15 nodes, then only 62.5MB goes down the wire from node1 to the switch. But somewhere along the line that data needs to get re-sent to each of the 15 additional nodes. Once it gets on the wire to the target node, it's again only 62.5MB, but the switch is somehow sending 15 * 62.5MB = 937.5MB (just for that one sender).
There are 16 senders, so it seems that, internal to the switch at least, there is 15000MB of traffic.

Maybe that's not the right way to look at it?

With a 100Mbit connection, this will still take more than 80 seconds though - some of your calculations are off badly.

Each node will receive 1000MB - 62.5MB = 937.5MB; 937.5MB * 8 bits/byte = 7500Mbits

7500Mbits / 100Mbits/second = 75 seconds

Seems pretty fast to me also, to move nearly 1GB over a 100baseT net. What am I missing?

Maybe I should repost this over in the networking forum...
 

RSMemphis

Golden Member
Oct 6, 2001
1,521
0
0
Oh, I see. Your original post says something about 15GB of total traffic (which is limited to 100MBit per second).
I see now how you did the math.

Okay, here's the thing. Each node sends 62.5MB to the switch, and the switch forwards it to every computer attached, so every single one of them will get that 1000MB back, since the switch will typically send broadcasts back to the originator, I believe.

I am not sure if a switch can internally cope with 16x the data though. So the switch may slow you down.


Anyway, to get back to the original question: the links provided are very good. bzip2 is often a better choice than zlib, but as m0ti said, zlib is already directly available as a library.
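If you do go the bzip2 route, the library side is basically a one-shot call, roughly like this (untested sketch; the function and buffer names are placeholders):

#include <stdlib.h>
#include <bzlib.h>

/* One-shot compress of an in-memory array with libbz2 (sketch).
   Returns the compressed size, or 0 on failure. */
unsigned int bz_compress(char *src, unsigned int srcLen, char **out)
{
    unsigned int destLen = srcLen + srcLen / 100 + 600;   /* worst case per the bzip2 docs */
    char *dest = malloc(destLen);
    if (dest == NULL)
        return 0;

    if (BZ2_bzBuffToBuffCompress(dest, &destLen, src, srcLen,
                                 9,    /* blockSize100k: 1 (fast) .. 9 (best ratio) */
                                 0,    /* verbosity */
                                 0)    /* workFactor: 0 = default */
        != BZ_OK) {
        free(dest);
        return 0;
    }
    *out = dest;        /* caller packs this with pvm_pkbyte and frees it */
    return destLen;
}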

Alternatively, if you know more about the data, it may be better to find your own algorithm...
 

Armitage

Banned
Feb 23, 2001
8,086
0
0
Originally posted by: RSMemphis
Oh, I see. Your original post says something about 15GB of total traffic (which is limited to 100MBit per second).
I see now how you did the math.

Okay, here's the thing. Each node sends 62.5MB to the switch, and the switch forwards it to every computer attached, so every single one of them will get that 1000MB back, since the switch will typically send broadcasts back to the originator, I believe.

I am not sure if a switch can internally cope with 16x the data though. So the switch may slow you down.

Ok, so my math isn't crazy then?
It will be interesting to see how the network speed scales with the problem size.

Anyway, to get back to the original question: the links provided are very good. bzip2 is often a better choice than zlib, but as m0ti said, zlib is already directly available as a library.

Yea, bzip2 rocks compared to compress (which uses zlib I think). It's available as a library also, though I haven't looked at the API yet.

Alternatively, if you know more about the data, it may be better to find your own algorithm...

Yea, the data is actually already a pretty efficient representation. About as good as we're likely to get working strictly from our understanding of the data. Now I think we just need to mash it down.

There may be some tradeoff of CPU vs. network speed also. It may be better to have each of the 16 nodes calculate more than 1/16th of the data, and synchronize only a subset over the net. I suspect that at current cluster & problem sizes, the CPU time will still dominate network time. But if we get more or faster nodes, we'll have to re-evaluate.
 

CodeJockey

Member
May 1, 2001
177
0
0
Regardless of the way the switch and the broadcast to the other nodes work, compressing the data will be of benefit.

Also, could you begin transmitting the data while you are still generating it, or must it all be generated first? It would save time to have these two functions (three, actually, since you must also be receiving and storing data sent by other nodes) operating in parallel.
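Very roughly, the per-node loop could look like this (sketch only; the chunking, the tag, and the generate/store helpers are all made up, and you'd slot in the compression step from above):

#include <pvm3.h>

#define CHUNK_TAG   7            /* arbitrary message tag (placeholder) */
#define CHUNK_BYTES (1 << 20)    /* 1MB chunks, just as an example */

/* hypothetical helpers, assumed to exist elsewhere */
void  generate_chunk(int idx, char *buf, int *len);
char *chunk_buf(int sender_tid, int idx);

/* Pipeline sketch: generate the local slice in chunks, multicast each
   chunk as soon as it is ready, and in between drain any chunks that
   have already arrived from the other nodes. */
void generate_and_exchange(int nchunks, int *others, int nothers)
{
    char raw[CHUNK_BYTES];
    int c, rawLen, bufid;

    for (c = 0; c < nchunks; c++) {
        generate_chunk(c, raw, &rawLen);          /* the expensive part */

        pvm_initsend(PvmDataRaw);
        pvm_pkint(&c, 1, 1);                      /* chunk index */
        pvm_pkint(&rawLen, 1, 1);                 /* chunk size in bytes */
        pvm_pkbyte(raw, rawLen, 1);
        pvm_mcast(others, nothers, CHUNK_TAG);

        /* non-blocking receive: store whatever has arrived so far */
        while ((bufid = pvm_nrecv(-1, CHUNK_TAG)) > 0) {
            int idx, len, nbytes, tag, sender;
            pvm_bufinfo(bufid, &nbytes, &tag, &sender);
            pvm_upkint(&idx, 1, 1);
            pvm_upkint(&len, 1, 1);
            pvm_upkbyte(chunk_buf(sender, idx), len, 1);
        }
    }
    /* after the loop, keep calling pvm_recv() until every remaining chunk is in */
}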
 

Armitage

Banned
Feb 23, 2001
8,086
0
0
Originally posted by: CodeJockey
Regardless of the way the switch and the broadcast to the other nodes work, compressing the data will be of benefit.

Also, could you begin transmitting the data while you are still generating it, or must it all be generated first? It would save time to have these two functions (three, actually, since you must also be receiving and storing data sent by other nodes) operating in parallel.

Yea, eventually I'd like to do that.
The problem is that this is a PVM retrofit to some existing code that has a remarkably byzantine data structure.
After the data generation is done, I have confidence that I know how to deal with it. But I don't understand the code enough to go mucking around in the middle right now!