
Binary data in Python...

Armitage
Anybody know how to do this??

I'm reading a binary file and getting the byte values out using the array module - works beautifully. Most of the data in these files is interpreted as unsigned char, but a few chunks are unsigned longs. I can get the 4 bytes that go into the unsigned long, but I don't know how to squash them together into one value. In C I'd use a union, I guess.

Any ideas?
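One way to do this without any manual shifting is the standard-library `struct` module; a minimal sketch with made-up byte values, assuming the file stores the long little-endian (use `'>L'` if it's big-endian):

```python
import struct

# Four raw bytes as they might come out of the file (example values).
raw = bytes([0x78, 0x56, 0x34, 0x12])

# '<L' = little-endian unsigned 4-byte long; the format string, not the
# host CPU, decides the byte order.
(value,) = struct.unpack('<L', raw)
print(hex(value))  # 0x12345678
```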
 
Originally posted by: Crusty
bit shifting 🙂

Yea, I'm considering that as we speak - taking it apart bit by bit and then putting it back together again. But again - that's something I know how to do in C but not in Python.
 
Originally posted by: Armitage
Excellent 😀

a_long = (a_byte[0] << 24) | (a_byte[1] << 16) | (a_byte[2] << 8) | a_byte[3]

Which is exactly how you would have done it in C 🙂. And don't try to run it on a mac.
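As a runnable sketch of that snippet, with example byte values, assuming the file stores the long big-endian (first byte most significant):

```python
from array import array

# Unsigned chars pulled from the file with the array module (example values).
a_byte = array('B', [0x12, 0x34, 0x56, 0x78])

# First byte is most significant, i.e. the file is big-endian.
a_long = (a_byte[0] << 24) | (a_byte[1] << 16) | (a_byte[2] << 8) | a_byte[3]
print(hex(a_long))  # 0x12345678
```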
 
Originally posted by: notfred
Originally posted by: Armitage
Excellent 😀

a_long = (a_byte[0] << 24) | (a_byte[1] << 16) | (a_byte[2] << 8) | a_byte[3]

Which is exactly how you would have done it in C 🙂. And don't try to run it on a mac.

Which makes perfect sense given the rest of Python's syntax; I've just never gotten down into the weeds like this in Python before.

I probably should put in a byte-order test of some kind in case somebody ever tries to run it on another arch.
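A sketch of such a test: `sys.byteorder` reports the host's native order, though if you use explicit `struct` format characters (`'>'`/`'<'`) the result is pinned to the file's byte order and the host no longer matters:

```python
import sys
import struct

# The host CPU's native byte order: 'little' or 'big'.
print(sys.byteorder)

raw = b'\x12\x34\x56\x78'

# Explicit format characters decouple the result from the host:
big = struct.unpack('>L', raw)[0]     # file stored big-endian
little = struct.unpack('<L', raw)[0]  # file stored little-endian
print(hex(big), hex(little))  # 0x12345678 0x78563412
```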
 
Originally posted by: notfred
And don't try to run it on a mac.
This may be a dumb question, but why don't they abstract that in the interpreter? Seems kind of silly to have such a simple way to shoot yourself in the foot.
 
Originally posted by: kamper
Originally posted by: notfred
And don't try to run it on a mac.
This may be a dumb question, but why don't they abstract that in the interpreter? Seems kind of silly to have such a simple way to shoot yourself in the foot.

Java does that, doesn't it? Always uses the same byte order internally regardless of architecture.
 
Originally posted by: Armitage
Originally posted by: kamper
Originally posted by: notfred
And don't try to run it on a mac.
This may be a dumb question, but why don't they abstract that in the interpreter? Seems kind of silly to have such a simple way to shoot yourself in the foot.

Java does that, don't they? Always uses the same byte order internally regardless of architecture.
Believe so, so why not Python? Well, I doubt it uses the same byte order. It'd be much simpler to rewrite the operation at runtime, depending on the platform. I've never really thought about this before though. Does C suffer from the same problem? Obviously it would have to be fixed at compile time there.
 
Originally posted by: kamper
Originally posted by: Armitage
Originally posted by: kamper
Originally posted by: notfred
And don't try to run it on a mac.
This may be a dumb question, but why don't they abstract that in the interpreter? Seems kind of silly to have such a simple way to shoot yourself in the foot.

Java does that, don't they? Always uses the same byte order internally regardless of architecture.
Believe so, so why not Python? Well, I doubt it uses the same byte order. It'd be much simpler to rewrite the operation at runtime, depending on the platform. I've never really thought about this before though. Does C suffer from the same problem? Obviously it would have to be fixed at compile time there.

Java is the only language I know of that does this. It's a performance hit if your language byte order doesn't match your architecture byte order, but in my experience, Java isn't too concerned about performance :evil:
 
Originally posted by: Armitage
Originally posted by: kamper
Well, I doubt it uses the same byte order. It'd be much simpler to rewrite the operation at runtime, depending on the platform. I've never really thought about this before though. Does C suffer from the same problem? Obviously it would have to be fixed at compile time there.

Java is the only language I know of that does this. It's a performance hit if your language byte order doesn't match your architecture byte order, but in my experience, Java isn't too concerned about performance :evil:
Well of course it is, but that's a whole other can of worms 😛

But like I said, you don't need to use a different byte order than the platform you're running on. All you have to do is make the language guarantee that "<<" works like so on little endian machines and then change the interpreter (or the jit compiler, in the case of a virtual machine) on a big endian system to change the "<<" to ">>" before executing the code (or compiling it, in the case of a virtual machine...).

So far as I understand it, a c compiler could do the same thing, except at compile time. Just guarantee that "<<" will always work the same way and then switch the instruction used under the covers, depending on your target platform.
 
Originally posted by: kamper
Originally posted by: Armitage
Originally posted by: kamper
Well, I doubt it uses the same byte order. It'd be much simpler to rewrite the operation at runtime, depending on the platform. I've never really thought about this before though. Does C suffer from the same problem? Obviously it would have to be fixed at compile time there.

Java is the only language I know of that does this. It's a performance hit if your language byte order doesn't match your architecture byte order, but in my experience, Java isn't too concerned about performance :evil:
Well of course it is, but that's a whole other can of worms 😛

But like I said, you don't need to use a different byte order than the platform you're running on. All you have to do is make the language guarantee that "<<" works like so on little endian machines and then change the interpreter (or the jit compiler, in the case of a virtual machine) on a big endian system to change the "<<" to ">>" before executing the code (or compiling it, in the case of a virtual machine...).

So far as I understand it, a c compiler could do the same thing, except at compile time. Just guarantee that "<<" will always work the same way and then switch the instruction used under the covers, depending on your target platform.

It's not just the bitshift operators though. Reading & writing binary files and other stuff along those lines has the same sort of problem.
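A sketch of the file side of that: pinning the on-disk order with an explicit `struct` format makes reads and writes agree on any host (an in-memory buffer stands in for a real file here):

```python
import io
import struct

value = 0x12345678

# Write the long big-endian regardless of the host CPU.
buf = io.BytesIO()
buf.write(struct.pack('>L', value))

# Read it back with the same explicit order; the round trip is exact.
buf.seek(0)
assert struct.unpack('>L', buf.read(4))[0] == value
```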
 
It should still handle it under the covers, shouldn't it? Alright, at that point, yeah, I guess I'd fully expect C to allow you to shoot yourself in the foot. I wrote a Huffman code compressor in java a few years back. I'll see if I can dig it up and move compressed files between my mac and pc. If I can find it, rewriting it in Python would be a fun way to learn a bit 🙂
 