Is there no byte order reverse opcode in x86?

glugglug

Diamond Member
Jun 9, 2002
With all the instruction sets that have been added in recent years (MMX, 3DNow!, SSE, SSE2, etc.), this seems like a REALLY big hole to me.

It's not only useful for the obvious ntohl/htonl calls you make whenever you resolve a DNS name to an IP address. It's also incredibly useful for writing a high-speed strcmp: if the character order is reversed to match the native byte order, you can compare 4 chars at a time with a single integer compare (or 8 on AMD64). This comes up a lot more than you might think. Consider how lists of things get sorted, for instance, or how a DB index works.
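To make the idea concrete, here is a rough sketch in portable C++, not tuned code. The names reverse_bytes32 and load_prefix32 are made up for illustration, and load_prefix32 assumes at least 4 readable bytes at the pointer. It packs the first char into the most significant byte, so an unsigned integer compare gives the same ordering as a byte-by-byte compare of those 4 chars. On a little-endian machine this is the same value you would get from a plain 32-bit load followed by a byte reversal.

```cpp
#include <cassert>
#include <cstdint>

// Reverse the four bytes of a 32-bit value using shifts and masks.
// (Hypothetical helper name, written out by hand for illustration.)
inline std::uint32_t reverse_bytes32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Load 4 chars so that s[0] ends up in the most significant byte,
// regardless of the host's byte order.
// Assumption: at least 4 bytes are readable at s.
inline std::uint32_t load_prefix32(const char* s) {
    assert(s != nullptr);
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s);
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8)  |  std::uint32_t(p[3]);
}
```

With keys loaded this way, load_prefix32(a) < load_prefix32(b) orders the first 4 chars the same way memcmp(a, b, 4) would.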

I use this technique in a sorting program I wrote. It caches the first 8 bytes of each line being sorted (in reverse order) as an __int64. When figuring out where each line goes in the priority_queue, it can then do a 64-bit integer compare rather than calling strcmp. A normal string comparison is only needed when the first 8 characters of the two strings are an exact match.
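Roughly, the caching scheme looks like the sketch below. This is a simplified illustration rather than the actual program: Line, LineGreater, prefix_key and sort_lines are hypothetical names, lines shorter than 8 bytes are zero-padded, and it assumes the lines contain no embedded NUL bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Pack the first 8 bytes of a line into a 64-bit key, first char most
// significant. Lines shorter than 8 bytes are padded with zeros.
inline std::uint64_t prefix_key(const std::string& s) {
    std::uint64_t key = 0;
    for (std::size_t i = 0; i < 8; ++i) {
        unsigned char c = i < s.size() ? static_cast<unsigned char>(s[i]) : 0;
        key = (key << 8) | c;
    }
    return key;
}

// A line plus its cached key (computed once, when the line is read).
struct Line {
    std::uint64_t key;
    std::string text;
    explicit Line(std::string t) : key(prefix_key(t)), text(std::move(t)) {}
};

// Orders the priority_queue so the smallest line is on top.
struct LineGreater {
    bool operator()(const Line& a, const Line& b) const {
        if (a.key != b.key) return a.key > b.key;  // cheap integer compare
        // Keys match: fall back to a full string comparison.
        return std::strcmp(a.text.c_str(), b.text.c_str()) > 0;
    }
};

// Sort lines by pushing them through the priority_queue.
inline std::vector<std::string> sort_lines(const std::vector<std::string>& in) {
    std::priority_queue<Line, std::vector<Line>, LineGreater> pq;
    for (const auto& s : in) pq.emplace(s);
    std::vector<std::string> out;
    while (!pq.empty()) {
        out.push_back(pq.top().text);
        pq.pop();
    }
    assert(out.size() == in.size());
    return out;
}
```

The key is built once per line, so every later comparison in the queue pays for an integer compare and only touches the string data when two lines share their first 8 bytes.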

Better yet, when AMD added the 64-bit extensions, they should have made the new modes mixed-endian, like the PowerPC. Those chips prefer big endian by default, but also natively support little endian because the designers had emulators in mind. It would be nice for a Hammer to be mixed-endian, so that Mac emulation would still be faster than a comparably priced Mac, even after the price drops that came with the G5 introduction.
 

een

Member
Aug 12, 2003
Ugh... I only understand the problem domain and don't know the answer. Have you tried posting it in the software forum? BTW, your program sounds pretty good, but when the first eight bytes are identical, why can't you get the next 8 bytes and compare those? It would be faster than strcmp anyway.

 

glugglug

Diamond Member
Jun 9, 2002
If I spend too many CPU cycles and too much RAM keeping cached reversed copies of the entire strings instead of just the beginning of them, the time spent doing the reversal defeats the advantage of the faster comparison. If ntohl were a CPU opcode on x86, this would not be the case.