
CPU Bit shifting performance penalty

simonsky

Junior Member
What is the performance penalty for shifting bits in registers on modern x86 CPUs (Pentium 3/4, Athlon)?

Is there a big penalty in using bit shifting to store two values in one register,
compared to the direct addressing available for the halves of the 16-bit registers (AL, AH, etc.)?
Or compared to accessing half of the values from cache/memory (with the other half kept in the register)?

Also, does anybody know why Intel did not allow direct addressing of the two 16-bit halves of the 32-bit registers (EAX becoming EAL/EAH)? Would it have needed a lot of logic/transistors/real estate? (Was it just unrealistic on the 386?)

Could the new 64-bit general-purpose registers of an AMD Hammer be used to store twice as many 32-bit values using bit shifting? Would there be a performance gain in modifying a current 32-bit program to do this? (Assuming an OS that uses the Hammer's native 64-bit mode, so the whole registers are available.)

I know it would be a pain to bit-shift for every operation on the number stored in the upper half of the register, and to make sure the lower number does not overflow into the upper one, but would there still be a potential performance gain? Or would the extra logic involved negate the benefits?

Any hints appreciated,

Simon
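
For readers following along, the packing scheme asked about above can be sketched in C roughly as follows. This is only an illustration under assumptions: the helper names (`pack2`, `low_half`, `high_half`, `add_low`, `add_high`) are made up for this example, and whether a compiler actually keeps the packed word in a single 64-bit register depends on the compiler and target. It says nothing about actual performance.

```c
#include <stdint.h>

/* Hypothetical helpers: two 32-bit values stored in one 64-bit word. */
static inline uint64_t pack2(uint32_t hi, uint32_t lo) {
    return ((uint64_t)hi << 32) | lo;      /* hi goes in bits 63..32 */
}

static inline uint32_t low_half(uint64_t w)  { return (uint32_t)w; }
static inline uint32_t high_half(uint64_t w) { return (uint32_t)(w >> 32); }

/* Add to the low value only. The sum is computed in 32 bits, so it wraps
   modulo 2^32 instead of carrying into the upper value. */
static inline uint64_t add_low(uint64_t w, uint32_t x) {
    uint32_t lo = (uint32_t)w + x;
    return (w & 0xFFFFFFFF00000000ULL) | lo;
}

/* Add to the high value: shift the operand up first. Any carry out of
   bit 63 is simply discarded by unsigned 64-bit arithmetic. */
static inline uint64_t add_high(uint64_t w, uint32_t x) {
    return w + ((uint64_t)x << 32);
}
```

Note that every operation on either half costs at least one extra shift or mask compared to working on a plain 32-bit value, which is the "extra logic" the question is worried about.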
 
Hehehe, you must pardon my inappropriate response, but I have a tendency to read things and substitute dirty words where I see fit (it's not Tourette's or anything of the sort). Here's what I read:

What is the performance penalty for shitting bits in registers of modern X86 CPUs (Pentium 3/4, Athlon).



As for your actual post, I don't have a damn clue. Again, please forgive me.
 
I don't know the answer, but IIRC, MIPS also requires that you shift to get the upper 16 bits (I'm not SURE, but that is what I recall)... so if both architectures have the same feature (or lack thereof) there must be some reason.
 
fixed shifts are essentially free. variable shifts have a performance penalty roughly logarithmic with the maximum shift amount.
 
Originally posted by: SuperTool
fixed shifts are essentially free. variable shifts have a performance penalty roughly logarithmic with the maximum shift amount.

Isn't it all done in one cycle? You could do a shift of up to 16 bits in what, 4 or 8 T-gate delays?
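
To make the "logarithmic" point concrete, here is a small software model of a 16-bit logarithmic (barrel) left shifter. It is a sketch of the general idea only, not a description of any particular CPU's shifter, and the function name `barrel_shl16` is made up for this example. Each of the four stages either passes its input through or applies one fixed shift (1, 2, 4, or 8), selected by one bit of the shift amount, so the number of stages grows with log2 of the maximum shift.

```c
#include <stdint.h>

/* Model of a 16-bit barrel shifter built from log2(16) = 4 stages.
   In hardware each stage would be a row of 2:1 muxes; here the mux
   select is one bit of the shift amount. */
static uint16_t barrel_shl16(uint16_t value, unsigned amount) {
    amount &= 15u;                        /* only shifts 0..15 are modeled */
    for (unsigned stage = 0; stage < 4; ++stage) {
        unsigned fixed = 1u << stage;     /* fixed shift of 1, 2, 4, 8 */
        if (amount & fixed)               /* select: shift or pass through */
            value = (uint16_t)(value << fixed);
    }
    return value;
}
```

For example, a shift by 12 enables only the 4 and 8 stages. A fixed shift, by contrast, needs no selection at all, which fits the "essentially free" remark above.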
 