What is the performance penalty for shifting bits in registers on modern x86 CPUs (Pentium 3/4, Athlon)?
Is there a big penalty in using bit shifting to store two values in one register?
Compared to the direct addressing available for the 8-bit halves of the 16-bit registers (AL, AH, etc.)?
Compared to accessing half the values from cache/memory (with the other half kept in registers)?
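To make the question concrete, here is a rough C sketch of the kind of packing I mean, with two 16-bit values sharing one 32-bit word. The helper names (pack16, high16, low16, pack16_demo) are made up for illustration, and whether a compiler turns these into plain shifts and masks or something smarter is exactly what I am unsure about.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: put `hi` in bits 31..16 and `lo` in bits 15..0. */
uint32_t pack16(uint16_t hi, uint16_t lo) {
    return ((uint32_t)hi << 16) | lo;
}

/* Extract the upper value: one shift. */
uint16_t high16(uint32_t w) {
    return (uint16_t)(w >> 16);
}

/* Extract the lower value: one mask. */
uint16_t low16(uint32_t w) {
    return (uint16_t)(w & 0xFFFFu);
}

/* Round-trip check of the helpers above. */
void pack16_demo(void) {
    uint32_t w = pack16(0x1234, 0xABCD);
    assert(w == 0x1234ABCDu);
    assert(high16(w) == 0x1234);
    assert(low16(w) == 0xABCD);
}
```

Every read of the upper value costs a shift and every read of the lower value costs a mask, which is the overhead I am asking about.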
Also, does anybody know why Intel did not allow direct addressing of the two 16-bit halves of the 32-bit registers (EAX becoming EAL/EAH)? Would it have needed a lot of logic/transistors/real estate? (Was it just unrealistic on the 386?)
Could the new 64-bit general-purpose registers of the AMD Hammer be used to store twice as many 32-bit values using bit shifting? Would there be a performance gain in modifying a current 32-bit program to do this? (Assuming an OS that uses the Hammer's native 64-bit mode, so the whole registers are available.)
I know it would be a pain to bit-shift for every operation on the number stored in the upper half of the register, and to make sure the lower number does not overflow into the upper one, but would there still be a potential performance gain? Or would the extra logic involved negate the benefits?
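For the 64-bit case, this is roughly the bookkeeping I have in mind, written as a C sketch rather than real assembly. All the names here (pack32, high32, low32, add_high, add_low, pack32_demo) are hypothetical, and I am assuming 32-bit unsigned values with wrap-around semantics.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: upper value in bits 63..32, lower value in bits 31..0. */
uint64_t pack32(uint32_t hi, uint32_t lo) {
    return ((uint64_t)hi << 32) | lo;
}

uint32_t high32(uint64_t r) { return (uint32_t)(r >> 32); }
uint32_t low32(uint64_t r)  { return (uint32_t)r; }

/* Add to the upper value: shift the operand up first. The lower half is
   untouched, and any carry out of bit 63 is simply discarded. */
uint64_t add_high(uint64_t r, uint32_t x) {
    return r + ((uint64_t)x << 32);
}

/* Add to the lower value: do the add in 32 bits so the carry is dropped
   instead of leaking into the upper value, then merge the halves again. */
uint64_t add_low(uint64_t r, uint32_t x) {
    uint32_t lo = (uint32_t)r + x;               /* wraps modulo 2^32 */
    return (r & 0xFFFFFFFF00000000ull) | lo;
}

/* Check that an overflow of the lower value leaves the upper one intact. */
void pack32_demo(void) {
    uint64_t r = pack32(7, 0xFFFFFFFFu);
    r = add_low(r, 1);                           /* lower wraps to 0 */
    assert(high32(r) == 7 && low32(r) == 0);
    r = add_high(r, 3);
    assert(high32(r) == 10 && low32(r) == 0);
}
```

So a single add on the lower value turns into an add, a mask and an OR, and my question is whether that extra work still beats spilling the second value to cache/memory.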
Any hints appreciated,
Simon
