Why isn't x86-64 load/store?

FastLaneTX

Junior Member
Mar 25, 2002
Okay, maybe AMD wanted to keep their ISA as close to x86 as possible, but the two biggest complaints about the x86 instruction set are (a) not enough registers, and (b) almost any instruction can access memory (with complex addressing modes).

Obviously AMD fixed the former to a degree, though 16 registers (with xSP reserved) are still far fewer than the 32-64 rename registers on the average x86 chip, and a tiny fraction of IA64's 128 GPRs. I'd like to have seen 32 GPRs, and the elimination of all opcodes that use implicit register operands.

The other part is a bigger problem. The ModR/M, SIB, Imm, and Disp bytes make instruction decoding hell, and almost any instruction can have any combination of them. Switching to a load/store architecture would allow the 8 bits formerly in the ModR/M byte to encode 16 source and 16 destination registers (4 bits each) with no need for a REX prefix, and one could completely ditch these hellish suffix bytes from all but the load/store instructions.
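To make the decoding complaint concrete, here is a minimal Python sketch, assuming 32/64-bit addressing and ignoring prefixes and immediates, of how a decoder works out how many bytes follow the ModR/M byte, plus how x86-64 needs REX.R to name registers 8-15. The helper names (`modrm_operand_bytes`, `x86_64_reg_field`, `loadstore_regs`) are hypothetical, and `loadstore_regs` models the one-byte dst/src encoding proposed in the post, with an assumed nibble order.

```python
def modrm_operand_bytes(code: bytes) -> int:
    """Bytes used by ModR/M + optional SIB + displacement (32/64-bit addressing).

    `code` starts at the ModR/M byte. Immediates are not counted: whether
    one follows depends on the opcode, not on ModR/M.
    """
    modrm = code[0]
    mod, rm = modrm >> 6, modrm & 0b111
    length = 1  # the ModR/M byte itself

    if mod == 0b11:          # register-direct: nothing else follows
        return length

    if rm == 0b100:          # SIB byte follows
        length += 1
        if mod == 0b00 and (code[1] & 0b111) == 0b101:
            length += 4      # SIB with no base register: disp32
    elif mod == 0b00 and rm == 0b101:
        length += 4          # disp32 (RIP-relative in 64-bit mode)

    if mod == 0b01:
        length += 1          # disp8
    elif mod == 0b10:
        length += 4          # disp32
    return length


def x86_64_reg_field(rex: int, modrm: int) -> int:
    """Full 4-bit register number from ModR/M.reg, extended by REX.R."""
    rex_r = (rex >> 2) & 1   # REX is 0100WRXB; R is bit 2
    return (rex_r << 3) | ((modrm >> 3) & 0b111)


def loadstore_regs(byte: int) -> tuple[int, int]:
    """Hypothetical load/store encoding: (dst, src) as two 4-bit fields.

    The nibble order is an assumption for illustration only.
    """
    return byte >> 4, byte & 0x0F


if __name__ == "__main__":
    # mov rax, [rbx+rcx*8+0x10] = 48 8B 44 CB 10 -> ModR/M, SIB, disp8
    print(modrm_operand_bytes(bytes([0x44, 0xCB, 0x10])))              # 3
    # add eax, ecx = 01 C8 -> register-direct
    print(modrm_operand_bytes(bytes([0xC8])))                          # 1
    # mov eax, [rip+0x1000] = 8B 05 00 10 00 00
    print(modrm_operand_bytes(bytes([0x05, 0x00, 0x10, 0x00, 0x00])))  # 5
    # mov r9, rax = 4C 8B C8: REX.R is needed to reach r9
    print(x86_64_reg_field(0x4C, 0xC8))                                # 9
    print(loadstore_regs(0x9A))                                        # (9, 10)
```

The point of the sketch: for every memory-capable opcode, the decoder cannot know where the next instruction starts until it has inspected ModR/M and possibly SIB, whereas in the hypothetical encoding an ALU instruction's register byte is always exactly one byte.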

Comments from the more microarchitecture-savvy?
 

zemus

Member
Mar 6, 2002
"I'd like to have seen 32 GPRs"

Like you, I would have preferred to see as many as possible. I suspect AMD's choice was influenced by several factors, but I think the main reason was this:


Most experienced x86 programmers have lived without extra registers for so long that there will be no mad rush to put them to use now that they suddenly appear. Sure, I will start using them right away in almost every asm application I can think of, but I/we do not represent the bulk of programmers. And sure, the extra registers are definitely useful, but they will never result in a huge increase in performance, just a little. 16 is a fairly good trade-off that still keeps the design relatively simple. I have rarely encountered a situation where having more than 4 data registers would make my code significantly faster; more than anything, they just make it easier to write. Don't get me wrong, I think more is better; it's just that we do not need to go overboard.

I am a little more excited by the 64-bit width than anything else. Despite a post claiming this is mostly useless, I find that not to be true at all. Sure, I do not need numbers that large, but block transfers can now move twice as much data through the CPU registers per load/store. Does that mean twice the speed overall? No, but it does mean they will happen somewhat faster, enough that one will notice it when running certain kinds of code.
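The block-transfer point can be sketched by counting how many register-sized load/store pairs a copy needs. This is a minimal Python model under stated assumptions: `copy_in_words` is a hypothetical helper that simulates a register of a given width, and it counts operations rather than measuring time, since real throughput also depends on memory bandwidth and caches.

```python
def copy_in_words(src: bytes, word_size: int) -> tuple[bytearray, int]:
    """Copy `src` through a simulated register `word_size` bytes wide.

    Returns the copy and the number of load/store pairs needed. Assumes
    len(src) is a multiple of word_size to keep the sketch short.
    """
    if len(src) % word_size:
        raise ValueError("length must be a multiple of word_size")
    dst = bytearray(len(src))
    moves = 0
    for off in range(0, len(src), word_size):
        reg = int.from_bytes(src[off:off + word_size], "little")      # load
        dst[off:off + word_size] = reg.to_bytes(word_size, "little")  # store
        moves += 1
    return dst, moves


if __name__ == "__main__":
    block = bytes(range(256)) * 16        # 4096 bytes
    _, moves32 = copy_in_words(block, 4)  # 32-bit registers
    _, moves64 = copy_in_words(block, 8)  # 64-bit registers
    print(moves32, moves64)               # 1024 512
```

Halving the instruction count is what the wider registers guarantee; whether that becomes a visible speedup depends on whether the copy was limited by instruction throughput or by memory.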

Most of the code I write can benefit from both the extra data width and the extra registers, but only a little. In my opinion, though, there is no real need to have hundreds of registers visible to the programmer.

In the end, rule #1 is the law of diminishing returns, far more important than Moore's law in my opinion.
 

kylef

Golden Member
Jan 25, 2000


<< Switching to a load/store architecture would allow... >>



But it wouldn't be backwards compatible. And that is the ONLY reason that AMD has proposed the x86-64.

It's perceived to be such a HUGE advantage to be able to run old code that Intel recently was even second-guessing their own switch to IA64.

In the end, breaking away from x86 entirely is probably a Good Thing. I hope Intel can pull it off, but the ground is looking shakier than ever now. x86 is going strong, 32 bits are enough for 95% of people out there, and Intel's own 4-bit address extension mechanism (PAE) allows 64 gigabytes of addressable physical memory. All of these make the industry question paying a price premium to switch to an incompatible, unproven ISA when a cheaper backwards-compatible one is available...
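The 64 GB figure is simple arithmetic: the extension widens physical addresses from 32 to 36 bits, and each extra bit doubles the addressable range. Note that this applies to physical memory; each process's virtual address space is still 32 bits (4 GB).

$$2^{32}\ \text{bytes} = 4\ \text{GB}, \qquad 2^{32+4} = 2^{36}\ \text{bytes} = 16 \times 4\ \text{GB} = 64\ \text{GB}$$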
 

jhu

Lifer
Oct 10, 1999
Haven't all these processors been implicitly load/store since the Pentium Pro?
 

kylef

Golden Member
Jan 25, 2000


<< Haven't all these processors been implicitly load/store since the Pentium Pro? >>



Yes, but the issue is that the exposed interface available to assembly language programmers and compiler code generators is still the old register-limited x86 instruction set, along with an even more limited stack-based floating point scheme.

Writing good compilers is already difficult enough as it is, but when resources such as registers are severely limited (as they are in x86) it just makes producing good code more difficult.
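What "implicitly load/store" means can be illustrated with a toy Python model: a memory-operand instruction is split into a separate load, a register-only ALU operation, and, for read-modify-write forms, a store. The operand syntax, the temporaries `t0`/`t1`, and the function `crack` are all hypothetical; real P6-family decoders and their micro-op formats differ and are more detailed.

```python
def is_mem(operand: str) -> bool:
    """Toy convention: memory operands are written in brackets, e.g. "[ebx+8]"."""
    return operand.startswith("[")


def crack(op: str, dst: str, src: str) -> list[str]:
    """Split a two-operand x86-style instruction (dst = dst op src) into
    load/store-style micro-ops that only do arithmetic on registers."""
    if is_mem(dst) and is_mem(src):
        raise ValueError("x86 ALU ops allow at most one memory operand")
    uops = []
    if is_mem(src):                      # add eax, [mem]
        uops.append(f"load t0, {src}")
        src = "t0"
    if is_mem(dst):                      # add [mem], eax (read-modify-write)
        uops.append(f"load t1, {dst}")
        uops.append(f"{op} t1, t1, {src}")
        uops.append(f"store {dst}, t1")
    else:
        uops.append(f"{op} {dst}, {dst}, {src}")
    return uops


if __name__ == "__main__":
    for insn in [("add", "eax", "ecx"),
                 ("add", "eax", "[ebx+8]"),
                 ("add", "[ebx+8]", "eax")]:
        print(insn, "->", crack(*insn))
```

The core still runs load/store-style operations internally, but, as noted in the thread, the compiler only sees the architectural registers, so register pressure remains a code-generation problem.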
 

FastLaneTX

Junior Member
Mar 25, 2002
But you already have to recompile everything to run in long mode, so going to an explicit load/store design wouldn't change that. Compatibility modes would still accept the implicit loads/stores of IA-32, for better or worse.

Perhaps keeping compatibility limits the decode improvements possible, but I'm still curious. If I had a clue how, I'd rig GCC to not generate implicit loads/stores and see if the code ran faster or slower on Simics.