Benefits from adding General Registers

imported_Ged

Member
Mar 24, 2005
135
0
0
I read a review of AMD's 64-bit implementation vs. Intel's 64-bit implementation at the Tech Report not long ago and it got me thinking. I am always one to ask myself "why didn't they just go all out?"

What made me think that about this article is this:

The x86 ISA only provides eight general-purpose registers, and thus is generally considered register-poor. Most reasonably contemporary ISAs offer more. The PowerPC 604 RISC architecture, to give one example, has 32 general-purpose registers.

and...

the x86-64 ISA brings more and better registers to the table. x86-64 packs 8 more general-purpose registers, for a total of 16, and they are no longer limited to 32-bit values: all 16 can store 64-bit datatypes. In addition to the new GPRs, x86-64 also includes 8 new 128-bit SSE/SSE2 registers, for a total of 16 of those.

AMD introduced only 8 more general purpose registers with x86-64.

Why not more?

Where do the benefits of adding more general purpose registers stop being a good way to increase performance?

When does adding more GPRs become cost prohibitive?
 

SilentZero

Diamond Member
Apr 8, 2003
5,158
0
76
I believe this is because adding more registers increases the price for consumers dramatically. They probably stuck with 16 to keep costs low, and as far as I am aware, adding many more registers would not increase performance dramatically. What it would do, however, is make prices far too high for most people to afford them.
 

fbrdphreak

Lifer
Apr 17, 2004
17,555
1
0
That sounds very correct. The more registers you implement in your design, the more flip-flops you have to physically build and the more transistors you have to add. That, and I would imagine the cost of adding registers starts to outweigh the benefits rather quickly.
Good question though :)
 

AndyHui

Administrator Emeritus / Elite Member / AT FAQ M
Oct 9, 1999
13,141
16
81
Modern x86 processors implement register renaming... There are a lot more GPRs in hardware than 8, although the software still only sees 8.
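
A minimal sketch of the renaming idea, not how any particular x86 core does it. The `rename` function, the tuple format for instructions, and the 16-entry physical file are all made up for illustration. Each write to an architectural register gets a fresh physical register, so reusing the name `eax` does not force the second write to wait on readers of the first.

```python
def rename(instructions, num_physical):
    """Map architectural registers onto a larger physical register file.

    `instructions` is a list of (dest, [sources]) tuples using architectural
    names. This is a hypothetical, simplified renamer: it never frees
    physical registers and raises IndexError if it runs out of them.
    """
    free = list(range(num_physical))
    mapping = {}   # architectural name -> physical register holding its value
    renamed = []
    for dest, srcs in instructions:
        phys_srcs = []
        for s in srcs:
            if s not in mapping:
                # Value that existed before the block: give it a home.
                mapping[s] = free.pop(0)
            phys_srcs.append(mapping[s])
        # Every write is given a new physical register.
        mapping[dest] = free.pop(0)
        renamed.append((mapping[dest], phys_srcs))
    return renamed


program = [
    ("eax", ["ebx"]),  # eax = f(ebx)
    ("ecx", ["eax"]),  # ecx = f(eax)
    ("eax", ["edx"]),  # eax = f(edx), reuses the name eax
    ("esi", ["eax"]),  # esi = f(eax)
]
renamed = rename(program, num_physical=16)
for dest, srcs in renamed:
    print(f"p{dest} <- {['p%d' % s for s in srcs]}")
```

The two writes to `eax` land in different physical registers (p1 and p4 here), which is why the third instruction can run without waiting for the second one to finish reading the old `eax`.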
 

Calin

Diamond Member
Apr 9, 2001
3,112
0
0
Most x86 software was written for the general registers of the Pentium processors. While the newer processors introduce some extra registers, not all software is able to use them. Also, adding more registers with register renaming requires more and more (exponentially, I suppose) control logic.
 

imgod2u

Senior member
Sep 16, 2000
993
0
0
The problem really boils down to software usage. An average block of code may have more than 8 variables, but more than 32? And if more, how many more? If you're working on a large data set, you'd need literally millions of registers to hold it; having 64 instead of 32 probably wouldn't help in this case, because you're most likely not working on only 32 of those values in a continual loop. On the other hand, programs that work on small data sets at once most likely will not be working on more than 32, or even 16, values the majority of the time. So it's not really that more is better; there is an optimal number, and some have claimed it's 16 or 32 or thereabouts.

I think the whole register issue is really not that important anymore. Having maybe 16 or 32 is about optimal, and more would not really help, since it requires software tuning at the assembly level (by either compilers or hand-coding) to use them. More attention should be paid to caching right now, as that scales without the need for software to be rewritten (although rewriting helps in many cases). I'm still not sure why no one has implemented some form of stack cache, considering how fast stack data structures are and that all x86 programs (as well as those for most other architectures) use a stack-like data structure when accessing memory (well, stack and heap). Banias/Dothan's dedicated stack manager obviously helped quite a bit, so I don't see why they aren't adding extremely low-latency stack caches on these chips.
 

qaa541

Senior member
Jun 25, 2004
397
0
0
When I took my high-performance computing class, we ran a block of code on a simulated processor whose features we could change: cache lines, registers, number of ALUs, etc. Processor performance grew well as registers were added until it hit about 16 registers. Then it tapered off and grew very slowly. I was simulating 2-D array operations and other common tasks, and it seems like 16 is all you need for general-purpose computing because of code locality.
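
A toy version of that kind of experiment, not the class simulator. It assumes a hypothetical loop that touches 12 distinct variables and a register file managed with least-recently-used replacement, which is cruder than a real compiler's register allocator. The names `count_loads` and `trace` are invented for the sketch. It counts how many times a variable has to be loaded from memory for each register count.

```python
from collections import OrderedDict


def count_loads(trace, num_registers):
    """Count memory loads when variables live in an LRU-managed register file."""
    regs = OrderedDict()
    loads = 0
    for var in trace:
        if var in regs:
            regs.move_to_end(var)          # already in a register, no load
        else:
            loads += 1                     # has to come from memory
            if len(regs) >= num_registers:
                regs.popitem(last=False)   # evict the least recently used
            regs[var] = True
    return loads


# Hypothetical loop body touching 12 distinct variables, run 100 times.
trace = [f"v{i}" for i in range(12)] * 100

for n in (4, 8, 16, 32, 64):
    print(n, count_loads(trace, n))
```

With this trace the load count falls from 1200 to 12 once the register file is big enough to hold the loop's working set, and going from 16 to 32 or 64 registers changes nothing. Real code gives a gentler curve than this cliff, but the flat tail is the same effect.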

However, as other people mentioned here, a bigger and faster cache kept performance growing the longest. The power needs/dissipation I simulated grew really high, though, and I'm guessing the engineering/manufacturing costs of a die big enough to fit all of that must make the chips prohibitively expensive. It would be cool to have a 2 MB L2 cache like the DEC Alpha on chip though!
 

harrkev

Senior member
May 10, 2004
659
0
71
The number of registers also determines how many bits are needed for each instruction. If the register set grows too large, all of a sudden you need an extra byte to encode the instruction, which has a minor negative impact on performance.

Of course, this is all VERY dependent upon the existing instruction set architecture. This is a MUCH larger problem for 8-bit processors that use 8-bit instructions, though.
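
The arithmetic behind that, as a rough sketch. It assumes a made-up fixed format with three register operands per instruction, which is not how x86 encodes things; the helper names are invented for the example. Each register operand needs enough bits to pick one register out of the set.

```python
def register_field_bits(num_registers):
    """Bits needed to name one register out of `num_registers`."""
    return (num_registers - 1).bit_length()


def operand_bits(num_registers, operands=3):
    """Bits spent on register fields in a hypothetical 3-operand format."""
    return operands * register_field_bits(num_registers)


for n in (8, 16, 32, 64):
    print(n, register_field_bits(n), operand_bits(n))
# 8 registers  -> 3 bits per field,  9 bits for three operands
# 16 registers -> 4 bits per field, 12 bits
# 32 registers -> 5 bits per field, 15 bits
# 64 registers -> 6 bits per field, 18 bits
```

For comparison, x86 names registers with 3-bit fields, and x86-64 reaches its 16 registers by supplying the extra bit per field in a REX prefix byte, so the "extra byte" cost is real there.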
 

bobsmith1492

Diamond Member
Feb 21, 2004
3,875
3
81
Of course, there are a ton more than just 8 or 16 registers on a CPU. According to my digital systems class, every input/output bus has its own latches on each bit, although that may have been for a microcontroller...
 

FrankSchwab

Senior member
Nov 8, 2002
218
0
0
Two issues:
1. harrkev's point about increasing instruction size if you increase the number of programmer-visible registers
2. Registers need to be saved on function calls/interrupts/etc. Add more registers, do more memory writes.

 

imgod2u

Senior member
Sep 16, 2000
993
0
0
Well, considering that the typical cache line on modern MPUs is 64 or 128 bytes, I doubt the difference between 16 and 32 registers would matter much.
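
A back-of-the-envelope check on that, assuming 8-byte (64-bit) registers, that every GPR gets saved, and that the save area is contiguous and cache-line aligned. `save_cost` is a hypothetical helper, not anything from a real OS or ABI.

```python
def save_cost(num_registers, register_bytes=8, cache_line_bytes=64):
    """Return (bytes written, cache lines touched) to save all registers."""
    total = num_registers * register_bytes
    lines = -(-total // cache_line_bytes)   # ceiling division
    return total, lines


for line_size in (64, 128):
    for n in (16, 32):
        total, lines = save_cost(n, cache_line_bytes=line_size)
        print(f"{n} regs, {line_size}-byte lines: {total} bytes, {lines} line(s)")
```

Under those assumptions, going from 16 to 32 registers means touching 4 cache lines instead of 2 with 64-byte lines, or 2 instead of 1 with 128-byte lines. That is more memory traffic per save, but only a couple of extra lines.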