Why is a 64-bit CPU better than a 32-bit CPU (or is it)?

LiQiCE

Golden Member
Oct 9, 1999
1,911
0
0
Posted this in Off Topic too, but I guess I'll get more technical answers here :)

Architecture efficiency aside, why is a 64-bit CPU better than a 32-bit CPU? I understand that a 64-bit CPU is more precise than a 32-bit CPU in terms of floating point calculations. But what other advantages does a 64-bit CPU have over a 32-bit CPU? I've been arguing with somebody who claims that 32-bit CPUs need to split numbers into two registers when they make calculations to get 64-bit precision, but I was always under the impression that the CPU would just round off a calculation to 32 bits to get it to fit into a 32-bit register. Additionally, he claims that 64-bit CPUs can send data faster because they can queue up 64 bits of data to be sent to memory.

From what I remember from my Computer Architecture class, a 64-bit CPU doesn't give you any critical advantages over a 32-bit CPU but perhaps I'm wrong since both AMD and Intel are moving to 64-bit architectures (and obviously they're doing it for some good reason... although Intel's big thing is getting away from the old x86 architecture).

To give you a little more insight, the argument is actually over the X-Box's 32-bit Pentium III CPU versus the GameCube's 64-bit (or is it 128-bit?) Gekko processor ... I don't see any advantage that the GameCube holds over the X-Box simply because the X-Box's CPU is 32-bit while the GameCube's CPU is 64-bit, except for precision in floating point calculations. Please don't turn this into an argument over X-Box versus GameCube, I just want to know some facts regarding why a 64-bit CPU would be better than a 32-bit CPU. Thanks to anyone who can help me out.
 

Sohcan

Platinum Member
Oct 10, 1999
2,127
0
0


<< I've been arguing with somebody who claims that 32-bit CPUs need to split numbers into two registers when they make calculations to get 64-bit precision >>

That's sort of true, but the crucial point is that with a 32-bit CPU running typical 32-bit code, 64-bit integer arithmetic is very, very rarely needed. The vast majority of the time, a 32-bit CPU will be running 32-bit integer code, and a 64-bit CPU will be running 64-bit integer code, so there's no arithmetic speed advantage. 64-bit double-precision arithmetic is more common with floating-point code, but x87 has had 80-bit internal precision since its inception, and the Athlon/P4 can do double-precision floating-point adds and multiplies in 4-6 cycles, IIRC. With fully-pipelined floating-point units, and FP code's predictable, serial nature, the throughput can generally remain pretty high. Also, don't forget that SSE can do 128-bit SIMD arithmetic (albeit with 64-bit datapaths).
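
To make the "split across two registers" point concrete, here's a minimal C sketch (my own illustration, not from any particular compiler). On a 32-bit x86 target, a compiler will typically lower this into an add of the low halves followed by an adc (add-with-carry) of the high halves, using two register pairs; a 64-bit CPU handles it in a single add. But as noted above, general-purpose integer code almost never needs 64-bit operands in the first place.

/* 64-bit integer add: on a 32-bit x86 compiler this typically becomes
   add (low 32 bits) + adc (high 32 bits) across two register pairs;
   on a 64-bit CPU it's a single add. */
unsigned long long add64(unsigned long long a, unsigned long long b)
{
    return a + b;
}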



<< Additionally, he claims that 64-bit CPUs can send data faster because they can queue up 64 bits of data to be sent to memory. >>

That's nonsense...PC memory buses are already 64-bit, and besides, the CPU doesn't interact with memory at its 32-bit or 64-bit word size; it loads and stores entire blocks into the cache at one time (512 bits on the Athlon, 1024 bits on the P4). This is handled by the L2 controller, regardless of whether the CPU is "32-bit" or "64-bit." Then, once the block is in cache, the appropriate word can be fetched by the CPU. If anything, 64-bit CPUs have a distinct disadvantage regarding bandwidth...since a particular word can occupy twice as much space, twice as much data has to be sent.

Honestly, the only distinct advantage at this time, especially for desktop use, for 64-bit CPUs (regardless of the ISA and architecture) is 64-bit memory addressing for flat addressing of more than 4GB.

If all else is equal (ISA, architecture, implementation, code, etc.), a 64-bit CPU will typically be 5-10% slower than an equivalent 32-bit CPU. This is because the caches will have a higher miss rate (since up to half as many words can be stored in the cache), and the bandwidth requirements are greater.
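
To put a rough number on that cache point, here's a back-of-the-envelope sketch (assuming the 64-byte/512-bit Athlon line size mentioned above): once pointers and long integers double in size, only half as many of them fit in each cache line, so the same cache effectively holds less of the working set.

#include <stdio.h>

/* Sketch: operand words per cache line, assuming a 64-byte (512-bit) line. */
int main(void)
{
    const int line_bytes = 64;
    printf("32-bit words per line: %d\n", line_bytes / 4);  /* 16 */
    printf("64-bit words per line: %d\n", line_bytes / 8);  /*  8 */
    return 0;
}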
 

LiQiCE

Golden Member
Oct 9, 1999
1,911
0
0
Wow thanks for the information :) ... I don't know much about x86 architecture so I didn't know the FPU had 80-bit precision internally. The 128-bit SSE registers for SIMD are pretty interesting as well. Most of my Computer Architecture class revolved around the SPARC architecture, so other than very general things about x86 (like there being very few registers compared to the UltraSPARC chips) I know next to nothing! :)
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81


<< Honestly, the only distinct advantage at this time, especially for desktop use, for 64-bit CPUs (regardless of the ISA and architecture) is 64-bit memory addressing for flat addressing of more than 4GB. >>



I think that is the most important (for a while)... we are at the point where 4GB of ram is not unreasonably expensive (waaaay out of my budget ;)) and more ram is always good.
 

kylef

Golden Member
Jan 25, 2000
1,430
0
0
Intel CPUs from the PII through the P4 can address 16 Gigabytes of RAM. Of course, it does require OS support and I think currently only Win2k Datacenter supports the extra 2 bits of address space. But it is quite good for large database servers.
 

jhu

Lifer
Oct 10, 1999
11,918
9
81
actually they can address 64gb. those chips have PAE (Physical Address Extension), which gives 36-bit physical addressing
 

mechsiah

Senior member
Aug 8, 2001
346
0
0
Please bear with my ignorance on this-

My understanding was that a 64-bit architecture could grind through 64 bits each clock cycle instead of 32 bits, hence there was a net gain in performance (certainly not 2 for 1, but some). Perhaps that's what your friend meant by providing 64 bits to the processor? The data doesn't get there any faster, it is just ground through faster.

There is clearly a reason to use a 64-bit architecture for higher end servers (HP/AIX/AS400/etc) that do a lot of information grinding beyond the memory addressing.

Please provide insight...

 

Sohcan

Platinum Member
Oct 10, 1999
2,127
0
0


<< My understanding was that a 64-bit architecture could grind through 64 bits each clock cycle instead of 32 bits, hence there was a net gain in performance (certainly not 2 for 1, but some). Perhaps that's what your friend meant by providing 64 bits to the processor? The data doesn't get there any faster, it is just ground through faster. >>

mechsiah, please read my previous post....though the marketing will have you believe otherwise, there is generally no speed advantage. General purpose 32-bit code very, very rarely uses 64-bit integers (I hardly ever see 64-bit integers used in high-level code, and the disparity is even greater in assembly code). After all, how often are integers larger than 2^32 (~4.29 billion) ever needed, except for flat memory addressing beyond 4GB (hence 64-bit CPUs' usefulness)? 64-bit integer math may be useful for scientific applications, but that's a pretty specific purpose. So if you want to compute 1 + 1, a 32-bit CPU running 32-bit code will express the operands using 32-bit integers, and a 64-bit CPU running 64-bit code will express them using 64-bit integers. This is an SISD (single instruction, single data) operation, so each CPU completes a single operation using operands at its respective ALU width....hence there is generally no speed advantage.
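
As a quick sketch of that 1 + 1 example (my own C, purely for illustration): each of these functions compiles to a single ALU add on a CPU whose native word matches the operand width, so a 64-bit CPU isn't "grinding through" anything extra, and a 32-bit CPU only pays a penalty when the data genuinely needs 64 bits.

/* Each add is one instruction on a machine whose word size matches the
   operand width; the 64-bit version needs an add/adc pair only on a
   32-bit CPU. */
int add32(int a, int b)                    /* 32-bit operands */
{
    return a + b;
}

long long add64(long long a, long long b)  /* 64-bit operands */
{
    return a + b;
}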

Floating-point code is a different story, where 64-bit double-precision literals are often useful for increased precision, but x86 CPUs do 80-bit internal precision and handle double-precision math very fast.

Like I said earlier, if all else is equal, in general a 64-bit CPU needs up to 4 times the cache size and twice the bandwidth just to keep up with the 32-bit equivalent, due to the higher cache miss rate and bandwidth requirements. I remember reading studies about this issue somewhere; I'll try to dig up some links....
 

jhu

Lifer
Oct 10, 1999
11,918
9
81
couldn't they just make a 64-bit ip register for greater flat memory addressing (or why doesn't that work)?
 

Moohooya

Senior member
Oct 10, 1999
677
0
0
They could just make the ip register 64 bits, but then you'd be very limited in all the jumps and branches you make that use other registers.

jmp [eax]

would only take you to the first 4GB. So you'd have to make at least one other register 64 bits, say eax. However, you'd also want a 64-bit stack pointer and stack frame pointer, so now eax, ebp, esp and eip are all 64 bits. You might as well go ahead and make the remaining registers 64 bits, allowing the instruction set to stay orthogonal.
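
The same point, sketched from C (a hypothetical little program, just to illustrate): a function pointer is an indirect jump/call target, and it can only be as wide as the registers that hold it, so with 32-bit registers it simply can't name code above 4GB.

#include <stdio.h>

void hello(void) { printf("hello\n"); }

int main(void)
{
    void (*target)(void) = hello;       /* indirect jump/call target */
    printf("code pointer size: %u bytes\n",
           (unsigned)sizeof target);    /* 4 on a 32-bit CPU, 8 on a 64-bit CPU */
    target();                           /* indirect call through the pointer */
    return 0;
}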

Make sense?
 

Sohcan

Platinum Member
Oct 10, 1999
2,127
0
0


<< couldn't they just make a 64-bit ip register for greater flat memory address (or why doesn't that work?)? >>

That's only part of the issue....a move to 64-bit flat memory addressing is going to need significant architectural and ISA changes. You're right, you'll need a 64-bit program counter if you want to jump to any instruction within the 64-bit address range. The exact details will depend on the number and types of addressing modes supported by the ISA, but in general you're going to need 64-bit general purpose registers, 64-bit ALUs, and 64-bit instructions (assuming fixed-length RISC-like instructions).

Assume you're using register-indirect addressing, so that the target address is contained in a general purpose register. Let's say some high-level code has the assignment...
a[3] = 5;
....so that the fourth element of array "a" (index 3) is assigned "5."

The corresponding MIPS code would be somewhat like this (assuming that array a is stored in the static data area, and its address can be determined statically):

lui $t0, high address
ori $t0, $t0, low address <- load the base address of a into register $t0

addi $t0, $t0, 12 <- add 12 (3 elements x 4 bytes per word) to the base address, to get the address of a[3]
addi $t1, $0, 5 <- set register $t1 to 5; register $0 is always zero

sw $t1, 0($t0) <- store the value "5" to the address of a[3]

So you're going to need 64-bit general purpose registers, in the case of register-indirect addressing (and many other modes). You'll need a 64-bit ALU if you're going to be adding offsets to base addresses.

This last note depends greatly on the ISA, but for classic RISC ISAs, you'll need 64-bit instructions. Those first two instructions, lui and ori, load the upper and lower parts of the address into a register. The values "high address" and "low address" are constant immediate values in the instructions. The load of the address has to be spread across two instructions...if you have 32-bit instructions and 32-bit addresses, some 32-bit immediate value needs to be loaded into the register. But since some of the 32 bits of the instruction are devoted to the opcode and the register number into which the address is loaded, there will be fewer than 32 bits remaining for the immediate value. Thus, the "lui" instruction loads the upper 16 bits of the address into a register, and the "ori" instruction does a logical OR on the register to load the lower 16 bits of the address. Therefore, if you want to load a 64-bit address into 64-bit registers, the immediate values in the lui and ori instructions would need to be 32 bits each...with the added bits for the opcode and register addresses, you might as well extend the instructions to 64 bits.
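
To see that 16-bit split in the immediates, here's a small C sketch (0x10008000 is just an example address in the MIPS static data segment): the upper and lower halves are exactly what the lui and ori instructions above carry. With a 64-bit address, each half would itself be 32 bits, which is the problem being described.

#include <stdio.h>

/* Split a 32-bit address into the two 16-bit immediates carried by lui/ori. */
int main(void)
{
    unsigned int addr = 0x10008000u;
    printf("lui immediate: 0x%04x\n", addr >> 16);      /* upper 16 bits: 0x1000 */
    printf("ori immediate: 0x%04x\n", addr & 0xFFFFu);  /* lower 16 bits: 0x8000 */
    return 0;
}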
 

Xalista

Member
May 30, 2001
113
0
0
So, all this would make 64bit CPUs pretty much useless for normal consumers:

1. 64bit is probably a bit slower than 32bit.
2. I don't want to sound like Bill Gates, but I don't think any normal consumer is going to need over 4 Gigs of RAM anytime soon.
3. As Sohcan explained, you're going to need 64bit instructions too, which would double code size and thus memory requirements.
 

Remnant2

Senior member
Dec 31, 1999
567
0
0
Basically, you hit it right on the money. :)

This also explains AMD's 64bit scheme. With Sledgehammer, they have a bona fide 64bit processor for the few occasions such a thing would actually be useful (i.e. server situations). Plus they have the public "ooh, it's 64bits, it MUST be FAST!" sentiment -- which the architectural changes (which enhance both 32 and 64bit modes) will actually back up.
 

heartsurgeon

Diamond Member
Aug 18, 2001
4,260
0
0
a 64 bit word length architecture allows for a larger instruction set, greater precision in calculations, more complex instructions per word, greater single word addressable memory, and greater DMA bandwidth than a 32 bit processor.

if you hobble the system by not designing it to take advantage of these inherent strengths, i'm sure you can come up with a system that underperforms a 32 bit word length architecture, but was that really the point of the question?
 

Xalista

Member
May 30, 2001
113
0
0
<< a 64 bit word length architecture allows for a larger instruction set, greater precision in calculations, more complex instructions per word, greater single word addressable memory, and greater DMA bandwidth than a 32 bit processor.

if you hobble the system by not designing it to take advantage of these inherent strengths, i'm sure you can come up with a system that underperforms a 32 bit word length architecture, but was that really the point of the question? >>

The point of this question was to find out what makes a 64bit CPU better than a 32bit CPU. The answer we are giving is that it is mainly interesting from an addressable memory and precision point of view, but NOT from a performance point of view. Another point was that these advantages are not really that important for a normal consumer, whereas speed is.

None of the "inherent strenghts" you mention have any real impact on performance (if anything more complex instructions would make the system slower), so the expectation that 64bit CPU will underperform a 32bit CPU has nothing to do with "hobbling" the system, but more with the observation that the 64bits have, in some ways, a negative effect on performance without opening up the way to any speed enhancing architectural changes.