Are 64 bits really faster than 32?


ModestGamer

Banned
Jun 30, 2010
1,140
0
0
If that was really the case, then why do they have prefetch units on the CPU?


Bus speeds really aren't hurting modern CPUs. Don't believe me? Look at any review that tests higher-clocked RAM against lower-clocked RAM. The difference in just about every program is near zero; synthetic tests are usually the only place that really sees a difference between slow and fast RAM.

That isn't to say faster memory accesses wouldn't be nice, just that they aren't a huge issue for most applications (though faster memory would probably bring bigger performance increases than the switch from 32->64 bit).

That being said, the GPU has greater memory requirements than the CPU does (loading textures, vertices, etc.), so faster memory WOULD give a noticeable speed increase to the GPU.

I personally think that we will see a large increase in optical interconnects (rendering bus speed concerns pretty moot. Latency still exists, but that is what cache smooths out.)

Who knows though, maybe one day a tech like ZRAM, PRAM, or MRAM will take off, causing our on-die cache to jump from 10MB to 128MB+. I don't, however, foresee a company putting DRAM on a CPU. Latency and bus speeds aren't the biggest concerns when dealing with DRAM; DRAM itself is pretty slow. (Hence the reason DDR->DDR3 try to work around this by accessing several DRAM sections at once.)
 

Scotteq

Diamond Member
Apr 10, 2008
5,276
5
0
Handles in Win32 are 32 bits as well; why would you be truncating them to 16 bits anyway?

BTW, in 64-bit Windows they are 32 bits as well; if you do cast/extend them to a quadword, the upper 32 bits can be ignored.

The question isn't really whether they exist, but rather whether those bits mean anything. Handles in Win64 have 32 significant bits; if you truncate those, there's data loss. But you don't have to believe me: check what Microsoft says.
 

Cogman

Lifer
Sep 19, 2000
10,286
145
106
If that was really the case, then why do they have prefetch units on the CPU?

Easy: grabbing large chunks of data at once (which is essentially what the prefetch units are doing) is faster than grabbing data on demand. Even if the DRAM were directly on the CPU, this would not change. The fact of the matter is, DRAM is slow compared to other memory technologies. Bus speed really isn't the limiting factor, nor is latency. The limiting factor is simply that DRAM works by charging and discharging tiny capacitors, and this makes it slow.

However, it is fast enough. The memory mountain works well in this situation, as grabbing data from RAM is generally not a big deal.
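To make the locality point concrete, here is a rough Python sketch of a direct-mapped cache (the geometry - 64 lines of 64 bytes - is made up for illustration). A sequential walk misses once per cache line, while a stride as large as a line misses on every single access, which is exactly the access pattern caches and prefetchers are built around:

```python
def count_misses(addresses, num_lines=64, line_size=64):
    """Simulate a direct-mapped cache and count the misses."""
    lines = [None] * num_lines          # which block each cache line holds
    misses = 0
    for addr in addresses:
        block = addr // line_size       # which memory block this byte is in
        idx = block % num_lines         # which cache line that block maps to
        if lines[idx] != block:         # not resident -> miss, fill the line
            lines[idx] = block
            misses += 1
    return misses

sequential = list(range(4096))           # walk 4 KiB byte by byte
strided = list(range(0, 4096 * 64, 64))  # 4096 accesses, one per line

print(count_misses(sequential))  # 64 misses: one per 64-byte line
print(count_misses(strided))     # 4096 misses: every access misses
```

Same number of accesses, a factor of 64 in misses - which is why grabbing large chunks at once wins even when the underlying DRAM stays just as slow.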
 

Schmide

Diamond Member
Mar 7, 2002
5,745
1,036
126
The question isn't really whether they exist, but rather whether those bits mean anything. Handles in Win64 have 32 significant bits; if you truncate those, there's data loss. But you don't have to believe me: check what Microsoft says.

You replied to an original question that was about DLL memory issues. I had answered that by explaining that even though they are mapped through the page tables into the memory space and appear contiguous to the program, they have their own descriptor tables and are limited in their interaction with other DLLs and EXEs.

Although handles are declared as void*, they are in fact unique identifiers and should not be treated as pointers. I dislike much of the documentation that refers to them as pointers. If you compile a program as 64-bit, it does allocate a 64-bit pointer (void* is 64 bits); however, at least in the current implementation of Windows, you can assume the upper 32 bits will be zero. Regardless, you shouldn't be casting handles around anyway.

If you were to port a 16-bit program to 64-bit Windows, handles would actually be the least of your worries. As long as you include the proper header files, they and their derivatives should be declared correctly.
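For what it's worth, the truncation point is easy to demonstrate with plain integers. A rough Python sketch (real Win64 code would go through the HandleToLong/LongToHandle macros, and the documented rule involves sign extension rather than the zero extension used here):

```python
MASK32 = 0xFFFFFFFF

def truncate_handle(h64):
    """Keep only the low 32 bits, as storing a handle in a DWORD would."""
    return h64 & MASK32

def extend_handle(h32):
    """Zero-extend the 32-bit value back to a 64-bit-sized integer."""
    return h32 & MASK32

safe = 0x00000A3C               # only the low 32 bits are significant
lossy = (1 << 32) | 0x00000A3C  # hypothetical handle with upper bits set

assert extend_handle(truncate_handle(safe)) == safe    # round-trips fine
assert extend_handle(truncate_handle(lossy)) != lossy  # data loss
```

As long as the OS keeps its promise that only 32 bits are significant, the round trip is lossless; the moment the upper bits mean something, truncation destroys the handle.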
 

ModestGamer

Banned
Jun 30, 2010
1,140
0
0
Easy: grabbing large chunks of data at once (which is essentially what the prefetch units are doing) is faster than grabbing data on demand. Even if the DRAM were directly on the CPU, this would not change. The fact of the matter is, DRAM is slow compared to other memory technologies. Bus speed really isn't the limiting factor, nor is latency. The limiting factor is simply that DRAM works by charging and discharging tiny capacitors, and this makes it slow.

However, it is fast enough. The memory mountain works well in this situation, as grabbing data from RAM is generally not a big deal.


Thanx for backing what I just said. Expect to see memory on die soon.
 

Voo

Golden Member
Feb 27, 2009
1,684
0
76
The limiting factor is simply that DRAM works by charging and discharging tiny capacitors, and this makes it slow.
Yep, 10% faster memory is nice, but compared to the SRAM in caches there's still an order of magnitude difference. And as long as the caching works well, we can mask most of the latency anyway.

Although I don't agree that memory speed can't be the limiting factor of an application - that's certainly true for most consumer applications, but HPC, for example, often taxes the memory subsystem much more.
 

Cogman

Lifer
Sep 19, 2000
10,286
145
106
Yep, 10% faster memory is nice, but compared to the SRAM in caches there's still an order of magnitude difference. And as long as the caching works well, we can mask most of the latency anyway.

Although I don't agree that memory speed can't be the limiting factor of an application - that's certainly true for most consumer applications, but HPC, for example, often taxes the memory subsystem much more.

Sorry, I didn't mean to give off the vibe that memory is never the limiting factor, just that it usually isn't (as you have already pointed out).
 

ModestGamer

Banned
Jun 30, 2010
1,140
0
0
Yep, 10% faster memory is nice, but compared to the SRAM in caches there's still an order of magnitude difference. And as long as the caching works well, we can mask most of the latency anyway.

Although I don't agree that memory speed can't be the limiting factor of an application - that's certainly true for most consumer applications, but HPC, for example, often taxes the memory subsystem much more.

Well, it's faster until the cache saturates, and then it's back to bottlenecking the CPU again.

I fully expect nearly microcontroller-style integrated CPU/GPU chips in the next 5-7 years, at which point, with that level of integration, we will really start seeing the performance of CPUs climb dramatically.

As it is now, the architecture is a balance of compromises. When it's far more integrated, performance will increase substantially, and with process improvements I think at 16nm we will see integrated RAM being discussed heavily if not implemented. I doubt we are going to see 4GB, but I have a feeling we might see 1-2GB on die or on chip.
 

Voo

Golden Member
Feb 27, 2009
1,684
0
76
Well, it's faster until the cache saturates, and then it's back to bottlenecking the CPU again.
Well, you may be able to build faster cache even today, but I think that's just the usual compromise: faster cache is usually smaller and more expensive, so if you don't need the speed, why not make it slower but cheaper (or just put more on the chip if you can)? I'd say the only problem would be if the CPU could outpace even the fastest SRAM.

And why integrate RAM on chip if you could just put more cache on it? Actually, I really don't like having to think about where I put stuff in memory to get performance (hi GPGPU, for me that's really no fun); I would rather just let the architecture handle that sensibly - there are things that HW can handle better than SW.
 

aphorism

Member
Jun 26, 2010
41
0
0
*Generally* speaking 64-bit is about larger memory addressability and not about 64-bit instructions.
Then why did they double the registers?!

A good x86-64 compiler should be able to beat a 32-bit app by a decent margin; you'd have to ask a compiler guru for an accurate figure.

Oh, and the instructions are variable length.
If that was really the case, then why do they have prefetch units on the CPU?

Did you take any time to think about that?

They add prefetch for an obvious reason: to reduce the memory bottleneck.
Thanx for backing what I just said. Expect to see memory on die soon.
Expect to see exacerbated hotspots? No thanks.
 

ModestGamer

Banned
Jun 30, 2010
1,140
0
0
Then why did they double the registers?!

A good x86-64 compiler should be able to beat a 32-bit app by a decent margin; you'd have to ask a compiler guru for an accurate figure.

Oh, and the instructions are variable length.


Did you take any time to think about that?

They add prefetch for an obvious reason: to reduce the memory bottleneck.

Expect to see exacerbated hotspots? No thanks.


Fully integrated microcontrollers destroy CPUs in both per-watt and per-clock execution.

It's going to go that way.
 

Cogman

Lifer
Sep 19, 2000
10,286
145
106
Then why did they double the registers?!

A good x86-64 compiler should be able to beat a 32-bit app by a decent margin; you'd have to ask a compiler guru for an accurate figure.
Ehh, I wouldn't say a decent margin. x86-64 does have additional GP registers that might speed things up (though most compilers are notoriously bad at register allocation, especially GCC). Other than that, the actual widening of the registers really doesn't provide a whole lot of benefit.

Don't get me wrong, there are some places where the x64 instruction set adds benefits, but it really isn't as frequent as you might think. The other issue is that applications which would benefit from wider general-purpose registers generally already use SSE registers for that work.
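One place the wider registers clearly do pay off is arithmetic on 64-bit integers, which a 32-bit register file has to synthesize from two halves plus a carry. A rough Python sketch of the kind of sequence a compiler emits on a 32-bit target (Python's unbounded ints stand in for registers here):

```python
MASK32 = 0xFFFFFFFF

def add64_via_32bit(a, b):
    """Add two 64-bit values using only 32-bit chunks, carry and all."""
    lo = (a & MASK32) + (b & MASK32)               # add the low halves
    carry = lo >> 32                               # did the low add overflow?
    hi = ((a >> 32) + (b >> 32) + carry) & MASK32  # high halves plus carry
    return (hi << 32) | (lo & MASK32)

a, b = 0xDEADBEEF12345678, 0x00000001FFFFFFFF
assert add64_via_32bit(a, b) == (a + b) & 0xFFFFFFFFFFFFFFFF
```

On x86-64 the same addition is a single add on one register, so the win is real - it just only shows up when the code actually manipulates 64-bit quantities.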
 

tweakboy

Diamond Member
Jan 3, 2010
9,517
2
81
www.hammiestudios.com
Yes, 64 is a dual-lane path. So imagine adding two more lanes on the freeway: now you have much more information that can pass through at once than before.... thx :biggrin:
 

Cogman

Lifer
Sep 19, 2000
10,286
145
106
Yes, 64 is a dual-lane path. So imagine adding two more lanes on the freeway: now you have much more information that can pass through at once than before.... thx :biggrin:

That really isn't how it works.

Think of it more like this.

Instead of the maximum size of the numbers you can work with being 100, the maximum becomes 10,000 (doubling the digits squares the range). Any other bit-twiddling hacks are going to be slower than just using another register or even using the stack.
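The wraparound is easy to see with concrete numbers; in this quick Python sketch, masking with 0xFFFFFFFF mimics what a 32-bit register keeps:

```python
MASK32 = 0xFFFFFFFF
x, y = 3_000_000_000, 2_000_000_000  # each fits in an unsigned 32-bit register

wrapped = (x + y) & MASK32  # a 32-bit add: the sum overflows and wraps
exact = x + y               # a 64-bit add: the sum fits comfortably

print(wrapped)  # 705032704 -- garbage unless you track the carry yourself
print(exact)    # 5000000000
```

That is the real difference: not more lanes, just sums and addresses that no longer overflow a single register.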
 

FishAk

Senior member
Jun 13, 2010
987
0
0
Since the OP's trouble is with a specific 32-bit program, and specifically the problem of that program needing to page data out of RAM, the easiest and cheapest solution is a RAM disk. Adding as much memory as the motherboard can handle, and then dedicating everything above 3.5GB to paging, would make that program much faster. You could page to an SSD, but actual RAM is much faster.

Upgrading the computer may not benefit this particular program, but for other operations, it still makes sense to move to 64-bit.