Originally posted by: Matthias99
I don't really want to get into a pissing contest over this. As best as I can find, the P4 has 8 32-bit GPRs, 8 80-bit x87 registers, 8 64-bit MMX registers, and 8 128-bit SSE1/2 registers. I'm not sure whether or not 8 is in the "huge" range 😛. I also thought that the SSE registers couldn't be used directly for non-SSE floating-point operations (something like you couldn't have it be both source and destination for an FP op, but maybe that's only while you're using MMX or SSE1/2). If I'm wrong here, or you have better info, please point me towards a technical document or something, as I'm somehow not having much luck finding anything useful with Google right now.
Sorry, I was referring to the invisible rename registers, which are not directly accessible to programmers. In addition to the registers you've mentioned, the P4 also has 128 32-bit GPRs and 128 128-bit registers for FP/SSE2. The K7/K8 has about 88 of each, I believe (except that its FP/SSE2 registers are only 80 bits wide).
See:
http://www.chip-architect.com/news/2003_04_20_Looking_at_Intels_Prescott_part2.html
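To make the rename-register idea a bit more concrete, here's a toy sketch in Python. It is only an illustration, not how the P4 or K8 actually implement it: the `RenameTable` class, the pool size of 8, and the register names are all made up for the example. The point is just that each write to a visible register gets a fresh hidden register, so reusing a name like `eax` doesn't force a later instruction to wait on an earlier one.

```python
class RenameTable:
    """Toy register renamer (hypothetical, for illustration only)."""

    def __init__(self, arch_regs, num_physical):
        if num_physical < len(arch_regs):
            raise ValueError("need at least one physical register per visible register")
        # Each programmer-visible register starts out owning one physical register.
        self.mapping = {name: i for i, name in enumerate(arch_regs)}
        # Everything left over is the invisible pool used for renaming.
        self.free = list(range(len(arch_regs), num_physical))

    def rename(self, dest, sources):
        """Rename one instruction; returns (physical_dest, physical_sources)."""
        # Sources read whichever physical register currently holds the value.
        phys_sources = [self.mapping[s] for s in sources]
        if not self.free:
            raise RuntimeError("out of physical registers: a real pipeline would stall")
        # The destination always gets a fresh physical register.
        # (Real hardware returns old registers to the pool at retirement;
        # this sketch never frees them.)
        phys_dest = self.free.pop(0)
        self.mapping[dest] = phys_dest
        return phys_dest, phys_sources


table = RenameTable(["eax", "ebx", "ecx", "edx"], num_physical=8)
first = table.rename("eax", ["ebx", "ecx"])   # eax = ebx + ecx
second = table.rename("edx", ["eax"])         # edx = eax (reads the new eax)
third = table.rename("eax", ["ecx"])          # eax = ecx (reuses the name eax)
```

Here the third instruction writes a different physical register than the one the second instruction reads, so the two can run in either order. A bigger hidden pool means more instructions can be in flight before the renamer runs dry.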
Opteron extended registers:
http://www.tomshardware.com/cpu/20030422/opteron-06.html#a_big_deal_opterons_64bit_registers
Will it translate to better FPU performance? Double the GPRs, with each able to hold a 64-bit float if necessary -- along with twice as many SSE1/2 registers. I would think that performance with double precision floats in 64-bit mode would be significantly better, although by how much I couldn't tell you without benchmarks and/or more info on the rest of the floating point architecture.
The additional registers should boost performance. However, it's dependent on the code, on whether a programmer is willing to drop down to assembler, and on the maturity of the compilers. Even now, Intel's compiler typically produces faster FP code than the ones targeted at x86-64 for K8 processors. I've heard estimates of 5-10% improvements in general. Also, I don't think there's any way to get data directly from an x87 FP register to a GPR in any x86 CPU without writing to memory first (SSE/SSE2 registers are a different story, since MOVD and the CVT* conversion instructions can write to a GPR directly).
Considering that SETI is pretty much nothing but FFTs and other forms of analog signal processing, I would think it would have a *lot* to gain from a 64-bit architecture if it significantly improves FPU performance. But unless you have the source code handy, it's really just speculation at this point.
For now, FFTs seem to be clearly faster on the P4. Prime95 is heavily FFT-based and extremely well optimized, and the P4 has significantly better performance per clock in it at the moment.
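For anyone wondering why a prime-testing program leans so hard on FFTs: multiplying huge integers is a convolution of their digits, and an FFT turns that convolution into cheap pointwise multiplies. Below is a textbook sketch in Python, assuming a plain recursive radix-2 FFT over decimal digits. It is not Prime95's code (which is hand-tuned assembly using a more sophisticated transform), and the function names `fft` and `multiply` are just mine for the example.

```python
import cmath


def fft(values, invert=False):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def multiply(a, b):
    """Multiply two non-negative integers via FFT convolution of their digits."""
    # Least-significant digit first, so index i is the coefficient of 10**i.
    da = [int(d) for d in reversed(str(a))]
    db = [int(d) for d in reversed(str(b))]
    size = 1
    while size < len(da) + len(db):
        size *= 2
    fa = fft(da + [0] * (size - len(da)))
    fb = fft(db + [0] * (size - len(db)))
    # Pointwise product in the frequency domain equals convolution of digits.
    product = fft([x * y for x, y in zip(fa, fb)], invert=True)
    # The inverse transform needs 1/size scaling; rounding removes FP noise.
    coeffs = [round(c.real / size) for c in product]
    # Evaluating the coefficient polynomial at 10 propagates the carries.
    return sum(c * 10 ** i for i, c in enumerate(coeffs))
```

Nearly all the time goes into the floating-point butterflies inside `fft`, which is why this kind of workload tracks FP/SSE2 throughput so closely.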
Will 64-bit desktop computing pan out? I have no idea -- and neither does anyone else, really. If it significantly improves performance in real-world applications and the specialized fields that can use it (and we'll have to wait for 64-bit Windows next year to tell) at a minimal cost, then it will. If it does nothing but cost more, then it probably won't. 😛 We'll have to see what happens when Intel gets Prescott rolling -- a (hopefully) mature, super-fast 32-bit processor versus an unconventional, untested 64-bit newcomer. It ought to be fun to watch. 🙂
Well, I'm sure that 64-bit computing is the future, just that for desktop purposes it won't be necessary for several more years, as long as Intel's 32-bit performance matches AMD's 64-bit performance in the same applications. And yeah, I'm anticipating the Prescott, primarily to see how its bigger caches impact Hyperthreading performance.