
Athlon 64 Questions

Trying to get a little more information on the Athlon 64 (non-FX). Does it have a vector unit of any type, like the G5 does? Or is the only vector processing done in the MMX/SSE/SSE2/3DNow! instruction sets?

Also, what type of instruction level parallelism does it have, if any? Unfortunately, I can't find a lot of information about this...
 
Originally posted by: complacent
Trying to get a little more information on the Athlon 64 (non-FX).

All the Athlon 64 CPUs have the same capabilities and hardware design. The only differences are that "FX" models have a fully unlocked multiplier, and "Opteron" 2XX and 8XX CPUs have multiple coherent HyperTransport (HT) links that can be used to enable SMP operation.

Does it have a vector unit of any type, like the G5 does? Or is the only vector processing done in the MMX/SSE/SSE2/3DNow! instruction sets?

I assume that the MMX/SSE/SSE2/SSE3/3DNow!-type instructions are done through a hardware vector unit of some type, since they are (mostly) SIMD operations. Normal x86 code (even x86-64) has no facilities for doing this sort of thing (at least at an instruction level).

Also, what type of instruction level parallelism does it have, if any? Unfortunately, I can't find a lot of information about this...

Couldn't tell you a lot of details. Probably less than a Netburst-based Pentium, since the pipeline isn't as long. I think they have two integer math units, but maybe I made that up. 😛
 
The G5's vector unit is a SIMD processor.
Also, what type of instruction level parallelism does it have, if any? Unfortunately, I can't find a lot of information about this...
Through a superscalar architecture. The processor determines the data dependencies within a set of instructions and reorders the micro-ops for parallel execution (out-of-order execution). So if two instructions of a common type don't depend on each other's destination data, the dispatch unit will send them to different execution units.
Probably less than a Netburst-based Pentium, since the pipeline isn't as long. I think they have two integer math units, but maybe I made that up.
I think you did. It has three integer and three floating-point execution units.
 
As I recall, the K7/K8 architecture does not have dedicated vector processors; rather, it re-uses its superscalar execution backend (3-way FP, 3-way integer and 3-way address arithmetic) to carry out these vector instructions. Same as Netburst.
 
Yes, but the FP units are not general-purpose. There are three of them, each handling different ops. That is one reason why there is good room for a boost to FP vector performance in AMD's core design. Not that I'm sure it will happen, but it would be good for marketing, as it would shift some benchmarks a good bit.

 
As I recall, the K7/K8's FP units are rather flexible. There are two general-purpose ones for add and multiply, and a separate one for transcendentals and other miscellaneous FP operations. Since SSE/SSE2/SSE3 work on 128-bit vectors (and each general-purpose FPU is 64-bit), that's perfect for one XMM vector per clock. And since there are two issue ports, it does not have Netburst's add/multiply interleaving limitation.
 
Maybe you're relying too much on K7 then.

K8 has three (different) 64-bit-wide FP execution pipelines available to the scheduler, which is what counts:
FPMUL, FPADD, FPMISC.
A 128-bit SIMD FP instruction has a throughput of 2 cycles (2 × 64 bits, one 64-bit half per cycle).
These pipelines are all 4 stages deep.
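A worked version of the throughput figure above, assuming (as described) that a 128-bit SIMD FP instruction is processed as two 64-bit halves and its pipe accepts one half per cycle. For n independent instructions on the same pipe, ignoring latency and dependencies:

$$
\frac{128\ \text{bits}}{64\ \text{bits/cycle}} = 2\ \text{cycles per instruction}, \qquad T(n) \approx 2n\ \text{cycles}
$$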

As for actual hardware processing units, connected to the execution pipelines' ports:

FPMUL:
There is one 64-, 80- and 2×32-bit unit for multiplication, division and square root.
This one handles x87 and SSE2 instructions.

FPADD:
One 64-, 80-bit fp unit for adds and subs.

FPMUL and FPADD:
Then there is one 2×32 FPMUL + 2×32 FPADD unit for 3DNow! and SSE. This can handle one mul and one add simultaneously, but not two muls or two adds, so 128-bit throughput is still 2 cycles.

FPMISC:
One 64-, 80-bit FP store and misc unit. This handles stores, holds constants such as pi and e, and performs more complex microcoded operations.



Then, as the FP pipes also handle SIMD integer instructions:

FPMUL and FPADD:
There is also one 2×64 integer store, add and logic unit for 128-bit SIMD instructions. Again, this can serve both FPMUL and FPADD simultaneously. Here, packed 128-bit throughput can be just one cycle, as I believe all these instructions can be scheduled into both pipes. But this is integer, not FP.

FPMUL:
One 64-bit integer mul for SIMD instructions. I'm afraid we're back to 2 cycles for 128-bit.

 