Athlon 64 Questions

complacent · Apr 20, 2005

Trying to get a little more information on the Athlon 64 (non-FX). Does it have a vector unit of any type, like the G5 does? Or is the only vector processing done in the MMX/SSE/SSE2/3dNow instruction sets?

Also, what type of instruction level parallelism does it have, if any? Unfortunately, I can't find a lot of information about this...

Matthias99 · Apr 20, 2005

Originally posted by: complacent
Trying to get a little more information on the Athlon 64 (non-FX).

All the Athlon64 CPUs have the same capabilities and hardware design. The only differences are that "FX" models have a fully unlocked multiplier, and "Opteron" 2XX and 8XX CPUs have multiple coherent HT links that can be used to enable SMP operation.

Does it have a vector unit of any type, like the G5 does? Or is the only vector processing done in the MMX/SSE/SSE2/3dNow instruction sets?

I assume that the MMX/SSE/SSE2/SSE3/3DNow!-type instructions are done through a hardware vector unit of some type, since they are (mostly) SIMD operations. Normal x86 code (even x86-64) has no facilities for doing this sort of thing (at least at the instruction level).

Also, what type of instruction level parallelism does it have, if any? Unfortunately, I can't find a lot of information about this...

Couldn't tell you a lot of details. Probably less than a Netburst-based Pentium, since the pipeline isn't as long. I think they have two integer math units, but maybe I made that up. 😛

itachi · Apr 20, 2005

the G5's vector unit is a SIMD processor.

Also, what type of instruction level parallelism does it have, if any? Unfortunately, I can't find a lot of information about this...

through a superscalar architecture.. the processor determines data dependencies for a set of instructions and orders the micro-ops for parallel execution (out-of-order execution). so.. if 2 instructions of the same type don't depend on each other's destination data, then the dispatch unit will send them to different execution units.
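A toy sketch of the dependency check described above, in Python. It is not a model of the real K8 scheduler: the `schedule` function, the 1-cycle latency, the rule that each register is written only once, and the default of 3 execution units are all assumptions made for illustration.

```python
# Toy model of superscalar issue. Each instruction is (dest, sources).
# Assumptions (illustrative only): every result is ready one cycle after
# issue, each register is written at most once, and `units` identical
# execution units are available per cycle.

def schedule(instrs, units=3):
    """Return a list of cycles; each cycle lists the instruction indices issued."""
    done = set()                      # instructions whose results are available
    pending = list(range(len(instrs)))
    cycles = []
    while pending:
        issued = []
        for i in pending:
            if len(issued) == units:  # all execution units are busy this cycle
                break
            _, srcs = instrs[i]
            # Earlier instructions that write one of this instruction's sources.
            producers = [j for j in range(i) if instrs[j][0] in srcs]
            if all(j in done for j in producers):
                issued.append(i)
        done.update(issued)           # results become visible next cycle
        pending = [i for i in pending if i not in done]
        cycles.append(issued)
    return cycles


independent = [("a", ("x", "y")), ("b", ("x", "z")), ("c", ("y", "z"))]
dependent = [("a", ("x", "y")), ("b", ("a", "z")), ("c", ("b", "z"))]
mixed = [("a", ("x", "y")), ("b", ("a", "z")), ("c", ("x", "z")), ("d", ("c", "b"))]

print(schedule(independent))  # all three issue together
print(schedule(dependent))    # a chain: one per cycle
print(schedule(mixed))        # instruction 2 issues ahead of instruction 1
```

In the `mixed` case the third instruction does not wait for the second, which is the out-of-order part: it goes to a free unit as soon as its own inputs are ready.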

Probably less than a Netburst-based Pentium, since the pipeline isn't as long. I think they have two integer math units, but maybe I made that up.

i think you did.. it has 3 integer and 3 floating-point execution units.

Sahakiel · Apr 20, 2005

www.chip-architect.com

imgod2u · Apr 21, 2005

As I recall, the K7/K8 architecture does not have dedicated vector processors; rather, it re-uses its superscalar execution backend (3-way FP, 3-way integer and 3-way address arithmetic) to carry out these vector instructions. Same as Netburst.

Vee · Apr 21, 2005

Yes. But the FP units are not general. There are three different, handling different ops. One reason why there is good room for a boost to FP vector performance in AMD's core design. Not that I'm sure that will happen though. Would be good for marketing though, as it would shift some benchmarks a good bit.

imgod2u · Apr 21, 2005

As I recall, the K7/K8's FP units are rather flexible. There's 2 general-purpose ones for add and multiply and a separate one for transcendentals and other misc FP operations. Considering SSE/SSE2/SSE3 work on 128-bit vectors (and each general purpose FPU is 64-bit), that's perfect for 1 XMM vector per clock. And since there are 2 issue ports, it does not have the add/mult interleaving limitation like Netburst.

Vee · Apr 22, 2005

Maybe you're relying too much on K7 then.

K8 has three (different) 64-bit wide FP execution pipelines available to the scheduler, which is what counts:
FPMUL, FPADD, FPMISC.
A 128-bit SIMD FP instruction has a throughput of 2 cycles (2 x 64-bit, 1 cycle each).
The pipelines are all 4 stages deep.

As for actual hardware processing units, connected to the execution pipelines' ports:

FPMUL:
There is one 64-, 80- and 2X32-bit unit for multiplication, divide and square root.
This one handles X87 and SSE2 instructions.

FPADD:
One 64-, 80-bit fp unit for adds and subs.

FPMUL and FPADD:
Then there is one 2X32 FPMUL + 2X32 FPADD for 3DNow and SSE. This can handle both one mul and one add simultaneously, but not two muls or two adds. So 128-bit throughput is still 2 cycles.

FPMISC:
One 64-, 80-bit FP store and misc unit. This handles stores, contains constants (pi, e, ...), and performs more complex microcoded operations.

Then, as FP-pipes also handle SIMD integer:

FPMUL and FPADD:
Also one 2X64 integer store, add, logic unit for 128-bit SIMD instructions. Again this can serve both FPMUL and FPADD simultaneously. Here, packed 128-bit throughput can be just one cycle, as I believe all instructions can be scheduled into both pipes. But this is integer, not FP.

FPMUL:
One 64-bit integer mul for SIMD instructions. I'm afraid we're back to 2 cycles for 128-bit.
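The cycle counts quoted in this post follow from dividing the vector width by the number of bits a unit can process per cycle. A minimal sketch of that arithmetic in Python, taking the unit widths exactly as stated above (they are not independently verified here); the function name `issue_cycles` and the case labels are made up for illustration.

```python
def issue_cycles(vector_bits, unit_bits_per_cycle):
    """Cycles needed to push one packed instruction through a unit
    that processes `unit_bits_per_cycle` bits each cycle (rounded up)."""
    return -(-vector_bits // unit_bits_per_cycle)  # integer ceiling division


# (vector width, bits the unit handles per cycle), as described in the post.
cases = {
    "SSE2 FP mul on the 64-bit FPMUL unit": (128, 64),
    "SSE FP add on the 2x32-bit adder": (128, 2 * 32),
    "SIMD integer add/logic on the 2x64-bit unit": (128, 2 * 64),
    "SIMD integer mul on the 64-bit multiplier": (128, 64),
}

for name, (vector_bits, unit_bits) in cases.items():
    print(f"{name}: {issue_cycles(vector_bits, unit_bits)} cycle(s)")
```

Only the 2x64-bit integer unit covers a full 128-bit operand per cycle, which is why it is the one case listed with a 1-cycle throughput.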
