Athlon XP supports SSE, 3DNOW!, 3DNOW!+, and MMX...can it use all at once?

NFS4

No Lifer
Oct 9, 1999
I've got a question for you. The Athlon XP supports a wide range of SIMD instructions. How does the processor switch between the different standards? Does it use them all at once, does it default to one SIMD instruction set, or does it pick whichever one is best for a given situation?

Thanks.
 

Locutus4657

Senior member
Oct 9, 2001
There is no real switching. The CPU just executes whatever instructions the code contains.



<< I've got a question for you. The Athlon XP supports a wide range of SIMD instructions. How does the processor switch between the different standards? Does it use them all at once, does it default to one SIMD instruction set, or does it pick whichever one is best for a given situation?

Thanks.
>>

 

Buddha Bart

Diamond Member
Oct 11, 1999
Think of the instruction set as just a list of things the CPU can do.

When you add SSE, you add 70 or so things to that list (the exact number doesn't matter).

SIMD is just a concept; it's up to lots of specific instructions to implement it.

bart
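
To make the "SIMD is a concept, instructions implement it" point concrete, here is a minimal C sketch. It assumes a compiler that defines `__SSE__` and ships `<xmmintrin.h>` (GCC and Clang do on x86); the function name `add_floats` is made up for illustration. With SSE available, one ADDPS instruction adds four floats at a time; without it, the same work is done one element at a time.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n). */
void add_floats(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
    assert(a && b && out);
#if defined(__SSE__)
    /* SIMD path: each iteration issues one packed add covering 4 floats. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar path: handles the leftovers, or everything when SSE is absent. */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

Either path gives the same result; the only difference is which entries on the CPU's "list of things it can do" the compiled code ends up using.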
 

kpb

Senior member
Oct 18, 2001
I'm not exactly sure how AMD handles it, but when Intel first released their SSE instructions I remember reading that they shared the registers with the FP unit. So you actually had to make a call to the CPU to switch modes before you could switch between FPU operations and SSE. The mode switch caused a delay of a couple of cycles while it saved the register settings and pulled up the register settings for the other one. I would assume that AMD does something similar on their processors. It's definitely possible to make a processor that can handle all three at the same time, but sharing registers saves some resources on the die, and FP and SIMD are usually fairly mutually exclusive.
 

Mday

Lifer
Oct 14, 1999
I like to think of those instructions as code that is just encoded in hardware. What gets executed depends on a combination of the processor and the software you run. Since any processor can only do one thing at any given time, only one can be used at once. However, if you consider that a processor runs at 1 GHz, it can do 1 billion "things" in one second. But in the end, the processor is still ONLY doing ONE thing at a time, despite the very short amount of time it spends doing that ONE thing.

Um, this is not quite true for some processors, but essentially, only one thing is done at a time.
 

XZeroII

Lifer
Jun 30, 2001
Those are all just instructions. You can call any you want in any order. It's like being able to go forward and back. Then add some instructions to go left and right. Next add some instructions to jump and crouch. That's all it's doing.
 

jhu

Lifer
Oct 10, 1999
<< Not exactly sure how AMD handles it but when Intel first released their SSE instructions I remember reading that they shared the registers with the FP unit. >>

Actually, the MMX unit maps its registers onto the FP stack, so you can't use both without doing some sort of switch. There are dedicated registers for SSE. I don't know about the Athlons, but the K6-2 and K6-III CPUs also mapped the 3DNow! registers onto the FP stack. So on those CPUs, FP, MMX, and 3DNow! couldn't be used sequentially without switching CPU modes.
 

CTho9305

Elite Member
Jul 26, 2000


<< Not exactly sure how AMD handles it but when Intel first released their SSE instructions I remember reading that they shared the registers with the FP unit.

Actually, the MMX unit maps its registers onto the FP stack, so you can't use both without doing some sort of switch. There are dedicated registers for SSE. I don't know about the Athlons, but the K6-2 and K6-III CPUs also mapped the 3DNow! registers onto the FP stack. So on those CPUs, FP, MMX, and 3DNow! couldn't be used sequentially without switching CPU modes.
>>



Which would imply to me that all CPUs require you to switch modes; otherwise you would have compatibility problems, even if the internal handling is different.
 

jhu

Lifer
Oct 10, 1999
The switch between FP, MMX, and 3DNow! is implicit, but you still have to explicitly flush the registers (with the EMMS instruction, or FEMMS on AMD chips) before using FP instructions so a stack overflow doesn't occur.
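
A rough C sketch of that "flush before FP" rule, assuming a compiler that defines `__MMX__` and provides `<mmintrin.h>` (GCC and Clang on x86); the names `add4_i16` and `mean_after_add` are invented for this example. `_mm_empty()` is the intrinsic that emits EMMS. Note that on x86-64, ordinary `double` math is normally done with SSE2 rather than x87, so the flush matters mainly for x87 code, but the pattern is the same.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* Hypothetical helper: add four 16-bit lanes, using MMX when available. */
void add4_i16(const int16_t a[4], const int16_t b[4], int16_t out[4])
{
    assert(a && b && out);
#if defined(__MMX__)
    __m64 va, vb, vsum;
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    vsum = _mm_add_pi16(va, vb);      /* PADDW: four 16-bit adds at once */
    memcpy(out, &vsum, sizeof vsum);
    _mm_empty();                      /* EMMS: hand the registers back to x87 */
#else
    for (int i = 0; i < 4; i++)       /* plain fallback without MMX */
        out[i] = (int16_t)(a[i] + b[i]);
#endif
}

/* Floating-point work that runs only after the MMX state has been cleared. */
double mean_after_add(const int16_t a[4], const int16_t b[4])
{
    int16_t sum[4];
    add4_i16(a, b, sum);
    return (sum[0] + sum[1] + sum[2] + sum[3]) / 4.0;
}
```

The design point is simply that the MMX section cleans up after itself, so callers can go straight back to floating-point code without thinking about it.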
 

Neurofreeze

Member
May 12, 2001


<< I like to think of those instructions as code that is just encoded in hardware. What gets executed depends on a combination of the processor and the software you run. Since any processor can only do one thing at any given time, only one can be used at once. However, if you consider that a processor runs at 1 GHz, it can do 1 billion "things" in one second. But in the end, the processor is still ONLY doing ONE thing at a time, despite the very short amount of time it spends doing that ONE thing.

Um, this is not quite true for some processors, but essentially, only one thing is done at a time.
>>



Not exactly. Modern superscalar processors can do more than one thing at a time. The Athlon, P4, and G4e all have multiple execution units that work in parallel. An Athlon is a three-issue-wide processor, meaning it can work with three macro-ops at the same time.
 

imgod2u

Senior member
Sep 16, 2000
Yes but those operations are all pertaining to 1 instruction. In other words, you may be doing 3 operations at once, but it's all part of the same 1 instruction that is being processed. There can be multiple data types that are processed using one instruction via SIMD, but the rest of the CPU is SISD.
 

Sahakiel

Golden Member
Oct 19, 2001


<< Yes but those operations are all pertaining to 1 instruction. In other words, you may be doing 3 operations at once, but it's all part of the same 1 instruction that is being processed. There can be multiple data types that are processed using one instruction via SIMD, but the rest of the CPU is SISD. >>



So... what's a pipeline?
 

imgod2u

Senior member
Sep 16, 2000
A pipeline doesn't mean multiple things can be finished in 1 clock, rather, it's a method of dividing up the whole job, like an assembly line. You're still only taking in and finishing 1 instruction every clock (under perfect conditions) but you have multiple stages in between that are instructions which are partially complete. Different instructions go through different parts of each stage but they all go through the pipeline.
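
The assembly-line picture above can be put into numbers with a small C sketch. It assumes an ideal pipeline with no stalls where every stage takes one clock; the function names and the 10-stage figure in the usage note are illustrative, not taken from any particular CPU.

```c
#include <assert.h>

/* Clocks to finish n instructions on an ideal pipeline with `stages` stages:
   the first instruction needs `stages` clocks to get through, and after that
   one instruction completes every clock. */
long pipelined_cycles(long stages, long n)
{
    assert(stages >= 1 && n >= 0);
    return n == 0 ? 0 : stages + n - 1;
}

/* Same work with no overlap: each instruction occupies the whole machine
   for `stages` clocks before the next one may start. */
long unpipelined_cycles(long stages, long n)
{
    assert(stages >= 1 && n >= 0);
    return stages * n;
}
```

For example, with 10 stages and 1000 instructions, `pipelined_cycles` gives 1009 clocks and `unpipelined_cycles` gives 10000. The pipelined machine still completes at most one instruction per clock, which is the distinction from superscalar designs.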
 

zephyrprime

Diamond Member
Feb 18, 2001


<< So from your coding example, an Athlon XP would see that line of code and just immediately go SSE and disregard 3DNOW? And if you're using a Thunderbird, it would go straight to 3DNOW?

And if a program supports both SSE and 3DNOW, as in the example you gave (and you're using an Athlon XP processor), the processor would NEVER use 3DNOW?
>>



Yes. That is correct. The key to understanding this is that it's not the processor's decision to make. The program makes the decision.
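
A minimal C sketch of "the program makes the decision", under stated assumptions: the `CpuFeatures` flags are hypothetical and would be filled in from CPUID in a real program, and the `scale_sse` / `scale_3dnow` bodies are plain-C stand-ins for routines that would actually contain SSE or 3DNow! instructions. The SSE-first preference order mirrors the scenario in the quote above; it is the programmer's choice, not something the CPU enforces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature flags; real code would query CPUID once at startup. */
typedef struct {
    int has_sse;
    int has_3dnow;
} CpuFeatures;

typedef enum { PATH_GENERIC, PATH_3DNOW, PATH_SSE } SimdPath;
typedef void (*ScaleFn)(float *data, size_t n, float factor);

void scale_generic(float *data, size_t n, float factor)
{
    for (size_t i = 0; i < n; i++)
        data[i] *= factor;
}

/* Stand-ins: a real build would put 3DNow! / SSE code in these two. */
void scale_3dnow(float *data, size_t n, float factor) { scale_generic(data, n, factor); }
void scale_sse(float *data, size_t n, float factor)   { scale_generic(data, n, factor); }

/* The program, not the CPU, picks the code path. */
SimdPath pick_path(CpuFeatures cpu)
{
    if (cpu.has_sse)
        return PATH_SSE;
    if (cpu.has_3dnow)
        return PATH_3DNOW;
    return PATH_GENERIC;
}

/* Map the chosen path to the routine that implements it. */
ScaleFn path_fn(SimdPath path)
{
    static const ScaleFn table[] = { scale_generic, scale_3dnow, scale_sse };
    assert((int)path >= 0 && (int)path <= 2);
    return table[path];
}
```

With flags describing an Athlon XP (SSE and 3DNow! both present), `pick_path` returns `PATH_SSE` and the 3DNow! routine is simply never called; with flags describing a Thunderbird (3DNow! only), it returns `PATH_3DNOW`.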

And by the way imgod2u,


<< In other words, you may be doing 3 operations at once, but it's all part of the same 1 instruction that is being processed >>


This is absolutely wrong. There can absolutely be more than one separate, distinct instruction being executed at the same time. That's the whole point of superscalar. What you're thinking of is pipelining.

Quote Intel Architecture Optimizations Manual (PII & PIII)
"Each cycle, the core may dispatch zero or 1 micro-op on a port to any of the five pipelines (shown in figure 1-2) for a maximum issue bandwidth of five micro-ops per cycle."
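
As a toy illustration of issuing several independent instructions per cycle, here is a C sketch. It is a simplified model under loud assumptions, not how the P6 or Athlon scheduler actually works: every instruction takes one cycle, the scheduling window is unlimited, and the only limits are the issue width and data dependencies. All names (`Instr`, `cycles_needed`, `MAX_INSTRS`) are made up for the example.

```c
#include <assert.h>

#define MAX_INSTRS 64

/* One instruction in the toy program: indices of the earlier instructions
   whose results it needs, or -1 when it has no such input. */
typedef struct {
    int dep0;
    int dep1;
} Instr;

/* Cycles needed to run n single-cycle instructions on a machine that can
   start up to `width` instructions per cycle, out of order. */
int cycles_needed(const Instr *prog, int n, int width)
{
    int issue_cycle[MAX_INSTRS];
    int used[MAX_INSTRS] = {0};   /* issue slots already taken in each cycle */
    int last = -1;

    assert(n >= 0 && n <= MAX_INSTRS && width >= 1);
    for (int i = 0; i < n; i++) {
        int deps[2] = { prog[i].dep0, prog[i].dep1 };
        int c = 0;
        for (int k = 0; k < 2; k++) {
            assert(deps[k] < i);  /* may only depend on earlier instructions */
            if (deps[k] >= 0 && issue_cycle[deps[k]] + 1 > c)
                c = issue_cycle[deps[k]] + 1;   /* wait for the producer */
        }
        while (used[c] >= width)  /* this cycle is full, try the next one */
            c++;
        used[c]++;
        issue_cycle[i] = c;
        if (c > last)
            last = c;
    }
    return last + 1;
}
```

In this model, six independent instructions take 6 cycles at width 1 but only 2 cycles at width 3, while a chain of six instructions that each depend on the previous one still takes 6 cycles at width 3. The width only helps when the instructions really are separate and independent.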
 

Sohcan

Platinum Member
Oct 10, 1999


<< Yes but those operations are all pertaining to 1 instruction. In other words, you may be doing 3 operations at once, but it's all part of the same 1 instruction that is being processed >>

Um, wherever did you get that idea? The goal of superscalar is to fetch, issue, execute, and retire multiple independent instructions at the same time. The P4 and Athlon's micro-op decoding does not change this. Most often when an x86 instruction is decoded into multiple micro-ops, it is in the form of an atomic arithmetic op and a load/store op...the average is 1.5 uops / x86 instruction. Rest assured that the peak issue rate of 6 to 9 uops/cycle (for the P4 or Athlon) can contain uops from any of the reservation stations (obeying issuing rules) and from multiple independent instructions. And as the average issue/retire rate is around 1.5 - 2 uops/cycle for OOOE superscalar x86, the 3-way fetch designs are quite capable of averaging an IPC of up to 1.2 - 1.3 x86 instructions/cycle.



<< There can be multiple data types that are processed using one instruction via SIMD, but the rest of the CPU is SISD. >>

SISD and SIMD are philosophies of data parallelism; superscalar is an independent microarchitectural technique to exploit instruction-level parallelism. Obviously superscalar designs have been applied to both SISD and SIMD computers, though MIMD in practice describes multiple independent processors.
 

Sahakiel

Golden Member
Oct 19, 2001
So, let me get this straight. The Athlon has, what, three integer units? I forget. Anyway, if that's true, then does it mean that the Athlon is capable of executing a maximum of three instructions with integer data types per clock cycle (excluding load/store/etc.)?
And with separate integer and floating-point units, does that mean it can execute both instructions with integer data and instructions with floating-point data at once?
Or am I getting confused with hyperthreading on the P4?
 

Sohcan

Platinum Member
Oct 10, 1999
The Athlon, like any OOOE superscalar architecture, feeds instructions into reservation stations, out of which the instructions are scheduled and issued out-of-order to its functional units. The Athlon in particular has an 18-entry integer scheduler and a 36-entry FP scheduler. The integer scheduler can issue instructions to each of 3 integer execution units and 3 address generation units, which can then feed data and addresses to an in-order load/store queue at the rate of two load/stores per cycle. The FP scheduler can issue instructions to each of three FP units: an FP-add, FP-multiply, and FP-store...obviously the first two units handle more than adds and multiplies, but that's just a general name to apply to the most common operations.
 

jhu

Lifer
Oct 10, 1999
here's a good link on the subject (which pretty much invalidates anything i posted previously in this thread)