Athlon XP supports SSE, 3DNOW!, 3DNOW!+, and MMX...can it use all at once?

NFS4

No Lifer
Oct 9, 1999
72,647
26
91
Well, since HT seems to be dead, I'll ask it here.

I've got a question for you. The Athlon XP supports a wide range of SIMD instruction sets. How does the processor switch between them? Does it use all of them at once, does it default to one SIMD instruction set in any given situation, or does it pick whichever one is best for that situation?

Thanks.
 

NFS4

No Lifer
Oct 9, 1999
72,647
26
91


<< I think it uses whatever the program tells it to use. But I could be wrong. >>


Yeah, but how does the program know which SIMD instruction set is right for a given situation? For example, if an application supports 3DNOW!/3DNOW!+ and SSE, is there a way for either the processor or the application to determine which SIMD instruction set would be the most beneficial?
 

BD231

Lifer
Feb 26, 2001
10,568
138
106
I would think that any program with SSE and 3DNow support will default to SSE because it's just the better instruction set, but I also think it's possible that it would run both at the same time.
 

Adul

Elite Member
Oct 9, 1999
32,999
44
91
danny.tangtam.com
I think programs are capable of telling what kind of CPU the system is running, and the programmers will let them default to whatever they feel is best to run.
 

MadRat

Lifer
Oct 14, 1999
11,910
238
106
Good question to pose to programmers of EACH and EVERY software title. An intelligently written program would use the most efficient pathway, but do you want it to stop and benchmark each time you run the routine? As hardware changes I am sure it drastically affects the efficiency of the programming. Plus are software writers going to optimize for each and every processor out there? I mean, c'mon, that is a ridiculous assumption. Add on top of all this the need to do mode switches and such in some processors and not others, there really is no good "general" way to guess the most efficient way to skin the cat.
 

NFS4

No Lifer
Oct 9, 1999
72,647
26
91


<< Good question to pose to programmers of EACH and EVERY software title. An intelligently written program would use the most efficient pathway, but do you want it to stop and benchmark each time you run the routine? As hardware changes I am sure it drastically affects the efficiency of the programming. Plus are software writers going to optimize for each and every processor out there? I mean, c'mon, that is a ridiculous assumption. Add on top of all this the need to do mode switches and such in some processors and not others, there really is no good "general" way to guess the most efficient way to skin the cat. >>


I understand what you are saying, but are you just trying to be sarcastic or be funny or what?

I'm just asking how the program/processor goes about determining which SIMD instruction set to use. I'm not asking developers to bend over backwards or something ;) I'm just trying to understand what process is involved in the software choosing 3DNOW, 3DNOW+, or SSE (given that the Athlon XP supports all three).
 

MadRat

Lifer
Oct 14, 1999
11,910
238
106
No sarcasm intended. It's up to the programming team to develop the strategies for using competing technologies. I'm sure they make the wrong decisions from time to time.
 

slackware1995

Member
Apr 4, 2002
109
0
0
Just a guess (which is maybe worse than an opinion), and I'm stating beforehand that I really have no idea.

Anyways my guess is that it would follow this path:

SSE2
SSE
3Dnow+
3Dnow
MMX

But I am only guessing it to be that way because more Intel CPUs are sold. Programmers (maybe not all) tend to be lazy, so they would check for Intel instructions first.

Please remember, I'm just shooting in the dark. It just makes some sort of demented sense to me. :)


 

imgod2u

Senior member
Sep 16, 2000
993
0
0
3dNow! and SSE and MMX and all the rest are different instruction sets, meaning that there is no ambiguity between them. If you use a 3dNow! instruction in your code, the processor will run it through the 3dNow! circuitry. If you use an SSE instruction, the chip will run it through the SSE circuitry, simple as that. It's like asking how the processor knows which unit to use, the FPU or the ALU. If it's a floating point instruction it'll use the FPU, and if it's an integer instruction it'll use the ALU. There is no "switching" in between them; they're all dedicated parts of the logic circuit.
 

zephyrprime

Diamond Member
Feb 18, 2001
7,512
2
81
I think you have some sort of misconception about how stuff like this works. The processor has no ability to "decide" which instruction set to use. The program is ABSOLUTELY explicit about which instruction it will execute at any point in time. If the program supports multiple SIMD standards, that means that it must have physically separate sections of code (eg one for sse & one for 3dnow) that perform the same functions but with different SIMD instructions. When the program executes, it must decide which function to call in some manner like this:

if (SSE)
{
    function_using_SSE();
}
else if (_3dnow)
{
    function_using__3dnow();
}
else
{
    ordinary_instructions();
}

As for how the program would know which instructions are better: it has no absolute knowledge of that unless it performs some sort of benchmark, and very few programs do that. Even without that *absolute* knowledge, it doesn't really matter, because the programmer would know that, say, SSE2 is better than 3dnow+ and would code the program to prefer one instruction set over another.
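
To make the if/else above a bit more concrete, here is a minimal C sketch of that kind of dispatch. Everything named here (`FEAT_SSE`, `FEAT_3DNOW`, `choose_path`, `choose_sum`, the `sum_*` kernels) is made up for illustration, and the "SSE" and "3DNow!" kernels are plain C stand-ins rather than real SIMD code. The only point is that the preference order is fixed by the programmer and the choice can be made once instead of being benchmarked on every call.

```c
#include <stddef.h>

/* Hypothetical feature flags; a real program would fill these in from CPUID. */
enum { FEAT_3DNOW = 1u << 0, FEAT_SSE = 1u << 1 };

/* Which code path was selected. Values index the kernels[] table below. */
enum simd_path { PATH_PLAIN = 0, PATH_3DNOW = 1, PATH_SSE = 2 };

typedef float (*sum_fn)(const float *v, size_t n);

/* Portable fallback: add the elements one at a time. */
float sum_plain(const float *v, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Stand-ins for hand-written SIMD versions; they only forward to the plain loop. */
float sum_3dnow(const float *v, size_t n) { return sum_plain(v, n); }
float sum_sse(const float *v, size_t n)   { return sum_plain(v, n); }

static const sum_fn kernels[] = { sum_plain, sum_3dnow, sum_sse };

/* The programmer's fixed preference: SSE first, then 3DNow!, then plain code. */
enum simd_path choose_path(unsigned features)
{
    if (features & FEAT_SSE)
        return PATH_SSE;
    if (features & FEAT_3DNOW)
        return PATH_3DNOW;
    return PATH_PLAIN;
}

/* Look up the kernel once (e.g. at startup) and reuse the pointer afterwards. */
sum_fn choose_sum(unsigned features)
{
    return kernels[choose_path(features)];
}
```

With this ordering, a CPU that reports both flags (the Athlon XP case) gets `PATH_SSE` and never runs the 3DNow! kernel, while a CPU that reports only `FEAT_3DNOW` (a Thunderbird) gets `PATH_3DNOW`.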

Also, don't think that the instruction sets are mutually exclusive. You see, SSE2 is worthless without the instructions from SSE, and neither SSE2 nor SSE does the stuff that MMX does (with exceptions). They don't duplicate each other's functions (again, with some exceptions). From the programmer's point of view (and maybe the processor's too), SSE2 and SSE are a single instruction set.
 

NFS4

No Lifer
Oct 9, 1999
72,647
26
91


<< I think you have some sort of misconception about how stuff like this works. The processor has no ability to "decide" which instruction set to use. The program is ABSOLUTELY explicit about which instruction it will execute at any point in time. If the program supports multiple SIMD standards, that means that it must have physically separate sections of code (eg one for sse & one for 3dnow) that perform the same functions but with different SIMD instructions. When the program executes, it must decide which function to call in some manner like this:

if (SSE)
{
function_using_SSE();
}
else if (_3dnow)
{
function_using__3dnow();
}
else
{
ordinary_instructions();
}

As far as how the program would know which instructions were better, the program would have no absolute knowledge of which instructions were better unless it performed some sort of benchmark but no program does that. However, even though the program has no *absolute* knowledge of which instructions are better, it doesn't really matter because the programmer would know that SSE2 is better than 3dnow+ and code the program so that it would prefer one instruction set over another.

Also, don't think that the instruction sets are mutually exclusive. You see, SSE2 is worthless without the instructions from SSE and neither SSE2/1 does the stuff that MMX does (with exceptions). They don't duplicate each other's functions (with some sort of exceptions). From the programmers point of view (and maybe the processors too) SSE2 & SSE1 are a single instruction set.
>>



So from your coding example, an Athlon XP would see that line of code and just immediately go SSE and disregard 3DNOW? And if you're using a Thunderbird, it would go straight to 3DNOW?

And if a program supports SSE and 3DNOW as in the example you gave (and you're using an Athlon XP processor), the processor would NEVER use 3DNOW?
 

Remnant2

Senior member
Dec 31, 1999
567
0
0
It completely depends on the program, nfs4.

Basically, here is a rough sketch of how the simd units work inside the athlon.

You have 8 general FP stack registers.
You have 8 MMX registers that are mapped onto (the same as) the FP registers, so you can't freely mix MMX and FP code. To switch from MMX back to FP you need to issue the EMMS instruction. Actually this was one of the big deals in the Athlon: the EMMS instruction was reduced from very expensive to basically free, making it more profitable to use 3dNow! more freely.

You have 8 3dNow! registers mapped onto the FP registers as well. You can't execute 3dNow and FP code at the same time, (it's been a while since I wrote 3dnow assembly, but I believe you can do 3dnow and mmx together).

You have 8 SSE registers that are independent of the MMX registers. So you could theoretically interleave SSE and MMX/3dnow/FP code together. Why you would want to is another matter...
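
As a rough illustration of the MMX/FP register sharing described above, here is a small C sketch using the MMX intrinsics from `<mmintrin.h>`. It assumes GCC or Clang on x86 with MMX enabled (the `__MMX__` macro); otherwise it falls back to a scalar loop. `add4_i16` and `scaled_sum` are names invented for this example. `_mm_empty()` is the intrinsic that emits EMMS, and it is called before returning to code that may do floating point math.

```c
#include <stdint.h>
#include <string.h>
#ifdef __MMX__
#include <mmintrin.h>
#endif

/* Adds four pairs of 16-bit integers. With MMX this is a single packed add. */
void add4_i16(const int16_t a[4], const int16_t b[4], int16_t out[4])
{
#ifdef __MMX__
    /* _mm_set_pi16 takes the highest element first, so a[0] ends up lowest. */
    __m64 va = _mm_set_pi16(a[3], a[2], a[1], a[0]);
    __m64 vb = _mm_set_pi16(b[3], b[2], b[1], b[0]);
    __m64 sum = _mm_add_pi16(va, vb);
    memcpy(out, &sum, sizeof sum);
    /* EMMS: hand the shared registers back to the x87 FPU before FP code runs. */
    _mm_empty();
#else
    /* Scalar fallback when MMX intrinsics are not available. */
    for (int i = 0; i < 4; i++)
        out[i] = (int16_t)(a[i] + b[i]);
#endif
}

/* Floating point work that follows the MMX section. */
double scaled_sum(const int16_t a[4], const int16_t b[4], double scale)
{
    int16_t s[4];
    add4_i16(a, b, s);
    return scale * (s[0] + s[1] + s[2] + s[3]);
}
```

On 64-bit builds ordinary `double` math normally goes through the SSE registers, so the hazard mostly concerns 32-bit x87 code, but ending an MMX section with `_mm_empty()` is still the usual convention.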

The big point is, the Athlon does NOT have separate execution units for each of these. SSE/3dnow/FP/MMX opcodes share floating point execution units, just as they share load/store units, etc. So although you could write your program to use SSE & FP code at the same time, for example, it wouldn't be any faster -- because you're basically round-robining on the FP execution resources.
From the Athlon design guide (written before the XP),
"The Athlon has 3 floating-point logic pipelines.
The first is the adder pipe, which performs 3dNow! Add, MMX ALU/Shifter, and FP add.
The 2nd is the mul pipe, which performs 3dNow/MMX mul&reciprocal, and FP mul/div/sqrt.
The 3rd is the load/store pipe, which loads and stores data for all SIMD/FP types"



So... which one will be used in any program that supports all 3? It depends. SSE/3dNow do NOT replace MMX; they do different things and are useful for different purposes. But between SSE and 3dnow, it basically comes down to how the programmers have written their detection routines, i.e. which extension they look for first.
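
For anyone curious what such a detection routine might look like, here is a hedged C sketch. It assumes GCC or Clang on x86, where `<cpuid.h>` provides `__get_cpuid`; on other targets it simply reports no features. The bit positions are the CPUID feature bits (leaf 1 EDX bit 23 = MMX, bit 25 = SSE, bit 26 = SSE2; AMD extended leaf 0x80000001 EDX bit 31 = 3DNow!, bit 30 = extended 3DNow!). The `HAS_*` names and both function names are invented for this example, and a real program would also need to care about operating system support for SSE.

```c
#include <stdint.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* Hypothetical flag names for this sketch. */
enum {
    HAS_MMX      = 1u << 0,
    HAS_3DNOW    = 1u << 1,
    HAS_3DNOWEXT = 1u << 2,
    HAS_SSE      = 1u << 3,
    HAS_SSE2     = 1u << 4
};

/* Translate raw CPUID EDX values into the flags above. */
unsigned decode_simd_bits(uint32_t std_edx, uint32_t ext_edx)
{
    unsigned f = 0;
    if (std_edx & (1u << 23)) f |= HAS_MMX;
    if (std_edx & (1u << 25)) f |= HAS_SSE;
    if (std_edx & (1u << 26)) f |= HAS_SSE2;
    if (ext_edx & (1u << 31)) f |= HAS_3DNOW;
    if (ext_edx & (1u << 30)) f |= HAS_3DNOWEXT;
    return f;
}

/* Query the running CPU; returns 0 on targets where CPUID is unavailable. */
unsigned detect_simd(void)
{
    uint32_t std_edx = 0, ext_edx = 0;
#if defined(__i386__) || defined(__x86_64__)
    unsigned eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        std_edx = edx;
    if (__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        ext_edx = edx;
#endif
    return decode_simd_bits(std_edx, ext_edx);
}
```

An Athlon XP would be expected to report MMX, both 3DNow! flags, and SSE here, so which path actually runs still comes down to the order in which the program tests the flags.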