anyone familiar with sse2 and fpu on x86?

jhu

Lifer
Oct 10, 1999
11,918
9
81
can the sse2 instructions replace regular x87 fpu instructions for most programs? it's been noted that the p4's fpu has been slightly crippled by making the fxch instruction take one cycle instead of zero to complete, which is especially detrimental given the stack-based nature of the x87 fpu.
 

jhu

hmm... seems like on the opteron/athlon 64 the vector instructions can replace regular fpu instructions depending on the operations being performed
 

Matthias99

Diamond Member
Oct 7, 2003
8,808
0
0
SSE2 is a SIMD (Single Instruction, Multiple Data) instruction set, which means a packed instruction runs the same operation simultaneously on every element held in a register. A good optimizing compiler will sneak it in where possible, but it can't always do so (especially if you're using the SSE registers as extra floating-point storage rather than for their SIMD capabilities).
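
To make the "same operation on several values at once" idea concrete, here is a minimal sketch in C using the SSE2 intrinsics from emmintrin.h. The function name add_arrays is made up for illustration, and the scalar loop doubles as a fallback on compilers that don't define __SSE2__.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for n doubles. */
void add_arrays(const double *a, const double *b, double *out, size_t n)
{
    size_t i = 0;
    assert(a != NULL && b != NULL && out != NULL);
#if defined(__SSE2__)
    /* Packed path: each _mm_add_pd (ADDPD) adds two doubles in one instruction. */
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);   /* unaligned load of a[i], a[i+1] */
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(out + i, _mm_add_pd(va, vb));
    }
#endif
    /* Scalar tail for a leftover element, and the whole loop when SSE2 is absent. */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

The packed loop only pays off when consecutive elements really need the same operation; anything with a dependency from one element to the next has to go through the scalar path.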
 

PentiumIV

Member
Feb 19, 2001
56
0
0
Hello!

If your program requires only ADD/SUB/MUL/DIV/SQRT, and you do not need
full 80-bit precision, then the answer is YES! For more complex operations, like
SIN/COS, you need to emulate the x87 instruction in software (macrocode).

P.S. Keep in mind that on Pentium 4 even scalar SSE/SSE2 operations, like ADD or SUB,
can be dispatched only every other clock. This is not the case for Pentium M :)
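
As a rough illustration of the scalar case, the sketch below computes sqrt(x*x + y*y) with only the scalar SSE2 multiply, add and square-root intrinsics, so no x87 instruction is needed. The function name vector_length is hypothetical, and the plain C fallback is only there so the code still builds where __SSE2__ is not defined.

```c
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#else
#include <math.h>       /* sqrt() for the non-SSE2 fallback */
#endif

/* Hypothetical helper: length of the 2-D vector (x, y). */
double vector_length(double x, double y)
{
#if defined(__SSE2__)
    __m128d vx = _mm_set_sd(x);                 /* x in the low lane, 0 above */
    __m128d vy = _mm_set_sd(y);
    __m128d sum = _mm_add_sd(_mm_mul_sd(vx, vx),
                             _mm_mul_sd(vy, vy)); /* MULSD, MULSD, ADDSD */
    return _mm_cvtsd_f64(_mm_sqrt_sd(sum, sum));  /* SQRTSD on the low lane */
#else
    return sqrt(x * x + y * y);
#endif
}
```

There is no equivalent intrinsic for SIN or COS in SSE2, which is why those end up as library routines built from the basic operations.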
 

jhu

fp data is converted to an internal 80-bit format in the x87 fpu, but the fp data you load and store is still 64-bit. with sse2 you'd get less precision in the intermediate results, but the question is whether that changes the end results to a large degree. hmm... i guess it would depend on whether the value stays in a register or is written to memory.
 

uart

Member
May 26, 2000
174
0
0
Originally posted by: jhu
fp data is converted to an internal 80-bit format in the x87 fpu, but the fp data you load and store is still 64-bit. with sse2 you'd get less precision in the intermediate results, but the question is whether that changes the end results to a large degree. hmm... i guess it would depend on whether the value stays in a register or is written to memory.

For many purposes the loss of precision in going from 80-bit to 64-bit is unimportant. There are however a few applications that need as much precision as possible. These usually involve number crunching where there is some sort of feedback structure (part of the output fed back to the input), allowing small rounding errors to gradually build up. I should point out that not all algorithms of this type need super high precision to remain stable; it's only a small minority.
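
A small sketch of how rounding errors build up when each result is fed back into the next step. It uses float versus double as a stand-in for 64-bit versus 80-bit, since those two types behave the same on every compiler; that substitution and the function names are assumptions for illustration, not a measurement of the x87 itself. Part of the float drift also comes from 0.1f being a coarser approximation of 0.1 to begin with.

```c
/* Absolute difference without needing math.h. */
static double abs_diff(double a, double b) { return a > b ? a - b : b - a; }

/* Hypothetical helper: add 0.1 to a running total n times in single
   precision and report the drift from the ideal value n * 0.1. */
double drift_single(int n)
{
    volatile float total = 0.0f;  /* volatile forces rounding to float each step */
    for (int i = 0; i < n; i++)
        total = total + 0.1f;     /* the previous output is the next input */
    return abs_diff((double)total, n * 0.1);
}

/* Same loop with a double-precision running total. */
double drift_double(int n)
{
    volatile double total = 0.0;
    for (int i = 0; i < n; i++)
        total = total + 0.1;
    return abs_diff(total, n * 0.1);
}
```

Calling both with the same n, say one million, shows the narrower type drifting much further from the ideal total, which is the same effect a 64-bit intermediate has relative to an 80-bit one, only on a larger scale.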

Most x86 compilers give you an optional 80-bit FP data type (I haven't used it for a long time; I think it's called "extended", but I'm not sure). Obviously it's less efficient to do so, but the extra 16 bits of the FPU format (additional mantissa and exponent bits) can be saved to memory if the compiler is so instructed.
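
In C the closest thing is long double, though what it maps to is up to the compiler: GCC and Clang on x86 Linux use the x87 80-bit format, while MSVC treats long double as a plain 64-bit double. The sketch below just asks the compiler what it provides via float.h; the function names are made up for illustration.

```c
#include <float.h>

/* Mantissa (significand) bits in long double, e.g. 64 for the x87 80-bit
   format or 53 where long double is the same as double. */
int long_double_mantissa_bits(void) { return LDBL_MANT_DIG; }

/* Extra mantissa bits gained over double (0 where the two types coincide). */
int extra_mantissa_bits(void) { return LDBL_MANT_DIG - DBL_MANT_DIG; }

/* Hypothetical helper: push a value through memory at full long double
   width, so nothing is rounded down to 64 bits on the way. */
long double store_and_reload(long double x)
{
    volatile long double slot = x;  /* volatile forces a real store and load */
    return slot;
}
```

If extra_mantissa_bits() comes back as 0, the compiler is not giving you the extended format and declaring variables as long double will not buy any extra precision.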

As mentioned earlier, SSE2 is potentially much faster if vectorization is possible, but its smaller instruction repertoire may make it unsuitable for certain tasks, particularly some scientific applications. Also, depending on the implementation, SSE2 may still be faster for certain tasks even if vectorization is not possible. (I don't know this for certain, but I've seen some benchmarks that suggest the Opteron/A64 etc. might still benefit from SSE2 even in the absence of vectorization.)
 

jhu

now that you mention it, you're right. there is an option for 80-bit fp data for x86 compilers. now are there any other cpus that use the 80-bit format? i recall alpha and powerpc cpus using 64-bit fpu registers
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: jhu
now that you mention it, you're right. there is an option for 80-bit fp data for x86 compilers. now are there any other cpus that use the 80-bit format? i recall alpha and powerpc cpus using 64-bit fpu registers

It's mainly an Intel and x86-compatible thing, though Motorola's 68881/68882 FPUs also used an 80-bit extended format.