Question Greatest x86 innovations?


Tuna-Fish

Golden Member
Mar 4, 2011
1,646
2,464
136
To expand a little more: for FP, in the beginning there was x87. It was bad because it used a pre-defined co-processor instruction encoding that had very few bits available, so it was designed around a stack register model (which was in vogue at the time, see SPARC and Am29000). Intel, however, messed it up: a stack register model requires automatic spill and restore to work in practice, and they never did that. So in reality, when people used x87 they minimized using registers for anything and usually spilled everything to the C stack on every function call boundary, if not more often, which made it much slower than it could have been. On the 8088 this didn't really matter that much because all FP operations were dog slow anyway, but as the processors got better, x87 remained a dog.

Then in the mid-90s Intel realized that they'd really like to have integer SIMD for image decompression, audio decoding, etc., and specifically wanted to do 4x16-bit and 8x8-bit operations fast. (This is relevant for JPEG and MP3.) For this, they needed registers that fit at least 64 bits, and the x87 registers were right there. By this point decoding restrictions had relaxed, so MMX could use normal register operands. The problem is that x87 also uses those same registers for its stack. MMX was fundamentally a pretty good integer SIMD extension, except that if you needed to do any FP work at all, you'd always have to spill your MMX state, stop using MMX, do the x87 work, and then restore. (Well, you didn't have to, but it was easy to mess up if you didn't.) So, you guessed it, in practice compilers always spilled all the registers on function call boundaries (if not more often), just to be sure, and we are back to a world where you'd better inline freaking everything if you want your code to be fast. (And also you get 16 kB of icache, have fun.)
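
To make the "4x16-bit in a 64-bit register" idea concrete, here is a rough portable C sketch of what a packed 16-bit wraparound add does. It does not use MMX itself; it emulates the lane behaviour with plain 64-bit integer math, and the name `packed_add16` is made up for illustration.

```c
#include <stdint.h>

/* Add four independent 16-bit lanes packed into one 64-bit word.
   Each lane wraps around on overflow and no carry crosses a lane boundary. */
uint64_t packed_add16(uint64_t a, uint64_t b)
{
    /* Top bit of each 16-bit lane. */
    const uint64_t high_bits = 0x8000800080008000ULL;

    /* Add only the low 15 bits of every lane, so a carry can reach the
       lane's own top bit but can never spill into the next lane. */
    uint64_t low_sum = (a & ~high_bits) + (b & ~high_bits);

    /* Fold the original top bits back in with XOR, which is a 1-bit add
       that discards the carry out of the lane. */
    return low_sum ^ ((a ^ b) & high_bits);
}
```

For example, the lanes 0x0001, 0xFFFF, 0x0010, 0x7FFF plus 0x0001, 0x0001, 0x0020, 0x0001 give 0x0002, 0x0000, 0x0030, 0x8000: the second lane wraps without disturbing its neighbours. A real packed-add instruction does this in one operation per register.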

3DNow! was basically AMD extending MMX for 32-bit FP. This worked decently, and ironically the places where it helped the most were maybe things that mostly used MMX but occasionally needed some 32-bit FP, because you could mix 3DNow! with MMX at your leisure. It had no 64-bit float support, which greatly limited its applicability. Browsers especially could really have used it, because they both dealt with things that used MMX a lot and needed 64-bit floats because of JS.

SSE1 was Intel finally realizing that the original sin was aliasing registers that are addressed in different ways, creating an entirely new set of eight 128-bit regs, and doing a fundamentally pretty good FP SIMD extension for them. The only major flaw, lack of integer operands, was fixed in SSE2. Once SSE2 was available, there has never been a good reason to emit any x87, MMX or 3DNow! ops. It basically supersedes them entirely. Yes, there are no sin/cos in SSE, but that's strictly because the sin/cos of x87 were terrible and you can do better (in both accuracy and speed) using a library.
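
As a small illustration of the SSE model (four 32-bit floats per 128-bit register, no aliasing with x87), here is a hedged C sketch using the SSE intrinsics from `<xmmintrin.h>`. The function name `add_floats` is hypothetical, and the code assumes a compiler that defines `__SSE__` when SSE is enabled (GCC and Clang do); otherwise it just falls back to the scalar loop.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_add_ps, ... */
#endif

/* Element-wise add: out[i] = a[i] + b[i] for i in [0, n). */
void add_floats(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    /* Process four floats per iteration in one 128-bit register.
       Unaligned loads/stores are used so callers need no special alignment. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar tail (or the whole array when SSE is not available). */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

No state has to be saved or restored around this, unlike the MMX case, because the XMM registers are separate from the x87 stack.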
 

NTMBK

Lifer
Nov 14, 2011
10,411
5,677
136
Don't forget the fun detail that x87 had 80-bit precision internally, but this would be rounded to 64-bit whenever it spilled to the C stack... and you had no control over when the compiler would choose to do this, so the precision of the exact same C code would vary wildly between different compilers. (Or even the same compiler, if you made a seemingly unrelated change and triggered a spill.)

Fun times!
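
A minimal C sketch of that effect, assuming `long double` maps to the x87 80-bit extended format (as it does with GCC or Clang on x86 Linux; on other toolchains it may be identical to `double`, in which case both functions return 0). The function names are made up, and the `volatile double` store stands in for a spill the compiler would normally choose on its own.

```c
#include <float.h>

/* Computes (1 + tiny) - 1 with the intermediate kept at long double
   precision, as if it stayed in an x87 register the whole time. */
long double keep_in_register(long double tiny)
{
    long double sum = 1.0L + tiny;
    return sum - 1.0L;
}

/* Same computation, but the intermediate is stored to a 64-bit double
   first, which is what a spill to the C stack did. */
long double spill_to_double(long double tiny)
{
    volatile double spilled = (double)(1.0L + tiny);  /* rounds to 53 bits */
    return (long double)spilled - 1.0L;
}
```

With `tiny = 0x1p-60L` on such a platform, `keep_in_register` returns 2^-60 while `spill_to_double` returns 0, because 1 + 2^-60 fits in a 64-bit significand but not in a 53-bit one. Same source expression, different answer, depending only on whether a spill happened.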
 

FelixDeCat

Lifer
Aug 4, 2000
30,797
2,621
126
Thanks for the history lesson, it was interesting 👍
 

DrMrLordX

Lifer
Apr 27, 2000
22,700
12,651
136
(x86-64 EMT64)
As an aside, the Intel name for x86-64 was actually EM64T. EMT64 ("empty 64") was originally a jab at Intel for being forced to copy x86-64 (when Intel was clearly trying to push IA64) and for some bugs that existed in early implementations of EM64T.
 
Jul 27, 2020
26,022
17,952
146
AVX was lacking, and thus it only lasted a couple of years. AVX2 (Haswell) was where things settled. Recently x86-64-v3 (AVX2, FMA, MOVBE, bits) was declared the next dividing line for modern OSs.
Yes, the innovation list needs to be updated with what Haswell brought to the table.


Then Cannon Lake for bringing AVX-512 to a mobile chip.


When we crank on the AVX2 and AVX512, there is no stopping the Cannon Lake chip here. At a score of 4519, it beats a full 18-core Core i9-7980XE processor running in non-AVX mode which scores 4185. That's insane. Truly a big plus in Cannon Lake's favor.
That's two cores in AVX-512 mode beating 18 cores in non-AVX mode!
 
Jul 27, 2020
26,022
17,952
146
One more x86 innovation: Highest ever 6.2 GHz ST frequency without overclocking

Core i9-14900KS reviewed but Intel ARK not updated yet with entry for the new CPU.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,094
16,014
136