- Feb 18, 2001
Yeah, alright, sorry for the delay but I am teh lazy.
So here is Intel's page on SSE4: sse4
SSE4 has about 50 new instructions, which is a lot more than the rumors going around were talking about. It is set to appear in Penryn, which is the 45nm followup to Merom (Core 2 Duo), I believe.
Anyway, a brief explanation of SSE first to get everyone up to speed. SSE is the successor to MMX. It's also supposed to replace the living dinosaur that is the x87 FPU. The basic idea behind SSE and things like it is simple: one instruction operates on multiple pieces of data rather than just one. (The data must be contiguous in memory.) For example, the SSE instruction MULPS multiplies two sets of four 32-bit floating point numbers, each set packed into a single 128-bit SSE register, giving four products at once. For comparison, fmul multiplies one pair of floating point numbers together. So you can see the obvious performance advantage.
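To make the one-vs-four difference concrete, here is a rough sketch in plain C of what each style computes. It is a portable model, not real SSE code: the function names `mul_scalar` and `mul_packed` are made up for illustration, and the four-element arrays just stand in for a 128-bit register.

```c
#include <assert.h>
#include <stddef.h>

/* fmul-style: one instruction, one product. */
float mul_scalar(float a, float b) {
    return a * b;
}

/* MULPS-style: one instruction, four independent products.
 * a, b and out each stand in for a 128-bit register holding
 * four 32-bit floats (hypothetical model, not the real encoding). */
void mul_packed(const float a[4], const float b[4], float out[4]) {
    assert(a && b && out);
    for (size_t lane = 0; lane < 4; ++lane) {
        out[lane] = a[lane] * b[lane];  /* lanes never interact */
    }
}
```

The loop is only there to describe the result. On real hardware all four lanes are handled by the single instruction.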
So looking at the "new-instrustions-paper.pdf", I see that SSE4 consists mostly of SSE instructions plus 6 non-SSE instructions that are just bundled under the SSE4 moniker for convenience.
Here are the highlights in my opinion:
> 4x32bit integer multiply instruction. Finally. Previously, only a crappy 2x32bit integer multiply was available.
> Floating point dot product. Remember math class? dot(a,b) = a.1*b.1 + a.2*b.2 + a.3*b.3 + a.4*b.4. Useful for physics and other things.
> Register insertion/extraction. Wow. I guess Intel is serious about improving data movement between the SSE registers and the general purpose registers, which is currently slow.
> Packed format conversion. Thank god. I will never have to use those crappy shuffle instructions from SSE1 again.
> Other boring stuff: packed blending, packed integer min/max, floating point rounding, set and test, compare for equal, unsigned dword to signed dword conversion. Some of these instructions are useful for avoiding branches in some situations.
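For anyone who wants to see what a few of these compute, here is a hedged sketch in portable C. These are scalar models with hypothetical names (`dot4`, `mul4_i32`, `min4_i32`), not the actual instructions. I'm assuming the 4x32bit multiply keeps the low 32 bits of each product, and the min is written as a mask-and-blend to show how branches can be avoided.

```c
#include <assert.h>
#include <stdint.h>

/* Four-element dot product, same formula as in the list above. */
float dot4(const float a[4], const float b[4]) {
    assert(a && b);
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

/* 4x32-bit integer multiply. Assumption: each lane keeps only the
 * low 32 bits of its product. The multiply is done on unsigned
 * values so the wraparound is well defined in C. */
void mul4_i32(const int32_t a[4], const int32_t b[4], int32_t out[4]) {
    assert(a && b && out);
    for (int lane = 0; lane < 4; ++lane) {
        out[lane] = (int32_t)((uint32_t)a[lane] * (uint32_t)b[lane]);
    }
}

/* Packed signed min without a branch: build an all-ones or all-zeros
 * mask per lane from the comparison, then blend a and b with it. */
void min4_i32(const int32_t a[4], const int32_t b[4], int32_t out[4]) {
    assert(a && b && out);
    for (int lane = 0; lane < 4; ++lane) {
        uint32_t mask = (uint32_t)0 - (uint32_t)(a[lane] < b[lane]);
        out[lane] = (int32_t)(((uint32_t)a[lane] & mask) |
                              ((uint32_t)b[lane] & ~mask));
    }
}
```

The mask-and-blend trick in `min4_i32` is the general pattern: compute both candidates, then select per lane, so there is nothing for the branch predictor to get wrong.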
And now for the SSE4 instructions unrelated to the SSE registers:
> String handling instruction. Hmmm. There haven't been any new string handling instructions in the x86 instruction set in a long time. Should be useful. Not sure if this involves the SSE registers or not.
> CRC instruction. I'm surprised that they put such a specialized instruction in. I've taken a look at Adler CRC code in the past and, as I recall, it has horrible IPC potential and generates pipeline bubbles in just about every line of code, so a specialized instruction should be able to improve the speed of CRC generation a lot.
> Count 1's. Counts the number of bits set to 1.
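Here is a rough software model of the last two, in portable C, to show what the hardware would be replacing. The names (`popcount32`, `crc32c_step`, `crc32c`) are made up for illustration. I'm assuming the CRC-32C (Castagnoli) polynomial, 0x82F63B78 in bit-reflected form, with the usual all-ones initial value and final inversion done in software.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the bits set to 1. Each pass clears the lowest set bit,
 * so the loop runs once per set bit. */
unsigned popcount32(uint32_t x) {
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;
        ++n;
    }
    return n;
}

/* Fold one byte into a running CRC, one bit at a time.
 * Assumption: CRC-32C polynomial, reflected form 0x82F63B78. */
uint32_t crc32c_step(uint32_t crc, uint8_t byte) {
    crc ^= byte;
    for (int bit = 0; bit < 8; ++bit) {
        uint32_t mask = (uint32_t)0 - (crc & 1u);  /* all ones if low bit set */
        crc = (crc >> 1) ^ (0x82F63B78u & mask);
    }
    return crc;
}

/* CRC-32C of a whole buffer: start from all ones, invert at the end. */
uint32_t crc32c(const void *data, size_t len) {
    const uint8_t *p = (const uint8_t *)data;
    assert(p != NULL || len == 0);
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; ++i) {
        crc = crc32c_step(crc, p[i]);
    }
    return crc ^ 0xFFFFFFFFu;
}
```

Note how every step in `crc32c_step` depends on the result of the previous one. That serial dependency chain is why the software version leaves so little for the CPU to do in parallel.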
All in all, I'm very pleased with SSE4. It adds some important instructions that fill in the gaps in SSE. For the first time, I feel that SSE is essentially complete. Yeah, you could add more instructions, but they wouldn't be essential ones.