Originally posted by: Kuzi
Originally posted by: IntelUser2000
Hmm. I don't think you realize the differences then. Pentium 4 introduced SSE2, which extended the 128-bit XMM registers to integer and double-precision operations. However, the hardware still took 2 cycles to execute a 128-bit SSE2 instruction, which it did by splitting it into two 64-bit halves and executing each in turn. Core 2 introduced single-cycle SSE2 execution, which meant a 128-bit instruction now took only a single cycle to execute.
Although Athlon 64 supported SSE2 from its first implementation, full 128-bit execution didn't arrive until Barcelona/K10.
Thanks for the explanation. I kind of forgot all the details; it's been a while since P4 and Athlon 64 were released.
My point was that while a certain CPU can support a certain instruction set, the execution time may not necessarily be as fast as other competing CPUs. I think this is what Nemesis was getting at when talking about the AVX implementation in Bulldozer, it might be supported but may not run as fast as on Sandy Bridge.
But I believe the IPC, clock speed, and SMT (if any) of Bulldozer will be much more important than AVX at release. That is what AMD should really worry about.
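To put rough numbers on the "supported but not as fast" point, here is a toy sketch in Python. It assumes a vector instruction is issued in one pass per cycle over a datapath of a given width, so an instruction wider than the datapath needs several passes. The function name `issue_cycles` and the datapath widths in the AVX lines are hypothetical, chosen only for illustration; the model ignores pipelining, latency, decode limits, and everything else a real core does.

```python
import math


def issue_cycles(num_ops: int, vector_bits: int, datapath_bits: int) -> int:
    """Toy estimate of cycles needed to issue num_ops packed instructions.

    Assumption: each instruction needs ceil(vector_bits / datapath_bits)
    passes through the execution unit, and one pass takes one cycle.
    """
    passes_per_op = math.ceil(vector_bits / datapath_bits)
    return num_ops * passes_per_op


# 128-bit SSE2 instruction split into two 64-bit halves (the P4-style case).
split_sse2 = issue_cycles(1000, vector_bits=128, datapath_bits=64)

# 128-bit SSE2 instruction on a full 128-bit datapath (the Core 2-style case).
full_sse2 = issue_cycles(1000, vector_bits=128, datapath_bits=128)

# Hypothetical AVX comparison: 256-bit instruction on a 128-bit datapath
# versus a 256-bit datapath. These widths are illustrative assumptions.
split_avx = issue_cycles(1000, vector_bits=256, datapath_bits=128)
full_avx = issue_cycles(1000, vector_bits=256, datapath_bits=256)

print("SSE2 split vs full:", split_sse2, full_sse2)
print("AVX  split vs full:", split_avx, full_avx)
```

In this model both CPUs "support" the instruction set, but the one that splits each instruction spends twice as many cycles issuing the same work, which is the gap being described above.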