Wow, did something change these past few months? Seems I've been reading a lot of negativity on these forums surrounding AVX-512 lately. I thought AVX-512 was SIMD done right, as there was a lot of fanfare when it first became available.
Now, though, it seems there is more criticism than praise. And that's not to say the criticism is unwarranted, mind you. But my, the times have changed, I guess.
If pursuing wider vectors is a mistake, then why does Intel seem hellbent on doing so?
As a counterpoint to Gideon's reply above, as a programmer I think AVX-512 is the best-designed SIMD extension to x86 ever, and the sooner we get it in every new x86 CPU so that I can actually start using it, the better. A lot of those new instructions are in there because Intel finally took its time and did it right, with a nice, complete, and orthogonal set.
However, there are very few situations where the vectors *really* need to be that long, and a lot of situations where having less SIMD width in the CPU is beneficial. My dream solution, what I'd really like AMD to do, is to keep the current vector ALU widths but double the register width to 512 bits and fully implement the AVX-512 instruction set, splitting full-width vectors across two ALUs, like they used to do with AVX on Zen 1.
A lot of the flak AVX-512 has gotten recently is because the throttling scheme the early Intel CPUs used to execute very wide code was really
-Redacted-. There is nothing fundamentally wrong with clocking a CPU down a bit to run wider vectors; clocking down even 30% to double the vector width is still a substantial win in throughput for vector-heavy code! The problem was that Intel
eagerly clocked down when it encountered almost any AVX-512 op and took a long time clocking back up. Which means that code with a few of them sprinkled around lots of scalar code will run all that scalar code at the lower clock, which really sucks.
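To put rough numbers on the tradeoff above, here's a minimal back-of-the-envelope model (the 0.7 clock factor and the vector fractions are illustrative assumptions, not measurements of any real CPU):

```python
def relative_throughput(clock_factor, width_factor, vector_fraction):
    """Simple time-based model: the vectorized fraction of the work
    speeds up by width_factor, the scalar fraction does not, and
    everything runs at the reduced clock."""
    vec_time = vector_fraction / width_factor
    scalar_time = 1.0 - vector_fraction
    new_time = (vec_time + scalar_time) / clock_factor
    return 1.0 / new_time  # speedup vs. the original clock and width

# Vector-heavy code: doubling the width at 70% clock is still a clear win.
print(relative_throughput(0.7, 2.0, 0.95))  # ~1.33x

# Mostly scalar code with a few wide ops sprinkled in: net loss.
print(relative_throughput(0.7, 2.0, 0.05))  # ~0.72x
```

Same downclock, opposite outcomes, which is exactly why the eager-downclock behavior hurt so much in mixed code.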
But anyway, I would caution against looking at vector units alone and assuming that widening them, or adding more of them, is automatically good. Most code that matters is still scalar, and will probably stay that way. Making the vector units wider means the proportion of time spent in vector code goes down, so every doubling of width gives less real gain than the last. And widening the execution units themselves doesn't even help all that much; they need to be kept fed, and a memory/cache interface optimized to keep a very wide vector machine happy is quite different from one meant to satisfy a fast scalar machine. So optimizing too far for vectors can be a pessimization for scalar code.
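The diminishing-returns point is just Amdahl's law applied to vector width. A quick sketch (the 50% vectorizable fraction is an assumption for illustration):

```python
def amdahl_speedup(vector_fraction, width):
    """Amdahl's law: only the vectorizable fraction of runtime
    benefits from a wider vector unit."""
    return 1.0 / ((1.0 - vector_fraction) + vector_fraction / width)

# Code that spends half its time in vector loops:
for width in (1, 2, 4, 8):
    print(f"{width}x width -> {amdahl_speedup(0.5, width):.2f}x speedup")
# 1x -> 1.00x, 2x -> 1.33x, 4x -> 1.60x, 8x -> 1.78x
```

Each doubling buys less than the one before, and the whole curve is capped at 2x no matter how wide the units get, while the scalar half of the code pays whatever costs the wide design imposes.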
Profanity in the technical forums is not allowed.
Thanks,
Daveybrat
AT Moderator