Intel extends AVX to 512-bit


BenchPress

Senior member
Nov 8, 2011
392
0
0
I highly doubt that a killer app will appear. If such a killer app existed, don't you think nVidia would have shown it by now, given how long they've been claiming GPGPU was the next big thing?
They tried but failed, due to the inherent heterogeneous overhead and programming complications. With the Kepler architecture they're focusing on graphics again, which is where the money is for them, and they've taken a serious step back from consumer GPGPU. AVX-512 instead is homogeneous, which opens up a whole new world of possibilities.

I could list a bunch of things, but that would just be the tip of the iceberg and not do it justice. Don't look for one specific killer app. There are three different kinds of parallelism for increasing performance beyond clock speed scaling: ILP, TLP, and DLP. ILP is pretty much maxed out, TLP you get from multiple cores, and DLP is most efficiently extracted using vector instructions. So although the seed had already been planted with AVX, AVX-512 will really add another dimension to CPU performance as a whole, and not target one specific killer app.
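To make the DLP point concrete, here is a minimal sketch in C of the kind of loop that vector instructions help with. The function name `saxpy` and its signature are illustrative assumptions, not anything from the AVX-512 announcement. The point is only that each iteration is independent, so a vectorizing compiler is free to process several elements per instruction.

```c
#include <assert.h>
#include <stddef.h>

/* y[i] = a * x[i] + y[i] for i in [0, n).
 * No iteration depends on another, which is the data-level
 * parallelism (DLP) a vectorizing compiler can map onto SIMD lanes.
 * `restrict` promises the compiler that x and y do not overlap.
 * The name and signature are hypothetical, chosen for illustration. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Whether a given compiler actually vectorizes this depends on its flags and target, so treat it as a candidate loop rather than a guarantee.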

Think about how superscalar execution and multi-core have transformed computing. There is no single killer app for them, but you sure don't want to go back to single-issue or single-core. The same thing will happen with wide vectors. A lot of applications will benefit. Some more, some less, but you'll soon be wondering how we ever lived without them.
Add to that the fact that Intel will segment as usual, and I bet this won't be used in consumer apps for many years.

So definitely nice, but certainly not a revolution as you claimed.
Of course it will take many years. The same was true of multi-core in 2005, and there are still software companies that only recently started looking at it. TSX should help, but won't be ubiquitous for many years either (especially since Intel has indeed segmented support for it). But we still think of multi-core and transactional memory as revolutionary.

So you shouldn't look at the slowness of the market to determine whether something is revolutionary or not. A 512-bit register holds sixteen 32-bit elements, so AVX-512 can execute loops up to 16 times faster than legacy scalar code working on one 32-bit value at a time. That's not going to leave things unchanged. That's a revolution.
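The 16x figure is a lane count, and the arithmetic behind it can be sketched in a few lines of C. The helper names below are hypothetical, and the result is an upper bound only: it ignores memory bandwidth, clock speed changes, and loops that don't vectorize cleanly.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements that fit in one vector register. */
size_t lanes_per_vector(size_t vector_bits, size_t element_bits)
{
    assert(element_bits > 0 && vector_bits % element_bits == 0);
    return vector_bits / element_bits;
}

/* Loop trips needed to cover n elements when each trip handles
 * `lanes` of them, rounded up to account for a partial last trip. */
size_t vector_trips(size_t n, size_t lanes)
{
    assert(lanes > 0);
    return (n + lanes - 1) / lanes;
}
```

For example, `lanes_per_vector(512, 32)` gives 16 and `lanes_per_vector(512, 64)` gives 8, so the best case halves for 64-bit elements. A 1000-element loop of 32-bit values needs `vector_trips(1000, 16)`, i.e. 63 trips instead of 1000.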
 

SlickR12345

Senior member
Jan 9, 2010
542
44
91
www.clubvalenciacf.com
Big deal. We need to see a standardisation of 6 core processors for the desktop.

I mean, unless I see 6-core processors at 3.3GHz with 8MB of L3 at $200, I'd consider the next generation a fail.

Intel's own projections from 2010 showed that we should have had 8-core processors by now.

Instead we are stuck with 4 cores, unless you are hardcore and have lots of money to be able to spare $1000 for a 6 core processor.
 

Sweepr

Diamond Member
May 12, 2006
5,148
1,143
136
Skylake: AVX3.2, DDR4, PCIe 4.0... can we have mainstream 6C/12T CPUs too? :p
 

BenchPress

Senior member
Nov 8, 2011
392
0
0
Big deal. We need to see a standardisation of 6 core processors for the desktop.

I mean, unless I see 6-core processors at 3.3GHz with 8MB of L3 at $200, I'd consider the next generation a fail.

Intel's own projections from 2010 showed that we should have had 8-core processors by now.
We have 10-core processors right now. You have the (lack of) competition to blame for keeping the prices high on anything beyond quad-core. AMD's Steamroller architecture with four modules might finally perform a little closer to an 8-core. That would push Intel to release affordable 6- or 8-core models.

That said, AMD hasn't even put AVX2 support on the roadmap yet, and each module only has one shared SIMD cluster. If Skylake features AVX-512, which the announcement strongly hints at, then AMD will again be severely lacking in performance.
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
That said, AMD hasn't even put AVX2 support on the roadmap yet, and each module only has one shared SIMD cluster. If Skylake features AVX-512, which the announcement strongly hints at, then AMD will again be severely lacking in performance.
I saw one AMD guy on LinkedIn mentioning AVX2 for Excavator. He even wrote it in a way which might include SR too.
 

zlatan

Senior member
Mar 15, 2011
580
291
136
What makes you think that?
I tried it. It's much easier to optimize for GCN than for AVX.
The only problem with GPUs is the separate memory space, but with the new APUs this problem will be gone.
Try an Xbox One. I can't talk about what I'm working on now, but I want to use bitonic mergesort for sorting. I tried an AVX implementation first, but now I use the iGPU for it. It can sort in-place in system memory, and it was much easier to optimize. Actually, I was shocked at how easy it was.
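For readers who haven't met bitonic mergesort, here is a minimal scalar sketch in C. It is not the implementation described in the post, which isn't shown; the function name is hypothetical, and the sketch assumes the length is a power of two. What it does show is why the algorithm suits both SIMD and GPUs: for a given `k` and `j`, every compare-exchange in the inner loop touches a fixed pair of positions that doesn't depend on the data, so all of them can run in parallel.

```c
#include <assert.h>
#include <stddef.h>

/* In-place bitonic sort, ascending. Assumes n is a power of two.
 * k is the size of the bitonic sequences being merged;
 * j is the distance between the two elements of each pair. */
void bitonic_sort(int *a, size_t n)
{
    assert(n != 0 && (n & (n - 1)) == 0);
    for (size_t k = 2; k <= n; k <<= 1) {
        for (size_t j = k >> 1; j > 0; j >>= 1) {
            /* Every iteration of this loop is independent of the others. */
            for (size_t i = 0; i < n; ++i) {
                size_t partner = i ^ j;
                if (partner <= i)
                    continue;               /* handle each pair once */
                int ascending = (i & k) == 0;
                int out_of_order = ascending ? (a[i] > a[partner])
                                             : (a[i] < a[partner]);
                if (out_of_order) {
                    int tmp = a[i];
                    a[i] = a[partner];
                    a[partner] = tmp;
                }
            }
        }
    }
}
```

The network does O(n log² n) compare-exchanges, more than a comparison sort needs, so it only pays off when those independent inner-loop steps really do run in parallel.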
 

NTMBK

Lifer
Nov 14, 2011
10,419
5,712
136
I saw one AMD guy on LinkedIn mentioning AVX2 for Excavator. He even wrote it in a way which might include SR too.

I'd hope that Steamroller could get AVX2. If we have to wait for Excavator, then AMD will be almost a year and a half behind Intel in introducing new instruction sets...
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
I doubt it's in SR, simply because the units are still 128-bit.

AMD not only lacks AVX2, they also lack 256-bit paths and units.
 

Nothingness

Diamond Member
Jul 3, 2013
3,292
2,360
136
I doubt it's in SR, simply because the units are still 128-bit.

AMD not only lacks AVX2, they also lack 256-bit paths and units.
You don't need 256-bit paths or units to handle 256-bit SIMD instructions (of course you'd take a perf hit) so this doesn't prove anything.
 

NTMBK

Lifer
Nov 14, 2011
10,419
5,712
136
I doubt it's in SR, simply because the units are still 128-bit.

AMD not only lacks AVX2, they also lack 256-bit paths and units.

They used to do SSE on 64-bit vectors, they already do AVX(1) on 128-bit vectors in Jaguar, and they do it on the pair of 128-bit vector units in BD. Instruction cracking and a bit of microcode are nothing new, and Piledriver already supports FMA3, and supported it before Intel did.
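As a rough model of what cracking means here, the C sketch below splits one 256-bit add into two 128-bit adds. The type and function names are hypothetical stand-ins, and real hardware does this in the decoder and scheduler rather than in software. The only thing it is meant to show is that the architectural result is identical, at the cost of two operations instead of one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: four floats fill a 128-bit execution unit,
 * and a 256-bit architectural register is modelled as two such halves. */
typedef struct { float lane[4]; } vec128;
typedef struct { vec128 half[2]; } vec256;

/* What one 128-bit unit can do in a single operation. */
static vec128 add128(vec128 a, vec128 b)
{
    vec128 r;
    for (int i = 0; i < 4; ++i)
        r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

/* A 256-bit add "cracked" into two 128-bit operations. */
void add256_cracked(const vec256 *a, const vec256 *b, vec256 *out)
{
    assert(a != NULL && b != NULL && out != NULL);
    out->half[0] = add128(a->half[0], b->half[0]);  /* low 128 bits  */
    out->half[1] = add128(a->half[1], b->half[1]);  /* high 128 bits */
}
```

Lane-wise operations like this split cleanly. Instructions that move data across the 128-bit boundary are harder to crack and are not covered by this sketch.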
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
AVX2 is nothing special, really. Once you have AVX running on 256-bit packed double/single floats and the so-called VEX-encoded instructions in place, the AVX2 instructions that operate on 256-bit integer vectors are no big deal. The main headache will probably come from the gather instructions (AVX2 adds gather only; scatter comes with AVX-512). Given AMD's total incompetence in the cache/TLB department in the past, I am scared to think about them trying to get it right so early. Microcoded or not, those things need to work correctly with operands at byte alignment that cross cache lines, pages, etc.

P.S. I am aware that Intel's current AVX2 gather implementation is not exactly known for speedups either.
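To show why gather is awkward for the cache and TLB, here is a simplified scalar model in C. All names are hypothetical, the 8-lane width is an assumption matching 32-bit elements in a 256-bit register, and the real instruction's index and mask encoding is not reproduced. Each lane does its own independent load, so one instruction can touch up to eight different cache lines or pages, and an unaligned element can straddle two lines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { LANES = 8 };  /* assumed: eight 32-bit lanes in a 256-bit register */

/* Simplified gather: each enabled lane loads table[idx[lane]].
 * Lanes with mask == 0 keep their previous dst value. */
void gather_i32(int32_t dst[LANES], const int32_t *table, size_t table_len,
                const uint32_t idx[LANES], const uint8_t mask[LANES])
{
    for (int lane = 0; lane < LANES; ++lane) {
        if (mask[lane]) {
            assert(idx[lane] < table_len);
            dst[lane] = table[idx[lane]];  /* independent address per lane */
        }
    }
}

/* Returns nonzero if an element of `size` bytes starting at byte_offset
 * spans two cache lines of line_bytes each. */
int crosses_line(size_t byte_offset, size_t size, size_t line_bytes)
{
    assert(size > 0 && line_bytes > 0);
    return byte_offset / line_bytes != (byte_offset + size - 1) / line_bytes;
}
```

With an assumed 64-byte line, a 4-byte element at byte offset 62 gives `crosses_line(62, 4, 64) != 0`, while one at offset 60 fits in a single line. The hardware has to get every such case right for every lane, including page faults partway through.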