Discussion Intel current and future Lakes & Rapids thread

Page 453

jpiniero

Lifer
Oct 1, 2010
14,585
5,209
136
With all this AVX-512 debacle, is there some instruction difference between Ice Lake and Tremont on Lakefield that someone could try? Or on the ARM side with big.LITTLE, say even outside Windows? If those can do heterogeneous multi-processing already, I see no issues; it's not like Alder Lake will go back to cluster switching (and it probably couldn't, given the various big/small core counts), and disabling AVX-512 makes no sense when Cannon Lake, Ice Lake, Tiger Lake and Rocket Lake had it.

IIRC Ian was told that AVX-512 was physically removed from the Sunny Cove core in Lakefield. I don't know if that's really true though.
 

coercitiv

Diamond Member
Jan 24, 2014
6,187
11,859
136
With all this AVX-512 debacle, is there some instruction difference between Ice Lake and Tremont on Lakefield that someone could try? Or on the ARM side with big.LITTLE, say even outside Windows?
The issue is already clear-cut; the only reason we're having this conversation is that some people pretend it wasn't already discussed and explained in the finest detail.

Here's Ian Cutress' take on the situation as it was presented with Lakefield:
One of the biggest issues with a heterogeneous processor design is software. Even if we go beyond the issues that come with scheduling a workload on such a device, the problem is that most programs are designed to work on whatever microarchitecture they were written for. Generic programs are meant to work everywhere, while big publishers will write custom code for specific optimizations, such as an AVX-512 code path that is used when AVX-512 support is detected.

The hair-pulling moment occurs when a processor has two different types of CPU core involved, and there is the potential for each of them to support different instructions or commands. Typically the scheduler makes no guarantee that software will run on any given core, so for example if you had some code written for AVX-512, it would happily run on an AVX-512 enabled core, but cause a critical fault on a core that doesn't have AVX-512. The core won't even know it's an AVX-512 instruction until it comes time to decode it, and it will simply throw an error when that happens. Not only this, but the scheduler has the right to move a thread when it needs to; if it moves a thread in the middle of an instruction stream, that can cause errors too. The processor could also move a thread to prevent thermal hotspots from occurring, which would then cause a fault.

There could be a situation where the programmer can flag that their code has specific instructions. In a program with unique instructions, there's very often a check that tries to detect support, in order to say to itself something like 'AVX-512 will work here!'. However, all modern software assumes a homogeneous processor: that all cores will support all of the same instructions.

It becomes a very chicken and egg problem, to a certain degree.

The only way out of this is that both processors in a hybrid CPU have to support the same instructions completely. This means that we end up with the worst of both worlds: only instructions supported by both can be enabled. This is the lowest common denominator of the two, and means that in Lakefield we lose support for AVX-512 on Sunny Cove, but also things like GFNI, ENCLV, and CLDEMOTE on Tremont (Tremont is actually rather progressive in its instruction support).

Knowing that Lakefield was going to have to take the lowest common denominator from the two core designs, Intel probably should have physically removed the very bulky AVX-512 unit from the Sunny Cove core. Looking at the die shot, it's still there. There was some question going into the recent disclosures as to whether it would still be there, and Intel has stated on the record repeatedly that they removed it, but the die shot of the compute silicon shows that not to be the case.

So not only did Intel have to disable AVX-512 functionality on the Sunny Cove side, but also some instruction support on the Tremont side! The only solution to this problem would be hardware based, and yet Intel is not only tight-lipped about such capabilities on ADL, they have also officially stated that AVX-512 is to be disabled in hybrid mode.
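
To make the failure mode concrete, here is roughly what the detection side of a typical program looks like (my own sketch using GCC/Clang's cpuid.h, not something from Ian's article): the support check runs once, and nothing stops the scheduler from later parking that thread on a core that raises #UD on the first 512-bit instruction it decodes.

[CODE]
#include <cpuid.h>   /* GCC/Clang helper for the CPUID instruction */
#include <stdio.h>

/* CPUID.(EAX=07H,ECX=0):EBX bit 16 = AVX512F */
static int has_avx512f(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ebx >> 16) & 1;
}

int main(void)
{
    if (has_avx512f()) {
        /* From here on the program assumes AVX-512 is safe everywhere.
           If the scheduler later migrates the thread to a core without
           the unit, the first 512-bit instruction faults (SIGILL). */
        printf("AVX-512F reported, taking the wide code path\n");
    } else {
        printf("AVX-512F not reported, taking the AVX2/SSE path\n");
    }
    return 0;
}
[/CODE]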
 

jpiniero

Lifer
Oct 1, 2010
14,585
5,209
136
Thanks, that is helpful, although it is slightly different from the question (disabling something is different from not supporting something).

I imagine it's not physically disabled, just that the chip won't report that it supports AVX-512 when the small cores are enabled.
 

coercitiv

Diamond Member
Jan 24, 2014
6,187
11,859
136
IIRC Ian was told that AVX-512 was physically removed from the Sunny Cove core in Lakefield. I don't know if that's really true though.
Ian was told it was removed, yet it was still there as per his own reporting.
Knowing that Lakefield was going to have to take the lowest common denominator from the two core designs, Intel probably should have physically removed the very bulky AVX-512 unit from the Sunny Cove core. Looking at the die shot, it's still there. There was some question going into the recent disclosures as to whether it would still be there, and Intel has stated on the record repeatedly that they removed it, but the die shot of the compute silicon shows that not to be the case.
 

dullard

Elite Member
May 21, 2001
25,055
3,408
126
I imagine it's not physically disabled, just that the chip won't report that it supports AVX-512 when the small cores are enabled.
If that is the case, then we are back to where we started: scheduling changes could be implemented so that the chip can report it or not based on the workload.

There are three fundamental questions:
1) Is the AVX-512 hardware there?
2) Is the AVX-512 hardware functional?
3) If both #1 and #2 are true, can it be set on the fly or only upon bootup in BIOS?
 

jpiniero

Lifer
Oct 1, 2010
14,585
5,209
136
If that is the case, then we are back to where we started: scheduling changes could be implemented so that the chip can report it or not based on the workload.

Presumably so, but Windows barely works as it is. Intel probably figures it's too risky.
 

DrMrLordX

Lifer
Apr 27, 2000
21,620
10,830
136
If that is the case, then we are back to where we started: scheduling changes could be implemented so that the chip can report it or not based on the workload.

In that case the scheduler would have to lock the entire workload and all its threads to the Golden Cove cores only, on the chance that the application could spawn threads executing AVX-512 instructions without the scheduler being able to accurately predict when or where that might happen.

And on what basis would the scheduler make the determination that any given application requires AVX-512?
 

dullard

Elite Member
May 21, 2001
25,055
3,408
126
In that case the scheduler would have to lock the entire workload and all its threads to the Golden Cove cores only, on the chance that the application could spawn threads executing AVX-512 instructions without the scheduler being able to accurately predict when or where that might happen.

And on what basis would the scheduler make the determination that any given application requires AVX-512?
That is why I think the scheduler isn't a done deal solved years ago (contrary to what Thala keeps claiming here). There are challenging issues left.

I do not know the answer to your question. I can speculate, though. For example, the easiest possible solution would be to have a list of known software that doesn't use AVX-512 (such as Windows itself) that runs only on the smaller cores, leaving the bigger cores cold and fully available for the rest of the software. The smaller cores could provide near-instant responsiveness to the end user, while the larger cores do the grunt work as needed. The plumbing for something like this already exists, as sketched below.
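
Just to illustrate the mechanism (nothing Intel or Microsoft has announced; the core numbering below is a pure assumption on my part), the OS-side primitive for such a list is simply a per-process affinity mask:

[CODE]
#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* Hypothetical layout: cores 0-7 are the big cores, 8-15 the small
       ones. The real enumeration on Alder Lake isn't public, so this
       mask is purely illustrative. */
    DWORD_PTR small_cores_only = 0xFF00;   /* bits 8..15 set */

    if (!SetProcessAffinityMask(GetCurrentProcess(), small_cores_only)) {
        fprintf(stderr, "SetProcessAffinityMask failed: %lu\n", GetLastError());
        return 1;
    }
    printf("Process restricted to the small-core mask\n");
    /* ...background/housekeeping work runs here, leaving the big cores idle... */
    return 0;
}
[/CODE]

The masking itself is not the hard part; knowing which processes belong on the list is.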
 

jpiniero

Lifer
Oct 1, 2010
14,585
5,209
136
That is why I think the scheduler isn't a done deal solved years ago (contrary to what Thala keeps claiming here). There are challenging issues left.

There isn't anything to fix. Intel will eventually come up with a solution that doesn't require OS support but that isn't in Alder Lake. Or AVX-512 will be added to the small cores.
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
If that is the case, then we are back to where we started: scheduling changes could be implemented so that the chip can report it or not based on the workload.

Lol, that's not how it works. It is essentially the current core (and not the chip) reporting its capabilities to the application (and not to the OS, mind you). Based on this report, the application chooses a code path, and this typically happens a single time when the application starts and not again thereafter.
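
A minimal sketch of that pattern (GCC/Clang builtins; the kernel names are made up):

[CODE]
#include <stdio.h>

static void kernel_avx512(void) { puts("512-bit path"); }  /* stand-in kernels */
static void kernel_avx2(void)   { puts("256-bit path"); }

static void (*kernel)(void);

int main(void)
{
    /* Queried exactly once, on whichever core the loader happens to
       start us on. The function pointer never changes afterwards,
       no matter where the scheduler moves the thread later. */
    if (__builtin_cpu_supports("avx512f"))
        kernel = kernel_avx512;
    else
        kernel = kernel_avx2;

    for (int i = 0; i < 3; i++)
        kernel();
    return 0;
}
[/CODE]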

That is why I think the scheduler isn't a done deal solved years ago (contrary to what Thala keeps claiming here). There are challenging issues left.

Just stop posting wild speculation, in particular when it is crystal clear that you have no idea what you are talking about (see above).
 

jpiniero

Lifer
Oct 1, 2010
14,585
5,209
136
Lol, that's not how it works. It is essentially the current core (and not the chip) reporting its capabilities to the application (and not to the OS, mind you). Based on this report, the application chooses a code path, and this typically happens a single time when the application starts and not again thereafter.

They could implement something like the OP mentioned, where an application could be forced to run on the big cores only and get AVX-512 support. But they won't, at least officially, and definitely not in Windows.
 

Shivansps

Diamond Member
Sep 11, 2013
3,851
1,518
136
There isn't anything to fix. Intel will eventually come up with a solution that doesn't require OS support but that isn't in Alder Lake. Or AVX-512 will be added to the small cores.

I'm thinking out loud here, but maybe they add AVX-512 support in a similar way to how AMD implemented 256-bit AVX on Zen 1, with two 128-bit units.
 

DrMrLordX

Lifer
Apr 27, 2000
21,620
10,830
136
How useful exactly is AVX512?

Depends on what you're using it for. If you're doing HPC-style work that wants wide vectors, then it can definitely improve your application performance. That's the sort of workload you would expect people to throw at GPGPU and/or server-style CPUs, and not really the target application group for consumer CPUs like Alder Lake. However, there are some AI-related extensions slapped onto AVX-512 which, to my knowledge, are more related to AI training than anything else... but Intel has made a point of including those extensions in CPUs like Tiger Lake. It may be that they have uses in consumer ML applications. You may have noticed a wide variety of other CPU/SoC vendors boasting of their ML capabilities for similar reasons.

I'm thinking out loud here, but maybe they add AVX-512 support in a similar way to how AMD implemented 256-bit AVX on Zen 1, with two 128-bit units.

That might be the most elegant solution. But it won't help Alder Lake, since Gracemont is pretty much set in stone by now. Also, as I mentioned above, AVX-512 is more than just wider vectors.
 

Hulk

Diamond Member
Oct 9, 1999
4,214
2,007
136
In spite of all of our endless discussion as to which CPU is better than which, you've gotta love the free market because in the end it figures it out.

Microcenter pricing as of 6/7/2021
10700 - $220
10700K - $250
10850K - $300 (this is a really good deal I think)
10900 - $330
11700K - $350
5800X - $370
11900K/10900K - $500
5900X - $550
5950X - $800

I mean come on! Ranking them by price is quite close to ranking by performance, which is how it should be. I'd probably put the 5800X ahead of the 10900K/11900K, but we know those parts have been overpriced from the start; their pricing can't drop fast enough. 25+ in stock ;)

The 10850K, 10700K, and 5800X stand out to me as exceptional deals. Actually except for the 10900K/11900K they are all decent deals.
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
However, there are some AI-related extensions slapped onto AVX-512 which, to my knowledge, are more related to AI training than anything else...

Unfortunately, a lot of those instructions (VNNI) are used for inference, which strikes me as the biggest issue of dropping AVX-512 support. It's not the 512b vector length per se, but the other instructions bundled in.

I'm thinking out loud here, but maybe they add AVX-512 support in a similar way to how AMD implemented 256-bit AVX on Zen 1, with two 128-bit units.

My understanding of AVX was that cracking it once (i.e. from 256b to 2x128b, or 512b to 2x256b) is doable without too much effort, but cracking it twice (512b to 4x128b) is disproportionately more complicated. Might be hearsay, but we'll have to see what "NextMont"/Meteor Lake does.
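
As a purely data-level illustration of what cracking once means (C with AVX2 intrinsics; real cores split at the µop level, this just shows why the two halves are independent):

[CODE]
/* build with: -mavx2 */
#include <immintrin.h>
#include <stdio.h>

/* Emulate a 512-bit packed int32 add with two 256-bit AVX2 adds.
   Each half only ever touches its own 8 elements, which is exactly
   why this kind of op splits cleanly. */
static void add512_as_2x256(const int *a, const int *b, int *out)
{
    __m256i lo = _mm256_add_epi32(_mm256_loadu_si256((const __m256i *)a),
                                  _mm256_loadu_si256((const __m256i *)b));
    __m256i hi = _mm256_add_epi32(_mm256_loadu_si256((const __m256i *)(a + 8)),
                                  _mm256_loadu_si256((const __m256i *)(b + 8)));
    _mm256_storeu_si256((__m256i *)out, lo);
    _mm256_storeu_si256((__m256i *)(out + 8), hi);
}

int main(void)
{
    int a[16], b[16], c[16];
    for (int i = 0; i < 16; i++) { a[i] = i; b[i] = 100; }
    add512_as_2x256(a, b, c);
    for (int i = 0; i < 16; i++) printf("%d ", c[i]);   /* 100..115 */
    printf("\n");
    return 0;
}
[/CODE]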

Imo, Intel should at least backport VNNI instructions and such to 256b width. Wouldn't fully solve the problem, but would at least work as a stopgap measure.
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,764
3,131
136
Unfortunately, a lot of those instructions (VNNI) are used for inference, which strikes me as the biggest issue of dropping AVX-512 support. It's not the 512b vector length per se, but the other instructions bundled in.



My understanding of AVX was that cracking it once (i.e. from 256b to 2x128b, or 512b to 2x256b) is doable without too much effort, but cracking it twice (512b to 4x128b) is disproportionately more complicated. Might be hearsay, but we'll have to see what "NextMont"/Meteor Lake does.

Imo, Intel should at least backport VNNI instructions and such to 256b width. Wouldn't fully solve the problem, but would at least work as a stopgap measure.
My understanding is that AVX/AVX2 is easily crackable because all instructions align at the 128-bit boundaries; with AVX-512 this isn't the case, and there are instructions that can move data across the 128/256/384-bit boundaries, e.g. a shift across the whole 512-bit vector.
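
For example, a full-width permute is one of those: the low half of the result can come straight from the high half of the input, so it can't be run as two independent 256-bit halves. A quick sketch (AVX-512F intrinsics, so it needs actual AVX-512 hardware to run):

[CODE]
/* build with: -mavx512f */
#include <immintrin.h>
#include <stdio.h>

int main(void)
{
    /* Reverse all 16 dwords of a 512-bit register with one vpermd.
       Output element 0 comes from input element 15, i.e. data crosses
       every 128/256-bit boundary - the part that doesn't crack nicely. */
    int src[16], dst[16];
    for (int i = 0; i < 16; i++) src[i] = i;

    __m512i idx = _mm512_set_epi32(0, 1, 2, 3, 4, 5, 6, 7,
                                   8, 9, 10, 11, 12, 13, 14, 15);
    __m512i v   = _mm512_loadu_si512(src);
    _mm512_storeu_si512(dst, _mm512_permutexvar_epi32(idx, v));

    for (int i = 0; i < 16; i++) printf("%d ", dst[i]);  /* 15 14 ... 0 */
    printf("\n");
    return 0;
}
[/CODE]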
 

podspi

Golden Member
Jan 11, 2011
1,965
71
91
It kind of blows my mind Intel keeps trying to price discriminate through instruction extensions. It's like they don't actually understand why x86 is the golden goose it is.

AVX-512 should be in most if not all of their designs. Each CPU doesn't have to be able to execute it quickly, but it needs to be able to execute it. Disabling the extension entirely is shortsighted: the advantage of x86 is lost because developers have to support another code path, or drop support for lower-end CPUs. I realize a lot of this is alleviated with modern compilers, but at that point you can also cross-compile for other architectures as well.
 

coercitiv

Diamond Member
Jan 24, 2014
6,187
11,859
136
Unfortunately, a lot of those instructions (VNNI) are used for inference, which strikes me as the biggest issue of dropping AVX-512 support. It's not the 512b vector length per se, but the other instructions bundled in.
There is AVX-VNNI support in hybrid mode, though I won't pretend to know how this relates to AVX512-VNNI (whether complementary or close enough to be an effective replacement).
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
There is AVX-VNNI support in hybrid mode, though I won't pretend to know how this relates to AVX512-VNNI (whether complementary or close enough to be an effective replacement).

Very likely it is good old AVX512-VNNI, but operating not on 512-bit vectors, only on 256-bit and 128-bit ones. I have looked up AVX512-VNNI, and it is EVEX-encoded instructions with 3 registers, not touching mask registers etc., so it looks like they dropped ZMM support and ended up with AVX-VNNI, encoded with VEX instead?
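
For anyone wondering what those instructions actually buy you: per 32-bit lane, VPDPBUSD does roughly the following (a plain-C sketch of the operation as I read the docs; the math is the same whether the encoding is EVEX or VEX):

[CODE]
#include <stdint.h>
#include <stdio.h>

/* One dword lane of VPDPBUSD: multiply 4 unsigned bytes by 4 signed
   bytes, widen, and accumulate the four products into the destination
   dword. int8 inference kernels are basically this in a loop. */
static int32_t vpdpbusd_lane(int32_t acc, const uint8_t u[4], const int8_t s[4])
{
    for (int i = 0; i < 4; i++)
        acc += (int32_t)u[i] * (int32_t)s[i];
    return acc;
}

int main(void)
{
    uint8_t activations[4] = { 10, 20, 30, 40 };
    int8_t  weights[4]     = { 1, -2, 3, -4 };
    printf("%d\n", vpdpbusd_lane(0, activations, weights));  /* prints -100 */
    return 0;
}
[/CODE]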
 

DrMrLordX

Lifer
Apr 27, 2000
21,620
10,830
136
Unfortunately, a lot of those instructions (VNNI) are used for inference, which strikes me as the biggest issue of dropping AVX-512 support. It's not the 512b vector length per se, but the other instructions bundled in.

That's sort of what I was getting at, but thanks for pointing out VNNI and inference (I had forgotten the term, honestly). Stuff like bfloat16 is of more interest to those who are training, while commercial ML applications are more reliant on things like VNNI. The fact that VNNI is attached to AVX-512 (but not really!) is problematic.

As @coercitiv points out, Gracemont in particular does support AVX-VNNI:


Intel's small low-power cores for client system-on-chips have always featured rather decent functionality but, to minimize their size and power consumption, have never supported the instructions required for various high-performance computing or media encoding/decoding workloads. This is going to change with the upcoming Gracemont cores, which will support AVX, AVX2, and AVX-VNNI instructions.

Which is really confusing, since Intel has just changed the AVX standard. VNNI was originally part of the AVX-512 standard, and my assumption (until now) was that in order to support VNNI at all, you had to support AVX-512, including all its 512b vector glory, etc. VNNI is probably one of the instruction subsets supported by Tiger Lake that may see the most use in actual commercial applications.
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
Which is really confusing, since Intel has just changed the AVX standard. VNNI was originally part of the AVX-512 standard, and my assumption (until now) was that in order to support VNNI at all, you had to support AVX-512, including all its 512b vector glory, etc.

Perhaps AVX-VNNI and AVX512-VNNI are two different things. That's because, as you correctly said, AVX512-VNNI instructions are only defined in conjunction with 512-bit registers. In addition, as far as I remember, if you implement any (of the many) AVX-512 extensions, you are required to at least implement AVX512F.

VNNI is probably one of the instruction subsets supported by Tiger Lake that may see the most use in actual commercial applications.
"most use" is relative. Given the fragmented state of all the different AVX512 extensions - i believe that there is hardly any commercial application, which is using AVX512. Even Intel's own Embree library is using mostly AVX or SSE.
 