
Discussion Intel current and future Lakes & Rapids thread

Page 453

jpiniero

Diamond Member
Oct 1, 2010
9,339
1,887
136
Thanks, that is helpful, although it's slightly different from the question (disabling something is different from not supporting it).
I imagine it's not physically disabled, just that the chip won't report that it supports AVX-512 when the small cores are enabled.
 

coercitiv

Diamond Member
Jan 24, 2014
4,363
5,698
136
IIRC Ian was told that AVX-512 was physically removed from the Sunny Cove core in Lakefield. I don't know if that's really true though.
Ian was told it was removed, yet it was still there as per his own reporting.
Knowing that Lakefield was going to have to take the lowest common denominator of the two core designs, Intel probably should have physically removed the very bulky AVX-512 unit from the Sunny Cove core. Looking at the die shot, it's still there. There was some question going into the recent disclosures as to whether it would still be present, and Intel has stated on the record repeatedly that they removed it, but the die shot of the compute silicon shows that not to be the case.
 

dullard

Elite Member
May 21, 2001
22,801
1,035
126
I imagine it's not physically disabled, just that the chip won't report that it supports AVX-512 when the small cores are enabled.
If that is the case, then we are back to where we started: scheduling changes could be implemented so that the chip can report it or not based on the workload.

There are three fundamental questions:
1) Is the AVX-512 hardware there?
2) Is the AVX-512 hardware functional?
3) If both #1 and #2 are true, can it be set on the fly or only upon bootup in BIOS?
 

jpiniero

Diamond Member
Oct 1, 2010
9,339
1,887
136
If that is the case, then we are back to where we started: scheduling changes could be implemented so that the chip can report it or not based on the workload.
Presumably so, but Windows barely works as it is. Intel probably figures it's too risky.
 

DrMrLordX

Lifer
Apr 27, 2000
17,268
6,267
136
If that is the case, then we are back to where we started: scheduling changes could be implemented so that the chip can report it or not based on the workload.
In that case the scheduler would have to lock the entire workload and all its threads to the Golden Cove cores, against the possibility that the application could spawn threads executing AVX-512 instructions without the scheduler being able to predict when or where that might happen.

And on what basis would the scheduler make the determination that any given application requires AVX-512?
 

dullard

Elite Member
May 21, 2001
22,801
1,035
126
In that case the scheduler would have to lock the entire workload and all its threads to the Golden Cove cores, against the possibility that the application could spawn threads executing AVX-512 instructions without the scheduler being able to predict when or where that might happen.

And on what basis would the scheduler make the determination that any given application requires AVX-512?
That is why I think the scheduler isn't a done deal solved years ago (unlike what Thala keeps claiming here). There are challenging issues left.

I do not know the answer to your question. I can speculate though. For example, the easiest possible solution would be to have a list of known software that doesn't use AVX-512 (such as Windows) that runs only on the smaller cores leaving the bigger cores cold and fully available for the rest of the software. The smaller cores could provide near-instant responsiveness to the end user. The larger cores do the grunt work as needed.
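As a rough illustration of that whitelist idea: on Linux, a process can be restricted to a subset of cores with `os.sched_setaffinity`. Which core IDs are the "small" cores is purely an assumption in this sketch; a real scheduler would read the CPU topology instead of being handed a list.

```python
import os

def pin_to_cores(core_ids):
    """Restrict the current process to the given core IDs (Linux only).

    core_ids is a hypothetical set of small-core IDs; IDs the machine
    does not actually have are silently ignored.
    """
    available = os.sched_getaffinity(0)   # cores this process may run on
    target = set(core_ids) & available
    if target:                            # never pin to an empty set
        os.sched_setaffinity(0, target)
    return os.sched_getaffinity(0)
```

A real implementation would also need the inverse (steering AVX-512 work onto the big cores) and a reliable way to identify core types, both of which this sketch assumes away.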
 

jpiniero

Diamond Member
Oct 1, 2010
9,339
1,887
136
That is why I think the scheduler isn't a done deal solved years ago (unlike what Thala keeps claiming here). There are challenging issues left.
There isn't anything to fix. Intel will eventually come up with a solution that doesn't require OS support but that isn't in Alder Lake. Or AVX-512 will be added to the small cores.
 

Thala

Golden Member
Nov 12, 2014
1,247
557
136
If that is the case, then we are back to where we started: scheduling changes could be implemented so that the chip can report it or not based on the workload.
Lol, that's not how it works. It is essentially the current core (not the chip) reporting its capabilities to the application (and not to the OS, mind you). Based on this report, the application chooses a code path - and this typically happens a single time when the application starts, not again thereafter.
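That one-time check can be sketched in C. The `__builtin_cpu_supports` helper (GCC/Clang) wraps CPUID; the function names are illustrative, and the "AVX2 path" is a placeholder rather than real intrinsics:

```c
#include <stddef.h>

/* Scalar fallback path. */
static int sum_scalar(const int *v, int n) {
    int s = 0;
    for (int i = 0; i < n; i++) s += v[i];
    return s;
}

/* Stand-in for a vectorized path; a real build would use AVX2 intrinsics. */
static int sum_avx2(const int *v, int n) {
    return sum_scalar(v, n);
}

typedef int (*sum_fn)(const int *, int);

/* One-time capability check, typically done at startup or on first call.
 * If the OS later migrates the thread to a core without the feature, the
 * saved choice is stale - exactly the hybrid-core problem discussed here. */
static sum_fn pick_sum(void) {
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2;
#endif
    return sum_scalar;
}
```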

That is why I think the scheduler isn't a done deal solved years ago (unlike what Thala keeps claiming here). There are challenging issues left.
Just stop posting wild speculation, in particular when it becomes crystal clear that you have no idea what you are talking about (see above).
 

jpiniero

Diamond Member
Oct 1, 2010
9,339
1,887
136
Lol, that's not how it works. It is essentially the current core (not the chip) reporting its capabilities to the application (and not to the OS, mind you). Based on this report, the application chooses a code path - and this typically happens a single time when the application starts, not again thereafter.
They could implement something like the OP mentioned, where an application could be forced to run on the Big Cores only and get AVX-512 support. But they won't, at least not officially, and definitely not in Windows.
 

Shivansps

Diamond Member
Sep 11, 2013
3,254
856
136
There isn't anything to fix. Intel will eventually come up with a solution that doesn't require OS support but that isn't in Alder Lake. Or AVX-512 will be added to the small cores.
I'm thinking out loud here, but maybe they could support AVX-512 in a similar way to how AMD implemented AVX-256 on Zen 1 with two AVX-128 units.
 

DrMrLordX

Lifer
Apr 27, 2000
17,268
6,267
136
How useful exactly is AVX512?
Depends on what you're using it for. If you're doing HPC-style work that wants wide vectors then it can definitely improve your application performance. That's the sort of workload you would expect people to throw at GPGPU and/or server-style CPUs. Not really the target application group for consumer CPUs like Alder Lake. However, there are some AI-related extensions slapped on to AVX-512, which - to my knowledge - are more related to AI training than anything else . . . but Intel has been particular about including those extensions in CPUs like Tiger Lake. It may be that they have uses in consumer ML applications. You may have noticed a wide variety of other CPU/SoC vendors boasting of their ML capabilities for similar reasons.

I'm thinking out loud here, but maybe they could support AVX-512 in a similar way to how AMD implemented AVX-256 on Zen 1 with two AVX-128 units.
That might be the most elegant solution. But it won't help Alder Lake since Gracemont is pretty much set-in-stone by now. Also, as I mentioned above, AVX-512 is more than just wider vectors.
 

Hulk

Diamond Member
Oct 9, 1999
3,015
460
126
In spite of all of our endless discussion as to which CPU is better than which, you've gotta love the free market because in the end it figures it out.

Microcenter pricing as of 6/7/2021
10700 - $220
10700K - $250
10850K - $300 (this is a really good deal I think)
10900 - $330
11700K - $350
5800X - $370
11900K/10900K - $500
5900X - $550
5950X - $800

I mean come on! Ranking them by price is quite close to ranking by performance, which is how it should be. I'd probably put the 5800X ahead of the 10900K/11900K, but we know those parts have been overpriced from the start; their pricing can't drop fast enough. 25+ in stock ;)

The 10850K, 10700K, and 5800X stand out to me as exceptional deals. Actually except for the 10900K/11900K they are all decent deals.
 

Exist50

Senior member
Aug 18, 2016
276
304
136
However, there are some AI-related extensions slapped on to AVX-512, which - to my knowledge - are more related to AI training than anything else . . .
Unfortunately, a lot of those instructions (VNNI) are used for inference. Which strikes me as the biggest issue of dropping AVX-512 support. It's not the 512b vector length per se, but the other instructions bundled in.

I'm thinking out loud here, but maybe they could support AVX-512 in a similar way to how AMD implemented AVX-256 on Zen 1 with two AVX-128 units.
My understanding of AVX was that cracking it once (i.e. from 256b to 2x128b, or 512b to 2x256b) is doable without too much effort, but cracking it twice (512b to 4x128b) is disproportionately more complicated. Might be hearsay, but we'll have to see what "NextMont"/Meteor Lake does.

Imo, Intel should at least backport VNNI instructions and such to 256b width. Wouldn't fully solve the problem, but would at least work as a stopgap measure.
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,210
1,817
136
Unfortunately, a lot of those instructions (VNNI) are used for inference. Which strikes me as the biggest issue of dropping AVX-512 support. It's not the 512b vector length per se, but the other instructions bundled in.



My understanding of AVX was that cracking it once (i.e. from 256b to 2x128b, or 512b to 2x256b) is doable without too much effort, but cracking it twice (512b to 4x128b) is disproportionately more complicated. Might be hearsay, but we'll have to see what "NextMont"/Meteor Lake does.

Imo, Intel should at least backport VNNI instructions and such to 256b width. Wouldn't fully solve the problem, but would at least work as a stopgap measure.
My understanding is AVX/AVX2 is easily crackable because all instructions align at the 128-bit point; with AVX-512 this isn't the case, and there are instructions that can do things to data on either side of the 128/256/384-bit boundaries, e.g. a bit shift across the full 512-bit vector.
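That distinction can be modeled in plain scalar C (this is a sketch of the semantics, not real intrinsics): a lane-local operation like AND works independently on each half of a 512-bit value, so it cracks cleanly into two 256-bit micro-ops, while a full-width shift pulls bits across the halves, coupling them.

```c
#include <stdint.h>

/* A 512-bit value modeled as eight 64-bit words, word 0 least significant. */
typedef struct { uint64_t w[8]; } v512;

/* Lane-local: each output word depends only on the matching input words,
 * so this splits trivially into two independent 256-bit halves. */
v512 v512_and(v512 a, v512 b) {
    v512 r;
    for (int i = 0; i < 8; i++) r.w[i] = a.w[i] & b.w[i];
    return r;
}

/* Cross-lane: shifting the whole 512-bit value left by k bits (0 < k < 64)
 * makes word i depend on word i-1, so the two halves are coupled at the
 * 256-bit boundary and cannot execute as independent halves. */
v512 v512_shl(v512 a, unsigned k) {
    v512 r;
    for (int i = 7; i > 0; i--)
        r.w[i] = (a.w[i] << k) | (a.w[i - 1] >> (64 - k));
    r.w[0] = a.w[0] << k;
    return r;
}
```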
 

podspi

Golden Member
Jan 11, 2011
1,955
59
91
It kind of blows my mind Intel keeps trying to price discriminate through instruction extensions. It's like they don't actually understand why x86 is the golden goose it is.

AVX-512 should be in most if not all of their designs. Each CPU doesn't have to be able to execute it quickly, but it needs to be able to. Disabling the extension entirely is shortsighted - the advantage of x86 is lost because developers have to support another code path - or drop support for lower-end CPUs. I realize a lot of this is alleviated with modern compilers, but at that point you can also cross-compile for other architectures as well.
 

coercitiv

Diamond Member
Jan 24, 2014
4,363
5,698
136
Unfortunately, a lot of those instructions (VNNI) are used for inference. Which strikes me as the biggest issue of dropping AVX-512 support. It's not the 512b vector length per se, but the other instructions bundled in.
There is AVX-VNNI support in hybrid mode, though I won't pretend to know how this relates to AVX512-VNNI (whether complementary or close enough to be an effective replacement).
 

JoeRambo

Golden Member
Jun 13, 2013
1,119
979
136
There is AVX-VNNI support in hybrid mode, though I won't pretend to know how this relates to AVX512-VNNI (whether complementary or close enough to an effective replacement).
Very likely it is good old AVX512-VNNI, but operating not on 512-bit vectors but on 256-bit and 128-bit ones. I have looked up AVX512-VNNI: they are EVEX-encoded instructions with 3 registers, not touching mask registers etc., so did they drop ZMM support and end up with AVX-VNNI, encoded with VEX instead?
 

DrMrLordX

Lifer
Apr 27, 2000
17,268
6,267
136
Unfortunately, a lot of those instructions (VNNI) are used for inference. Which strikes me as the biggest issue of dropping AVX-512 support. It's not the 512b vector length per se, but the other instructions bundled in.
That's sort of what I was getting at, but thanks for pointing out VNNI and inference (I had forgotten the term, honestly). Stuff like bfloat16 is of more interest to those who are training, while commercial ML applications are more reliant on things like VNNI. The fact that VNNI is attached to AVX-512 (but not really!) is problematic.

As @coercitiv points out, Gracemont in particular does support AVX-VNNI:


Intel’s small low-power cores for client system-on-chips have always featured rather decent functionality, but have never supported the instructions required for various high-performance computing or media encoding/decoding workloads to minimize their sizes and power consumption. This is going to change with upcoming Gracemont cores that will support AVX, AVX2, and AVX-VNNI instructions.
Which is really confusing since Intel has just changed the AVX standard. VNNI was originally part of the AVX-512 standard, and my assumption (until now) was that in order to support VNNI at all, you had to support AVX-512 including all its 512b vector glory etc. VNNI is probably one of the instruction subsets supported by Tiger Lake that may see the most use in actual commercial applications.
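For reference, the core VNNI instruction (VPDPBUSD, in both its AVX512-VNNI and VEX-encoded AVX-VNNI forms) does the same work per 32-bit lane. A scalar model of one lane, with an illustrative function name:

```c
#include <stdint.h>

/* Scalar model of one 32-bit lane of VPDPBUSD: multiply four unsigned
 * bytes by four signed bytes, sum the products, and accumulate into a
 * 32-bit integer. This u8 x s8 -> s32 dot product is the workhorse of
 * quantized inference. */
int32_t vpdpbusd_lane(int32_t acc, const uint8_t a[4], const int8_t b[4]) {
    int32_t sum = 0;
    for (int i = 0; i < 4; i++)
        sum += (int32_t)a[i] * (int32_t)b[i];
    return acc + sum;
}
```

AVX-VNNI runs 8 such lanes per 256-bit register; AVX512-VNNI runs 16 per 512-bit register - same per-lane math either way.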
 

Thala

Golden Member
Nov 12, 2014
1,247
557
136
Which is really confusing since Intel has just changed the AVX standard. VNNI was originally part of the AVX-512 standard, and my assumption (until now) was that in order to support VNNI at all, you had to support AVX-512 including all its 512b vector glory etc.
Perhaps AVX-VNNI and AVX512-VNNI are two different things. That's because, as you correctly said, AVX512-VNNI instructions are only defined in conjunction with 512-bit registers. In addition, as far as I remember, if you implement any (of the many) AVX512 extensions, you are required to at least implement AVX512F.
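One way to see whether they really are distinct features on a given chip is a CPUID probe. Bit positions below are from Intel's CPUID documentation (leaf 7 subleaf 0: EBX[16] = AVX512F, ECX[11] = AVX512-VNNI; leaf 7 subleaf 1: EAX[4] = the VEX-encoded AVX-VNNI); the helper names are made up for this sketch, and it needs an x86 compiler with GCC/Clang's `cpuid.h`:

```c
#include <cpuid.h>  /* GCC/Clang wrapper around the CPUID instruction */

/* Returns 1 if the given CPUID feature bit is set, else 0. */
static int cpuid_bit(unsigned leaf, unsigned subleaf, int reg, int bit) {
    unsigned r[4] = {0, 0, 0, 0};  /* EAX, EBX, ECX, EDX */
    if (!__get_cpuid_count(leaf, subleaf, &r[0], &r[1], &r[2], &r[3]))
        return 0;                  /* leaf not supported on this CPU */
    return (int)((r[reg] >> bit) & 1);
}

static int has_avx512f(void)     { return cpuid_bit(7, 0, 1, 16); } /* EBX[16] */
static int has_avx512_vnni(void) { return cpuid_bit(7, 0, 2, 11); } /* ECX[11] */
static int has_avx_vnni(void)    { return cpuid_bit(7, 1, 0, 4);  } /* EAX[4]  */
```

Gracemont-style cores would report the last bit without the first two, which is what makes AVX-VNNI usable without any AVX512F baseline.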

VNNI is probably one of the instruction subsets supported by Tiger Lake that may see the most use in actual commercial applications.
"Most use" is relative. Given the fragmented state of all the different AVX512 extensions, I believe there is hardly any commercial application that uses AVX512. Even Intel's own Embree library mostly uses AVX or SSE.
 

DrMrLordX

Lifer
Apr 27, 2000
17,268
6,267
136
Perhaps AVX-VNNI and AVX512-VNNI are two different things. That's because, as you correctly said, AVX512-VNNI instructions are only defined in conjunction with 512-bit registers. In addition, as far as I remember, if you implement any (of the many) AVX512 extensions, you are required to at least implement AVX512F.
The only information I can find about AVX-VNNI so far is this:


Most other search results seem to reference AVX512-VNNI


"most use" is relative. Given the fragmented state of all the different AVX512 extensions - i believe that there is hardly any commercial application, which is using AVX512. Even Intel's own Embree library is using mostly AVX or SSE.
Well, it seems that OpenVINO supports AVX512-VNNI, but I don't know if that counts as a commercial application. Also TensorFlow, PyTorch, and uhhhh something something I dunno.
 

IntelUser2000

Elite Member
Oct 14, 2003
7,497
2,279
136
My understanding of AVX was that cracking it once (i.e. from 256b to 2x128b, or 512b to 2x256b) is doable without too much effort, but cracking it twice (512b to 4x128b) is disproportionately more complicated. Might be hearsay, but we'll have to see what "NextMont"/Meteor Lake does.
Also, why wouldn't Gracemont have 256-bit FP units? Back with Goldmont they already had full 128-bit FP units. One thing they mentioned was "vector performance".

The client versions of Icelake/Tigerlake don't have full 512-bit support either.
 
