Discussion Intel current and future Lakes & Rapids thread

Page 735

Carfax83

Diamond Member
Nov 1, 2010
6,781
1,480
136
Performance is all over the place, but what surprises me is the AVX-512 performance: AMD is beating them really badly.

View attachment 74355
SPR has a much stronger AVX-512 implementation than Genoa. That performance result is mostly due to memory bandwidth: Genoa's 12-channel memory controller vs. SPR's 8-channel memory controller.

The author even said it himself in his conclusion, "But for software tasks like code compilation or CPU-based 3D rendering where the outright core/thread count is most pressing and/or memory bandwidth intensive where the 12 channel memory shines, the AMD Genoa parts easily lead."

One AVX-512 test that doesn't have AMX optimization and isn't memory-bandwidth intensive is the chess test, and Intel comes out ahead there despite being well behind in core count. AMD's AVX-512 implementation is very good for being non-native, but let's not get carried away here.
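As a rough sanity check on the memory bandwidth point, here is a minimal sketch of theoretical peak bandwidth per socket. It assumes DDR5-4800 on both platforms and a 64-bit (8-byte) data path per channel; those figures are assumptions that are not stated in this thread, and sustained bandwidth in practice will be lower than the peak.

```python
def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s for one socket.

    channels * MT/s * bytes per transfer gives MB/s; divide by 1000 for GB/s.
    """
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000.0


# Assumed configuration: DDR5-4800 on both parts (not taken from the thread).
genoa_gbs = peak_bandwidth_gbs(channels=12, mega_transfers_per_s=4800)
spr_gbs = peak_bandwidth_gbs(channels=8, mega_transfers_per_s=4800)
ratio = genoa_gbs / spr_gbs

print(f"Genoa, 12 channels: {genoa_gbs:.1f} GB/s")
print(f"SPR, 8 channels:    {spr_gbs:.1f} GB/s")
print(f"Ratio:              {ratio:.2f}x")
```

Under those assumptions the 12-channel part has about 1.5x the peak bandwidth of the 8-channel part, which is the kind of gap that can dominate a bandwidth-bound AVX-512 test regardless of how wide the vector units are.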

 

Hitman928

Diamond Member
Apr 15, 2012
4,375
5,763
136
SPR has a much stronger AVX-512 implementation than Genoa. That performance result is mostly due to memory bandwidth: Genoa's 12-channel memory controller vs. SPR's 8-channel memory controller.

The author even said it himself in his conclusion, "But for software tasks like code compilation or CPU-based 3D rendering where the outright core/thread count is most pressing and/or memory bandwidth intensive where the 12 channel memory shines, the AMD Genoa parts easily lead."

One AVX-512 test that doesn't have AMX optimization and isn't memory-bandwidth intensive is the chess test, and Intel comes out ahead there despite being well behind in core count. AMD's AVX-512 implementation is very good for being non-native, but let's not get carried away here.

The chess result is because the higher core count of the Genoa part you are showing doesn’t help at all in this benchmark. In the tests Phoronix published, the 32 core Genoa processor is at the top of the chart thanks to its higher boost clocks. Looks like this test doesn’t really scale beyond about 64 - 70 cores.

E23CDC8C-2C7F-466A-91D7-20AB2DE77FCA.jpeg
 

Exist50

Golden Member
Aug 18, 2016
1,694
1,778
136
BTW, does anybody know if Intel's accelerator features (QAT, DSA, DLB and IAA) are something you can add to AMD platform using a discrete accelerator card?
Looks like many of them are tightly coupled to memory, so probably not possible. Maybe the encryption stuff though?
I'm not as concerned about AI acceleration since any serious AI training & inferencing still requires NVidia GPUs. Just curious.
Training, largely true, but inference is surprisingly dominated by CPUs. The GPU advantage diminishes under tight latency constraints or small batch sizes, and CPUs are obviously more flexible if you're not going to be doing inference all the time.

For training, GPUs are great if your model fits in VRAM (or is amenable to being streamed in), but for really, really large models, they sometimes have to fall back to CPUs.

Jim Keller remarked on this in a talk once. I skipped to the relevant part, but the whole thing is worth a watch. Timestamp at 42:47, if the media embed messes it up.


TLDW: He estimated at the time (about 3 years ago) that AI was something like 80% CPU, 20% GPU, 0% other, and says that if things moved quickly, it would be something like 1/3 each in 5 years, but things probably wouldn't move that quickly.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,578
3,634
136
Intel is claiming just 15% perf/clock gain on the core over the predecessor with Sapphire Rapids, which is interesting.

A lot of the SKUs are based on the monolithic, 780 mm², 32-core MCC die.


In other news, Asus, in collaboration with Intel, introduced on its Zenbook Pro 16X what they call a "SoM" or "System on a Module" design, which combines a Raptor Lake mobile chip and LPDDR5X memory into a single module. It saves 38% in area over the previous generation, allowing for a better GPU to be installed.
 

Exist50

Golden Member
Aug 18, 2016
1,694
1,778
136
In other news, Asus, in collaboration with Intel, introduced on its Zenbook Pro 16X what they call a "SoM" or "System on a Module" design, which combines a Raptor Lake mobile chip and LPDDR5X memory into a single module. It saves 38% in area over the previous generation, allowing for a better GPU to be installed.
Pics for anyone else who's curious.

1673420105808.png

1673420115888.png
 

Carfax83

Diamond Member
Nov 1, 2010
6,781
1,480
136
The chess result is because the higher core count of the Genoa part you are showing doesn’t help at all in this benchmark. In the tests Phoronix published, the 32 core Genoa processor is at the top of the chart thanks to its higher boost clocks. Looks like this test doesn’t really scale beyond about 64 - 70 cores.
My point, though, was that AMD's AVX-512 implementation, while good, is not native and loses to SPR when extenuating factors (i.e. memory bandwidth, superior core count, etc.) aren't in play. I'm talking specifically about the actual implementation. @nicalandia implied that AMD's AVX-512 implementation was somehow better than Intel's, and it's not.
 

BorisTheBlade82

Senior member
May 1, 2020
381
543
106
Has anyone already found specific numbers for the bandwidth of the EMIB interconnects between the tiles?
I am interested in bandwidth per edge or per mm of beachfront.
 

uzzi38

Platinum Member
Oct 16, 2019
2,391
5,025
116
True, there are always going to be bulk sales discounts to hyperscalers, but the MSRPs don't look favorable in a head-to-head comparison.

Note: I revised my earlier post. It's closer to 1.4x in MSRP difference, not >1.5x, for flagship vs flagship. $17k vs. $12k. For similar core counts, 64C Genoa vs 60C SPR, it's $17k vs. $9k.
STH talked about it a while back: hyperscalers get huge discounts. We're talking about getting 64c Rome/Milan for ~$2-3k. Don't pay any attention to list pricing.
 

Timorous

Golden Member
Oct 27, 2008
1,137
1,722
136
My point, though, was that AMD's AVX-512 implementation, while good, is not native and loses to SPR when extenuating factors (i.e. memory bandwidth, superior core count, etc.) aren't in play. I'm talking specifically about the actual implementation. @nicalandia implied that AMD's AVX-512 implementation was somehow better than Intel's, and it's not.
A single benchmark does not prove this statement.
 

moinmoin

Diamond Member
Jun 1, 2017
4,176
6,253
136
In other news, Asus, in collaboration with Intel, introduced on its Zenbook Pro 16X what they call a "SoM" or "System on a Module" design, which combines a Raptor Lake mobile chip and LPDDR5X memory into a single module. It saves 38% in area over the previous generation, allowing for a better GPU to be installed.
Pics for anyone else who's curious.

View attachment 74392

View attachment 74393
If more ODMs pick that up, we may actually see Apple M series style packages down the line, for which Intel and AMD could optimize their chips. That would not be a bad development.

I'm talking specifically about the actual implementation. @nicalandia implied that AMD's AVX512 implementation was somehow better than Intel's and it's not.
Afaik the previous discussion @nicalandia implied was not about performance per se but stability of power consumption and (as a result of that) of operating frequency. Usage of Intel's AVX-512 was known to reduce operating frequency due to higher power consumption, which doesn't happen with AMD's implementation of AVX-512 where AVX-512 is actually using slightly less power than AVX2. Now I don't know SPR's behavior and power consumption in this particular case, but I would be surprised to see it deviate from what we've seen before.
 

Dayman1225

Golden Member
Aug 14, 2017
1,081
791
146
This article by the NYTimes essentially confirms what we heard years ago about Intel's verification/validation/debug teams being laid off or leaving in droves

Inside Intel’s Delays in Delivering a Crucial New Microprocessor

One key chore was “validation,” a testing process in which Intel and its customers run software on sample chips to simulate computing chores and catch bugs. Once flaws are found and fixed, designs may go back to the factory to make new test chips, which typically takes more than a month.
Repeating that process led to missed deadlines. Ms. Nassif said Sapphire Rapids was designed to counter AMD’s Milan processor, which was introduced in March 2021. But it still wasn’t ready by that June, when Intel announced a delay until the next year to allow more validation.


She also concluded that the team should have spent more time on perfecting and testing its design using computer simulations. Finding bugs before they are in sample chips is less expensive, and would have made it possible to remove features to simplify the product, Ms. Rivera said. She has since moved to bolster Intel’s simulation and validation abilities.
“We used to have a lot of this kind of muscle that we let atrophy,” Ms. Rivera said. “Now we’re rebuilding.”
 

eek2121

Platinum Member
Aug 2, 2005
2,292
2,996
136
Intel is doing itself a disservice by selling SPR at that price. Sure, larger players get discounts. Smaller folks, however, do not.
AMD already stated several times before that wafers are no longer the bottleneck, substrate is and that's being worked on for quite some time. I just wonder for how long until it's officially called as resolved. Sony announcing that PS5 shortages are officially over may be a hint of things to come.
Wanted to add that I was shocked when I was able to order a PS5 (at MSRP) last month and get it shipped to me 2 days later. Indeed, shortages do appear to be coming to an end.
 

jpiniero

Lifer
Oct 1, 2010
12,811
4,098
136
A lot of the SKUs are based on the monolithic, 780 mm², 32-core MCC die.
Those madlads are actually serious about that. Although you can get it cut all the way down to 8 cores. The XCC you can get down to 16 cores (so 4 cores per tile). Problem is I don't know how much those SKUs make sense.
 

nicalandia

Platinum Member
Jan 10, 2019
2,955
4,557
136
Those madlads are actually serious about that. Although you can get it cut all the way down to 8 cores. The XCC you can get down to 16 cores (so 4 cores per tile). Problem is I don't know how much those SKUs make sense.
It's about built-in accelerators and Intel On Demand. Only XCC has full access to all accelerators.
 

Exist50

Golden Member
Aug 18, 2016
1,694
1,778
136
Afaik the previous discussion @nicalandia implied was not about performance per se but stability of power consumption and (as a result of that) of operating frequency. Usage of Intel's AVX-512 was known to reduce operating frequency due to higher power consumption, which doesn't happen with AMD's implementation of AVX-512 where AVX-512 is actually using slightly less power than AVX2.
A slight clock penalty isn't meaningful if it only happens when you're running significantly more ops/cycle.
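To put rough numbers on that, here is a minimal sketch assuming an idealized, execution-bound kernel whose throughput scales with vector width times clock. The 3.0 GHz baseline and 2.7 GHz AVX-512 clock are made-up illustrative values, not measured SPR or Genoa figures, and the helper names are hypothetical.

```python
def relative_throughput(width_bits: int, clock_ghz: float,
                        base_width_bits: int = 256, base_clock_ghz: float = 3.0) -> float:
    """Throughput relative to the baseline, assuming ideal scaling with width * clock."""
    return (width_bits / base_width_bits) * (clock_ghz / base_clock_ghz)


def break_even_clock_ghz(width_bits: int,
                         base_width_bits: int = 256, base_clock_ghz: float = 3.0) -> float:
    """Clock at which the wider vectors merely match the baseline throughput."""
    return base_clock_ghz * base_width_bits / width_bits


# Illustrative only: AVX2 (256-bit) at 3.0 GHz vs AVX-512 at a 10% lower clock.
avx512_gain = relative_throughput(width_bits=512, clock_ghz=2.7)
break_even = break_even_clock_ghz(width_bits=512)

print(f"AVX-512 at 2.7 GHz vs AVX2 at 3.0 GHz: {avx512_gain:.2f}x")
print(f"Break-even AVX-512 clock: {break_even:.2f} GHz")
```

With these assumed numbers, a 10% clock drop still leaves roughly 1.8x the throughput, and the clock would have to fall to half the baseline before the wider vectors stopped paying off. The model ignores memory bandwidth and any code that does not vectorize, so it is an upper bound rather than a prediction.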
 

Exist50

Golden Member
Aug 18, 2016
1,694
1,778
136
This article by the NYTimes essentially confirms what we heard years ago about Intel's verification/validation/debug teams being laid off or leaving in droves

Inside Intel’s Delays in Delivering a Crucial New Microprocessor
Huh, that's a really weird article. They phrase it as some exposé, but the whole thing is basically "there were bugs and it was delayed". And why are they hyping up Rivera so much? She's on the business side, not the engineering side, and they even point out she was working in HR for most of SPR's lifecycle. The only thing they really say she did was set up meetings...
 

nicalandia

Platinum Member
Jan 10, 2019
2,955
4,557
136
Almost forgot about that... did Intel say what they are officially locking behind the paywall? There was speculation that Intel would lock AMX but that doesn't appear to be the case (?)
Yes, they are set on Intel On Demand for the built-in accelerators on die, but they are not locking AVX-512 or AMX. So whatever is part of the core is fully available.

From STH Article

1673461862503.png


So from that we can assume that the Xeon W9 will come with the ability to unlock those features, but they will be turned off at the microcode level, leaving just AVX-512 and AMX enabled.
 

Exist50

Golden Member
Aug 18, 2016
1,694
1,778
136
Those madlads are actually serious about that. Although you can get it cut all the way down to 8 cores. The XCC you can get down to 16 cores (so 4 cores per tile). Problem is I don't know how much those SKUs make sense.
Historically, those low end SKUs are basically "I need PCIe lanes for a file server" kind of offerings.
 

Exist50

Golden Member
Aug 18, 2016
1,694
1,778
136
Yes, they are set on Intel On Demand for the built-in accelerators on die, but they are not locking AVX-512 or AMX. So whatever is part of the core is fully available.

From STH Article

View attachment 74418


So from that we can assume that the Xeon W9 will come with the ability to unlock those features, but they will be turned off at the microcode level, leaving just AVX-512 and AMX enabled.
The wording there is a bit unclear, but it sounds like they're saying you get whatever's on the box, enabled at purchase time, but the disabled/binned accelerators might be reenabled via On Demand. But I'm not sure if that interpretation is correct.
 

coercitiv

Diamond Member
Jan 24, 2014
5,375
8,958
136
Huh, that's a really weird article. They phrase it as some exposé, but the whole thing is basically "there were bugs and it was delayed".
It's a damage control piece: bad things happened, but we're already fixing it.

And why are they hyping up Rivera so much? She's on the business side, not the engineering side, and they even point out she was working in HR for most of SPR's lifecycle. The only thing they really say she did was set up meetings...
Intel's people made decisions that led to this failure. Now the "new" Intel is trying to prove that "new" executives know how to turn the ship around.

So think of Ms. Rivera as someone speaking on behalf of the company and declaring the following:
  • we aimed too high with Sapphire Rapids, rather than deliver a less ambitious product sooner
  • our validation effort was insufficient; had we picked up on some of the issues in simulation, we might have dropped features to ship the product on time
  • we scheduled more products than our engineers can handle
I think these admissions fit well with the info from years ago that @Dayman1225 is talking about.
 

jpiniero

Lifer
Oct 1, 2010
12,811
4,098
136
Historically, those low end SKUs are basically "I need PCIe lanes for a file server" kind of offerings.
That was more Xeon E. You can still buy those but who knows if Intel will refresh it with anything beyond Rocket Lake.

A one-chiplet Epyc probably makes more sense.
 
