Discussion Intel current and future Lakes & Rapids thread

Moving to the cloud has its problems.
We've already wasted almost a year trying to qualify Azure for our core application. Every time, something new came up and threw a wrench into the migration process. Last time, after asking very pointed questions, we found out that we would basically be limited to just 5000 IOPS on the VMs, and that increasing disk throughput would cost astronomically because of how Azure's cost structure is set up (the more you use, the more you pay, far more than you can imagine). Now they've rewritten the application to run on Azure SQL, which is supposed to scale up only on demand and thus cost less, but I don't know. I have a bad feeling that we will end up losing a lot more than we gain. It's impossible to dissuade management from the allure of the magic word "cloud".
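For a sense of scale, a fixed IOPS cap translates into very different throughput depending on I/O size. A minimal sketch, assuming every I/O is the same size (the sizes below are illustrative, not from the post; Azure disks and VMs also carry separate MB/s limits, so treat the results as upper bounds):

```python
# Convert an IOPS cap into throughput for a few fixed I/O sizes.
# Assumption: uniform I/O size. Real workloads mix sizes, and separate
# MB/s caps may bind first, so these are upper bounds, not predictions.

IOPS_CAP = 5000  # the per-VM figure mentioned in the post


def throughput_mib_s(iops: int, io_size_kib: int) -> float:
    """MiB/s delivered when `iops` operations of `io_size_kib` KiB each complete per second."""
    return iops * io_size_kib / 1024


if __name__ == "__main__":
    # 8 KiB matches the SQL Server page size; 64 KiB matches a SQL Server extent (8 pages).
    for size_kib in (8, 64, 256):
        print(f"{size_kib:>4} KiB I/Os -> {throughput_mib_s(IOPS_CAP, size_kib):8.1f} MiB/s")
```

At small page-sized I/Os, 5000 IOPS is only a few tens of MiB/s, which is why an IOPS cap can hurt a database far more than the raw number suggests.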
 

nicalandia

Diamond Member
OpenVINO is the absolute best-case scenario for showing off Intel's new AMX (Advanced Matrix Extensions).

I compared a 2S AMD EPYC 9654 with AVX-512 enabled and disabled against a 2S Intel Xeon 8490H in OpenVINO, which takes advantage of AVX2/AVX-512 and now AMX.

The gains with AMX are absurd in some instances, like this chart:


But the geometric mean puts it at about 36% higher:
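Why a few absurd AMX wins only move the overall figure to about 36%: a geometric mean of per-test ratios damps outliers that would dominate an arithmetic mean. A minimal sketch; the speedup values below are hypothetical and are NOT the data from the linked charts:

```python
import math


def geomean(values: list[float]) -> float:
    """Geometric mean of positive values, computed in log space for numerical stability."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-test speedups (SPR result / Genoa result); NOT the chart's data.
speedups = [4.0, 1.1, 0.9, 1.05, 0.8]

arithmetic = sum(speedups) / len(speedups)
print(f"arithmetic mean: {arithmetic:.2f}x, geometric mean: {geomean(speedups):.2f}x")
```

One 4x outlier pulls the arithmetic mean well above the geometric mean, which stays closer to the typical result.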




Genoa's AVX-512 performance is also amazing.

Here is the link to the performance comparison:
AMD Genoa AVX-512 ON/OFF vs Intel SPR-SP AMX

It's good that AMD was able to add AVX-512; otherwise the performance difference in OpenVINO would have looked rather silly.
 

Carfax83

Diamond Member
Performance is all over the place, but what surprises me is the AVX-512 performance: AMD is beating them really badly.


SPR has a much stronger AVX-512 implementation than Genoa. The reason for that performance result is mostly memory bandwidth: Genoa's 12-channel memory controller vs. SPR's 8-channel memory controller.

The author even said it himself in his conclusion, "But for software tasks like code compilation or CPU-based 3D rendering where the outright core/thread count is most pressing and/or memory bandwidth intensive where the 12 channel memory shines, the AMD Genoa parts easily lead."
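The channel-count argument can be put into numbers with theoretical peak bandwidth per socket. A minimal sketch, assuming both platforms run DDR5-4800 with a 64-bit data path per channel (ECC not counted); sustained bandwidth in practice is lower:

```python
def peak_dram_bandwidth_gbs(channels: int, transfer_rate_mts: int, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth per socket in GB/s (1 GB = 10**9 bytes).

    bytes_per_transfer=8 reflects a 64-bit data path per DDR5 channel
    (two 32-bit subchannels); ECC bits are not counted.
    """
    return channels * transfer_rate_mts * 1e6 * bytes_per_transfer / 1e9


# Assumption: both platforms run DDR5-4800 with one DIMM per channel.
genoa = peak_dram_bandwidth_gbs(channels=12, transfer_rate_mts=4800)
spr = peak_dram_bandwidth_gbs(channels=8, transfer_rate_mts=4800)
print(f"Genoa: {genoa:.1f} GB/s, SPR: {spr:.1f} GB/s, ratio: {genoa / spr:.2f}x")
```

At equal memory speed the ratio is simply 12/8 = 1.5x, so a bandwidth-bound AVX-512 test can favor Genoa regardless of how strong each core's vector units are.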

One AVX-512 test that doesn't have AMX optimization and isn't memory-bandwidth intensive is the chess test, and Intel comes out ahead there despite being well behind in core count. AMD's AVX-512 implementation is very good for being non-native, but let's not get carried away here.

 

Hitman928

Diamond Member
SPR has a much stronger AVX-512 implementation than Genoa. The reason for that performance result is mostly memory bandwidth: Genoa's 12-channel memory controller vs. SPR's 8-channel memory controller.

The author even said it himself in his conclusion, "But for software tasks like code compilation or CPU-based 3D rendering where the outright core/thread count is most pressing and/or memory bandwidth intensive where the 12 channel memory shines, the AMD Genoa parts easily lead."

One AVX-512 test that doesn't have AMX optimization and isn't memory-bandwidth intensive is the chess test, and Intel comes out ahead there despite being well behind in core count. AMD's AVX-512 implementation is very good for being non-native, but let's not get carried away here.


The chess result is because the higher core count of the Genoa part you are showing doesn't help at all in this benchmark. In the tests Phoronix published, the 32-core Genoa processor is at the top of the chart thanks to its higher boost clocks. It looks like this test doesn't really scale beyond about 64-70 cores.
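One standard way to see why extra cores stop paying off is Amdahl's law. This is a swapped-in textbook model, not what Phoronix measured, and the parallel fraction below is hypothetical; it also assumes equal per-core speed, so it ignores the boost-clock advantage mentioned above:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup over one core when only `parallel_fraction` of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0 or cores < 1:
        raise ValueError("parallel_fraction must be in [0, 1] and cores >= 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


P = 0.98  # hypothetical parallel fraction, not measured from the chess benchmark

for cores in (32, 64, 96):
    print(f"{cores:>3} cores -> {amdahl_speedup(P, cores):5.1f}x")
```

In this model, going from 64 to 96 cores adds 50% more cores but well under 50% more speed, so a part with fewer, faster-clocked cores can close the gap.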

 

Exist50

Platinum Member
BTW, does anybody know if Intel's accelerator features (QAT, DSA, DLB and IAA) are something you can add to an AMD platform using a discrete accelerator card?
Looks like many of them are tightly coupled to memory, so probably not possible. Maybe the encryption stuff though?
I'm not as concerned about AI acceleration since any serious AI training & inferencing still requires Nvidia GPUs. Just curious.
Training, largely true, but inference is surprisingly dominated by CPUs. The GPU advantage diminishes under tight latency constraints or with small batch sizes, and CPUs are obviously more flexible if you're not going to be doing inference all the time.

For training, GPUs are great if your model fits in VRAM (or is amenable to being streamed in), but for really, really large models, they sometimes have to fall back to CPUs.
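A back-of-the-envelope check of whether a model "fits in VRAM" starts from parameter count times bytes per parameter. A minimal sketch; the model sizes, the 80 GiB card and the 1.2x overhead factor are all hypothetical, and training needs far more than this (gradients and optimizer state on top of the weights):

```python
def weights_gib(params_billion: float, bytes_per_param: int) -> float:
    """Memory needed just for the model weights, in GiB."""
    return params_billion * 1e9 * bytes_per_param / 2**30


def fits_in_vram(params_billion: float, bytes_per_param: int, vram_gib: float,
                 overhead: float = 1.2) -> bool:
    """Rough inference check. `overhead` is a hypothetical multiplier for activations
    and runtime buffers; training needs far more (gradients, optimizer state)."""
    return weights_gib(params_billion, bytes_per_param) * overhead <= vram_gib


# Hypothetical model sizes at FP16 (2 bytes/param) against a hypothetical 80 GiB card.
for size in (7, 70, 175):
    print(f"{size:>4}B params: {weights_gib(size, 2):7.1f} GiB weights, "
          f"fits: {fits_in_vram(size, 2, 80)}")
```

Once the weights alone exceed a single card, the model has to be split, streamed, or run somewhere with more memory, which is where CPU systems with large DRAM pools come in.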

Jim Keller remarked on this in a talk once. I skipped to the relevant part, but the whole thing is worth a watch. Timestamp at 42:47, if the media embed messes it up.


TLDW: He estimated at the time (about 3 years ago) that AI was something like 80% CPU, 20% GPU, 0% other, and says that if things moved quickly, it would be something like 1/3 each in 5 years, but things probably wouldn't move that quickly.
 

IntelUser2000

Elite Member
Intel is claiming just a 15% perf/clock gain for the Sapphire Rapids core over its predecessor, which is interesting.

A lot of the SKUs are based on the monolithic 780 mm², 32-core MCC die.


In other news, Asus, in collaboration with Intel, introduced what they call a "SoM" (System on a Module) design on their Zenbook Pro 16X, which combines a Raptor Lake mobile chip and LPDDR5X memory into a single module. It saves 38% in area over the previous generation, allowing a better GPU to be installed.
 

Exist50

Platinum Member
In other news, Asus, in collaboration with Intel, introduced what they call a "SoM" (System on a Module) design on their Zenbook Pro 16X, which combines a Raptor Lake mobile chip and LPDDR5X memory into a single module. It saves 38% in area over the previous generation, allowing a better GPU to be installed.
Pics for anyone else who's curious.

 

Carfax83

Diamond Member
The chess result is because the higher core count of the Genoa part you are showing doesn’t help at all in this benchmark. In the tests Phoronix published, the 32 core Genoa processor is at the top of the chart thanks to its higher boost clocks. Looks like this test doesn’t really scale beyond about 64 - 70 cores.

My point, though, was that AMD's AVX-512 implementation, while good, is not native and loses to SPR when other factors such as memory bandwidth and superior core count aren't in play. I'm talking specifically about the actual implementation. @nicalandia implied that AMD's AVX-512 implementation was somehow better than Intel's, and it's not.
 

BorisTheBlade82

Senior member
Has anyone found specific numbers on the bandwidth of the EMIB interconnects between the tiles?
I am interested in bandwidth per edge or per mm of beachfront.
 

uzzi38

Platinum Member
True, there are always going to be bulk sales discounts to hyperscalers, but the MSRPs don't look favorable in a head-to-head comparison.

Note: I revised my earlier post. It's closer to 1.4x in MSRP difference, not >1.5x, for flagship vs. flagship: $17k vs. $12k. For similar core counts, 60C SPR vs. 64C Genoa, it's $17k vs. $9k.
STH talked about it a while back but Hyperscalers get huge discounts, we're talking about getting 64c Rome/Milan for ~$2-3k. Don't pay any attention to list pricing.
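The list-price comparison above works out as follows. A minimal sketch using the rounded MSRPs quoted in the thread; the 96-core count for the Genoa flagship is the EPYC 9654's core count, and none of this reflects what hyperscalers actually pay:

```python
def price_per_core(list_price_usd: float, cores: int) -> float:
    """List price divided by core count."""
    return list_price_usd / cores


# List prices as quoted in the thread (rounded); actual hyperscaler prices are far lower.
spr_60c = 17_000     # SPR flagship, 60 cores
genoa_96c = 12_000   # Genoa flagship (EPYC 9654), 96 cores
genoa_64c = 9_000    # 64-core Genoa

print(f"flagship MSRP ratio: {spr_60c / genoa_96c:.2f}x")
for name, price, cores in [("SPR 60C", spr_60c, 60), ("Genoa 96C", genoa_96c, 96),
                           ("Genoa 64C", genoa_64c, 64)]:
    print(f"{name:>10}: ${price_per_core(price, cores):,.0f} per core")
```

Per core, the gap at list price is roughly 2x, wider than the flagship-to-flagship ratio.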
 

Timorous

Golden Member
My point, though, was that AMD's AVX-512 implementation, while good, is not native and loses to SPR when other factors such as memory bandwidth and superior core count aren't in play. I'm talking specifically about the actual implementation. @nicalandia implied that AMD's AVX-512 implementation was somehow better than Intel's, and it's not.

A single benchmark does not prove this statement.
 

moinmoin

Diamond Member
In other news, Asus, in collaboration with Intel, introduced what they call a "SoM" (System on a Module) design on their Zenbook Pro 16X, which combines a Raptor Lake mobile chip and LPDDR5X memory into a single module. It saves 38% in area over the previous generation, allowing a better GPU to be installed.
Pics for anyone else who's curious.

If more ODMs pick that up, we may actually see Apple M-series-style packages down the line, which Intel and AMD could optimize their chips for. That would not be a bad development.

I'm talking specifically about the actual implementation. @nicalandia implied that AMD's AVX-512 implementation was somehow better than Intel's, and it's not.
AFAIK the earlier discussion @nicalandia was referring to was not about performance per se but about the stability of power consumption and, as a result, of operating frequency. Using Intel's AVX-512 was known to reduce operating frequency due to higher power consumption, which doesn't happen with AMD's implementation, where AVX-512 actually uses slightly less power than AVX2. Now, I don't know SPR's behavior and power consumption in this particular case, but I would be surprised to see it deviate from what we've seen before.
 

Dayman1225

Golden Member
This article by the NYTimes essentially confirms what we heard years ago about Intel's verification/validation/debug teams being laid off or leaving in droves.

Inside Intel’s Delays in Delivering a Crucial New Microprocessor

One key chore was “validation,” a testing process in which Intel and its customers run software on sample chips to simulate computing chores and catch bugs. Once flaws are found and fixed, designs may go back to the factory to make new test chips, which typically takes more than a month.
Repeating that process led to missed deadlines. Ms. Nassif said Sapphire Rapids was designed to counter AMD’s Milan processor, which was introduced in March 2021. But it still wasn’t ready by that June, when Intel announced a delay until the next year to allow more validation.


She also concluded that the team should have spent more time on perfecting and testing its design using computer simulations. Finding bugs before they are in sample chips is less expensive, and would have made it possible to remove features to simplify the product, Ms. Rivera said. She has since moved to bolster Intel’s simulation and validation abilities.
“We used to have a lot of this kind of muscle that we let atrophy,” Ms. Rivera said. “Now we’re rebuilding.”
 

eek2121

Platinum Member
Intel is doing itself a disservice by selling SPR at that price. Sure, larger players get discounts. Smaller folks, however, do not.
AMD has already stated several times that wafers are no longer the bottleneck; substrate is, and that has been worked on for quite some time. I just wonder how long it will be until it's officially declared resolved. Sony announcing that PS5 shortages are officially over may be a hint of things to come.

Wanted to add that I was shocked when I was able to order a PS5 (at MSRP) last month and get it shipped to me 2 days later. Indeed, shortages do appear to be coming to an end.
 

jpiniero

Lifer
A lot of the SKUs are based on the monolithic 780 mm², 32-core MCC die.

Those madlads are actually serious about that. Although you can get it cut all the way down to 8 cores. The XCC you can get down to 16 cores (so 4 cores per tile). The problem is I don't know how much sense those SKUs make.
 

nicalandia

Diamond Member
Those madlads are actually serious about that. Although you can get it cut all the way down to 8 cores. The XCC you can get down to 16 cores (so 4 cores per tile). The problem is I don't know how much sense those SKUs make.
It's about the built-in accelerators and Intel On Demand. Only XCC has full access to all accelerators.
 

Exist50

Platinum Member
AFAIK the earlier discussion @nicalandia was referring to was not about performance per se but about the stability of power consumption and, as a result, of operating frequency. Using Intel's AVX-512 was known to reduce operating frequency due to higher power consumption, which doesn't happen with AMD's implementation, where AVX-512 actually uses slightly less power than AVX2.
A slight clock penalty isn't meaningful if it only happens when you're running significantly more ops/cycle.
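The trade-off in that reply is peak throughput = lanes per op × clock. A minimal sketch with hypothetical clocks; it assumes one full-width vector op per cycle and perfect lane utilization, which only holds if the hardware executes 512-bit ops at full width (Zen 4 is widely reported to split most 512-bit ops into two 256-bit halves, so the lane doubling does not carry over directly there):

```python
def relative_peak_throughput(lanes: int, clock_ghz: float,
                             base_lanes: int, base_clock_ghz: float) -> float:
    """Peak vector throughput relative to a baseline: (lanes x clock) / (base lanes x base clock).

    Assumes one full-width vector op per cycle in both cases and perfect lane utilization.
    """
    return (lanes * clock_ghz) / (base_lanes * base_clock_ghz)


# Hypothetical clocks: 256-bit AVX2 (8 FP32 lanes) at 3.0 GHz vs.
# 512-bit AVX-512 (16 FP32 lanes) at 2.7 GHz, i.e. a 10% clock penalty.
print(f"{relative_peak_throughput(16, 2.7, 8, 3.0):.2f}x")
```

Under these assumptions a 10% clock drop still leaves about 1.8x the peak throughput; the penalty matters mainly when code uses wide instructions sparingly and the lower clock drags down everything else.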
 

Exist50

Platinum Member
This article by the NYTimes essentially confirms what we heard about Intels verification/validation/debug teams being layoff’d or leaving in droves years ago

Inside Intel’s Delays in Delivering a Crucial New Microprocessor
Huh, that's a really weird article. They phrase it as some exposé, but the whole thing is basically "there were bugs and it was delayed". And why are they hyping up Rivera so much? She's on the business side, not the engineering side, and they even point out she was working in HR for most of SPR's lifecycle. The only thing they really say she did was set up meetings...
 

nicalandia

Diamond Member
Almost forgot about that... did Intel say what they are officially locking behind the paywall? There was speculation that Intel would lock AMX but that doesn't appear to be the case (?)
Yes, they are using Intel On Demand for the built-in on-die accelerators, but they are not locking AVX-512 or AMX. So whatever is part of the core is fully available.

From the STH article:



So from that we can assume that the Xeon W9 will come with the ability to unlock those features, but they will be turned off at the microcode level. Just AVX-512 and AMX are enabled.
 

Exist50

Platinum Member
Those madlads are actually serious about that. Although you can get it cut all the way down to 8 cores. The XCC you can get down to 16 cores (so 4 cores per tile). The problem is I don't know how much sense those SKUs make.
Historically, those low end SKUs are basically "I need PCIe lanes for a file server" kind of offerings.
 

Exist50

Platinum Member
Yes, they are using Intel On Demand for the built-in on-die accelerators, but they are not locking AVX-512 or AMX. So whatever is part of the core is fully available.

From the STH article:



So from that we can assume that the Xeon W9 will come with the ability to unlock those features, but they will be turned off at the microcode level. Just AVX-512 and AMX are enabled.
The wording there is a bit unclear, but it sounds like they're saying you get whatever's on the box, enabled at purchase time, but the disabled/binned accelerators might be re-enabled via On Demand. But I'm not sure if that interpretation is correct.
 

coercitiv

Diamond Member
Huh, that's a really weird article. They phrase it as some exposé, but the whole thing is basically "there were bugs and it was delayed".
It's a damage control piece: bad things happened, but we're already fixing it.

And why are they hyping up Rivera so much? She's on the business side, not the engineering side, and they even point out she was working in HR for most of SPR's lifecycle. The only thing they really say she did was set up meetings...
Intel's people made decisions that led to this failure. Now the "new" Intel is trying to prove that "new" executives know how to turn the ship around.

So think of Ms. Rivera as someone speaking on behalf of the company and declaring the following:
  • we aimed too high with Sapphire Rapids, rather than deliver a less ambitious product sooner
  • our validation effort was insufficient, had we picked up on some of the issues in simulation we might have dropped features to ship the product on time
  • we scheduled more products than our engineers can handle
I think these admissions fit well with the info from years ago that @Dayman1225 is talking about.