Discussion Intel current and future Lakes & Rapids thread

nicalandia · Jan 10, 2023

OpenVINO is the absolute best-case scenario to showcase Intel's new AMX (Advanced Matrix Extensions).

I was able to compare a 2S AMD EPYC 9654 with AVX-512 enabled and disabled against a 2S Intel Xeon 8490H in OpenVINO, which takes advantage of AVX2/AVX-512 and now AMX.

The gains with AMX are absurd in some instances, like in this chart.

But the geomean brings it to about 36% higher.
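For anyone wondering why a few absurd AMX results don't dominate the summary: a geometric mean multiplies the per-test speedup ratios and takes the nth root, so a single huge outlier is damped far more than in an arithmetic mean. A minimal Python sketch, using made-up ratios (not the actual Phoronix numbers):

```python
import math

def geomean(ratios):
    """Geometric mean of per-test speedup ratios (e.g. Intel score / AMD score)."""
    # Average in log space, then exponentiate: equivalent to the nth root of the product.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical per-test speedups, NOT the real benchmark results:
# one absurd AMX-style outlier (10x) plus several modest results.
ratios = [10.0, 1.1, 0.9, 1.2, 1.0]

arith = sum(ratios) / len(ratios)
geo = geomean(ratios)
print(f"arithmetic mean: {arith:.2f}x")
print(f"geometric mean:  {geo:.2f}x")
```

With these invented numbers the arithmetic mean lands near 2.8x while the geomean stays around 1.6x, which is why a summary geomean can look modest next to the most extreme individual charts.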

The AVX-512 performance of Genoa is also amazing.

Here is the link to the performance comparison:
AMD Genoa AVX-512 ON/OFF vs Intel SPR-SP AMX

It's good that AMD was able to add AVX-512; otherwise the performance difference in OpenVINO would have looked rather silly.

Carfax83 · Jan 10, 2023

nicalandia said:
Performance is all over the place, but what surprises me is the AVX-512 performance: AMD is beating them really badly.

SPR has a much stronger AVX-512 implementation than Genoa. That performance result is mostly due to memory bandwidth: Genoa's 12-channel memory controller vs SPR's 8-channel memory controller.

The author even said it himself in his conclusion, "But for software tasks like code compilation or CPU-based 3D rendering where the outright core/thread count is most pressing and/or memory bandwidth intensive where the 12 channel memory shines, the AMD Genoa parts easily lead."
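To put the memory-bandwidth point in rough numbers: theoretical peak DRAM bandwidth per socket is channels × transfer rate × 8 bytes per 64-bit channel. A quick Python sketch, assuming DDR5-4800 on both platforms and ignoring real-world efficiency (achievable bandwidth is always lower than peak):

```python
def peak_bandwidth_gbs(channels, mega_transfers_per_s, bytes_per_transfer=8):
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9

# Assumption: DDR5-4800 on both platforms, 64-bit (8-byte) channels, one socket.
genoa = peak_bandwidth_gbs(channels=12, mega_transfers_per_s=4800)
spr = peak_bandwidth_gbs(channels=8, mega_transfers_per_s=4800)

print(f"Genoa: {genoa:.1f} GB/s per socket")
print(f"SPR:   {spr:.1f} GB/s per socket")
print(f"ratio: {genoa / spr:.2f}x")
```

At equal memory speed the ratio is simply 12/8 = 1.5x, so bandwidth-bound tests can favor Genoa regardless of how strong either AVX-512 implementation is.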

One AVX-512 test that has no AMX optimization and isn't memory-bandwidth intensive is the chess test, and Intel comes out ahead there despite being well behind in core count. AMD's AVX-512 implementation is very good for a non-native one, but let's not get carried away.

Hitman928 · Jan 10, 2023

Carfax83 said:
One AVX-512 test that has no AMX optimization and isn't memory-bandwidth intensive is the chess test, and Intel comes out ahead there despite being well behind in core count. AMD's AVX-512 implementation is very good for a non-native one, but let's not get carried away.

The chess result is because the higher core count of the Genoa part you are showing doesn't help at all in this benchmark. In the tests Phoronix published, the 32-core Genoa processor is at the top of the chart thanks to its higher boost clocks. It looks like this test doesn't really scale beyond about 64-70 cores.

Exist50 · Jan 11, 2023

bsp2020 said:
BTW, does anybody know if Intel's accelerator features (QAT, DSA, DLB and IAA) are something you can add to AMD platform using a discrete accelerator card?

Looks like many of them are tightly coupled to memory, so probably not possible. Maybe the encryption stuff though?

bsp2020 said:
I'm not as concerned about AI acceleration since any serious AI training & inferencing still requires Nvidia GPUs. Just curious.

Training, largely true, but inference is surprisingly dominated by CPUs. The GPU advantage diminishes under tight latency constraints or small batch sizes, and CPUs are obviously more flexible if you're not going to be doing inference all the time.

For training, GPUs are great if your model fits in VRAM (or is amenable to being streamed in), but for really, really large models, they sometimes have to fall back to CPUs.

Jim Keller remarked on this in a talk once. I skipped to the relevant part, but the whole thing is worth a watch. Timestamp at 42:47, if the media embed messes it up.

TLDW: He estimated at the time (about 3 years ago) that AI was something like 80% CPU, 20% GPU, 0% other, and said that if things moved quickly, it would be something like 1/3 each in 5 years, but that things probably wouldn't move that quickly.

IntelUser2000 · Jan 11, 2023

With Sapphire Rapids, Intel is claiming just a 15% perf/clock gain for the core over its predecessor, which is interesting.

A lot of the SKUs are based on the monolithic, 780 mm², 32-core MCC die.

In other news, Asus, in collaboration with Intel, introduced a design on its Zenbook Pro 16X that it calls "SoM" (System on a Module), which combines a Raptor Lake mobile chip and LPDDR5X memory into a single module. It saves 38% in area over the previous generation, allowing a better GPU to be installed.

Exist50 · Jan 11, 2023

IntelUser2000 said:
In other news, Asus in collaboration with Intel on their Zenbook Pro 16X introduced what they call "SoM" or "System on a Module" design that creates a single module out of Raptorlake mobile chip plus LPDDR5X memory modules. It saves 38% in area over the previous generation, allowing for a better GPU to be installed.

Pics for anyone else who's curious.

Carfax83 · Jan 11, 2023

Hitman928 said:
The chess result is because the higher core count of the Genoa part you are showing doesn’t help at all in this benchmark. In the tests Phoronix published, the 32 core Genoa processor is at the top of the chart thanks to its higher boost clocks. Looks like this test doesn’t really scale beyond about 64 - 70 cores.

My point, though, was that AMD's AVX-512 implementation, while good, is not native and loses to SPR when confounding factors aren't in play, i.e. memory bandwidth, superior core count, etc. I'm talking specifically about the actual implementation. @nicalandia implied that AMD's AVX-512 implementation was somehow better than Intel's, and it's not.

BorisTheBlade82 · Jan 11, 2023

Has anyone found specific numbers for the bandwidth of the EMIB interconnects between the tiles?
I am interested in bandwidth per edge or per mm of beachfront.

Henry swagger · Jan 11, 2023

https://twitter.com/x/status/1613009833187500032

Impressive, outshipping the competition with ease.

uzzi38 · Jan 11, 2023

Saylick said:
True, there are always going to be bulk sales discounts to hyperscalers, but the MSRPs don't look favorable in a head-to-head comparison.

Note: I revised my earlier post. It's closer to 1.4x in MSRP difference, not >1.5x, for flagship vs flagship. $17k vs. $12k. For similar core counts, 64C Genoa vs 60C SPR, it's $17k vs. $9k.

STH talked about it a while back, but hyperscalers get huge discounts; we're talking about getting 64c Rome/Milan for ~$2-3k. Don't pay any attention to list pricing.

Markfw · Jan 11, 2023

Henry swagger said:
https://twitter.com/x/status/1613009833187500032
Impressive outshipping the competition with ease

Well, that may be, but every quarter they are losing market share.

Source https://www.hardwaretimes.com/amds-...-the-first-time-ryzen-processor-share-to-dip/

Timorous · Jan 11, 2023

Carfax83 said:
My point was though that AMD's AVX-512 implementation while good, is not native and loses to SPR when extenuating factors aren't in play, ie memory bandwidth, superior core count etcetera. I'm talking specifically about the actual implementation. @nicalandia implied that AMD's AVX512 implementation was somehow better than Intel's and it's not.

A single benchmark does not prove this statement.

moinmoin · Jan 11, 2023

IntelUser2000 said:
In other news, Asus in collaboration with Intel on their Zenbook Pro 16X introduced what they call "SoM" or "System on a Module" design that creates a single module out of Raptorlake mobile chip plus LPDDR5X memory modules. It saves 38% in area over the previous generation, allowing for a better GPU to be installed.

Exist50 said:
Pics for anyone else who's curious.

If more ODMs pick that up, we may actually see Apple M-series-style packages down the line, which Intel and AMD could optimize their chips for. That would not be a bad development.

Carfax83 said:
I'm talking specifically about the actual implementation. @nicalandia implied that AMD's AVX512 implementation was somehow better than Intel's and it's not.

AFAIK the previous discussion @nicalandia alluded to was not about performance per se but about the stability of power consumption and, as a result, of operating frequency. Using Intel's AVX-512 was known to reduce operating frequency due to higher power consumption, which doesn't happen with AMD's implementation, where AVX-512 actually uses slightly less power than AVX2. Now, I don't know SPR's behavior and power consumption in this particular case, but I would be surprised to see it deviate from what we've seen before.

Dayman1225 · Jan 11, 2023

This NYTimes article essentially confirms what we heard years ago about Intel's verification/validation/debug teams being laid off or leaving in droves.

Inside Intel’s Delays in Delivering a Crucial New Microprocessor

One key chore was “validation,” a testing process in which Intel and its customers run software on sample chips to simulate computing chores and catch bugs. Once flaws are found and fixed, designs may go back to the factory to make new test chips, which typically takes more than a month.
Repeating that process led to missed deadlines. Ms. Nassif said Sapphire Rapids was designed to counter AMD’s Milan processor, which was introduced in March 2021. But it still wasn’t ready by that June, when Intel announced a delay until the next year to allow more validation.

She also concluded that the team should have spent more time on perfecting and testing its design using computer simulations. Finding bugs before they are in sample chips is less expensive, and would have made it possible to remove features to simplify the product, Ms. Rivera said. She has since moved to bolster Intel’s simulation and validation abilities.
“We used to have a lot of this kind of muscle that we let atrophy,” Ms. Rivera said. “Now we’re rebuilding.”

eek2121 · Jan 11, 2023

Intel is doing itself a disservice by selling SPR at that price. Sure, larger players get discounts. Smaller folks, however, do not.

moinmoin said:
AMD has already stated several times that wafers are no longer the bottleneck; substrate is, and that has been worked on for quite some time. I just wonder how long it will be until it's officially called resolved. Sony announcing that PS5 shortages are officially over may be a hint of things to come.

Wanted to add that I was shocked when I was able to order a PS5 (at MSRP) last month and get it shipped to me 2 days later. Indeed, shortages do appear to be coming to an end.

jpiniero · Jan 11, 2023

IntelUser2000 said:
Lot of the SKUs are based on the monolithic, 780mm2, 32-core MCC die.

Those madlads are actually serious about that, although you can get it cut all the way down to 8 cores. The XCC can be cut down to 16 cores (so 4 cores per tile). The problem is I don't know how much sense those SKUs make.

nicalandia · Jan 11, 2023

jpiniero said:
Those madlads are actually serious about that. Although you can get it cut down all the way down to 8 cores. The XCC you can get down to 16 cores (so 4 cores per tile). Problem is I don't know how much those SKUs make sense.

It's about the built-in accelerators and Intel On Demand; only XCC has full access to all accelerators.

Exist50 · Jan 11, 2023

moinmoin said:
Afaik the previous discussion @nicalandia implied was not about performance per se but stability of power consumption and (as a result of that) of operating frequency. Usage of Intel's AVX-512 was known to reduce operating frequency due to higher power consumption, which doesn't happen with AMD's implementation of AVX-512 where AVX-512 is actually using slightly less power than AVX2.

A slight clock penalty isn't meaningful if it only happens when you're running significantly more ops/cycle.
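That trade-off can be put in numbers: effective vector throughput scales roughly with clock × operations per cycle. A back-of-the-envelope Python sketch, assuming hypothetical clocks (3.0 GHz for AVX2, 2.7 GHz for AVX-512, i.e. a 10% penalty) and an idealized doubling of FP32 lanes per instruction from 256-bit to 512-bit vectors; real cores and real code won't scale this cleanly:

```python
def throughput_gops(clock_ghz, ops_per_cycle):
    """Idealized throughput in billions of ops/s: clock (GHz) x ops per cycle."""
    return clock_ghz * ops_per_cycle

# Hypothetical figures for illustration only, not measured SPR or Genoa values.
# FP32 lanes per vector: 256-bit AVX2 = 8, 512-bit AVX-512 = 16
# (assumes one vector op issued per cycle per core in both cases).
avx2 = throughput_gops(clock_ghz=3.0, ops_per_cycle=8)
avx512 = throughput_gops(clock_ghz=2.7, ops_per_cycle=16)  # assumed 10% clock penalty

print(f"AVX2:    {avx2:.1f} Gops/s")
print(f"AVX-512: {avx512:.1f} Gops/s")
print(f"speedup despite lower clock: {avx512 / avx2:.2f}x")
```

Under these assumptions a 10% clock drop still leaves a 0.9 × 2 = 1.8x throughput gain; the penalty only hurts when code runs at the lower clock without actually using the wider vectors.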

Exist50 · Jan 11, 2023

Dayman1225 said:
This article by the NYTimes essentially confirms what we heard about Intels verification/validation/debug teams being layoff’d or leaving in droves years ago

Inside Intel’s Delays in Delivering a Crucial New Microprocessor

Huh, that's a really weird article. They phrase it as some exposé, but the whole thing is basically "there were bugs and it was delayed". And why are they hyping up Rivera so much? She's on the business side, not the engineering side, and they even point out she was working in HR for most of SPR's lifecycle. The only thing they really say she did was set up meetings...

jpiniero · Jan 11, 2023

nicalandia said:
Intel On Demand

Almost forgot about that... did Intel say what they are officially locking behind the paywall? There was speculation that Intel would lock AMX but that doesn't appear to be the case (?)

nicalandia · Jan 11, 2023

jpiniero said:
Almost forgot about that... did Intel say what they are officially locking behind the paywall? There was speculation that Intel would lock AMX but that doesn't appear to be the case (?)

Yes, they are going with Intel On Demand for the built-in on-die accelerators, but they are not locking AVX-512 or AMX. So whatever is part of the core is fully available.

From the STH article "4th Gen Intel Xeon Scalable Sapphire Rapids Leaps Forward" (www.servethehome.com):

So from that we can assume that the Xeon W9 will come with the ability to unlock those features, but they will be turned off at the microcode level, leaving just AVX-512 and AMX.

Exist50 · Jan 11, 2023

jpiniero said:
Those madlads are actually serious about that. Although you can get it cut down all the way down to 8 cores. The XCC you can get down to 16 cores (so 4 cores per tile). Problem is I don't know how much those SKUs make sense.

Historically, those low end SKUs are basically "I need PCIe lanes for a file server" kind of offerings.

Exist50 · Jan 11, 2023

nicalandia said:
Yes, they are going with Intel On Demand for the built-in on-die accelerators, but they are not locking AVX-512 or AMX. So whatever is part of the core is fully available.

The wording there is a bit unclear, but it sounds like they're saying you get whatever's on the box, enabled at purchase time, but the disabled/binned accelerators might be reenabled via On Demand. But I'm not sure if that interpretation is correct.

coercitiv · Jan 11, 2023

Exist50 said:
Huh, that's a really weird article. They phrase it as some exposé, but the whole thing is basically "there were bugs and it was delayed".

It's a damage control piece: bad things happened, but we're already fixing it.

Exist50 said:
And why are they hyping up Rivera so much? She's on the business side, not the engineering side, and they even point out she was working in HR for most of SPR's lifecycle. The only thing they really say she did was set up meetings...

Intel's people made decisions that led to this failure. Now the "new" Intel is trying to prove that "new" executives know how to turn the ship around.

So think of Ms. Rivera as someone speaking on behalf of the company and declaring the following:

- We aimed too high with Sapphire Rapids, rather than deliver a less ambitious product sooner.
- Our validation effort was insufficient; had we picked up on some of the issues in simulation, we might have dropped features to ship the product on time.
- We scheduled more products than our engineers can handle.

I think these admissions fit well with the info from years ago that @Dayman1225 is talking about.

jpiniero · Jan 11, 2023

Exist50 said:
Historically, those low end SKUs are basically "I need PCIe lanes for a file server" kind of offerings.

That was more Xeon E. You can still buy those, but who knows if Intel will refresh it with anything beyond Rocket Lake.

A one-chiplet Epyc probably makes more sense.
