Discussion Intel current and future Lakes & Rapids thread

Abwx · Sep 22, 2023

H433x0n said:
I'm not going to really argue any of your points since I pretty much agree. Looking at the Phoronix benchmarks of SPR performance, if you were to add an additional 17% it gets pretty close to Epyc 9554, but that's still a far cry from the 96-core 9654.

But to get those 17% you'll have to increase wattage by 40-45% or so, and even at stock, perf/watt is at a disadvantage relative to the 9554 due to Epyc's node advantage...

Geddagod · Sep 22, 2023

nicalandia said:
Higher 1T means nothing on server-grade products; SPR falls on its face once it hits its max TDP. That's why stock Xeon W9s are on par with Zen3 Threadrippers on general MT workloads. I suspect that due to higher L3$ and better topology, EMR will be 15% higher at the same core count and TDP when compared to SPR. That is progress I suppose, but AMD with Genoa/Genoa-X/Bergamo is just too far ahead on Core/Performance/TCO.

While literally less than a week ago I was arguing with someone on Reddit that ST perf isn't all that important in server products, I wouldn't say it's worth nothing. Higher 1T was the quoted reason that Jensen said they went and paired SPR with their H100 systems- and even if that was just a "fake" reason so that they would have an excuse not to use AMD, there's plenty of benches in websites such as Phoronix that care mostly about ST performance- and that whole website is dedicated to testing workloads that are representative of common server use cases.
AMD even recognize this market as, even if niche, somewhat of an official market- which is why they sell server skus that are frequency optimized (AMD with their -F skus).
Intel's self quoted perf improvement with EMR was 17% improved perf/watt btw

H433x0n · Sep 22, 2023

Abwx said:
But to get those 17% you'll have to increase wattage by 40-45% or so, and even at stock, perf/watt is at a disadvantage relative to the 9554 due to Epyc's node advantage...

No, it's 17% more general compute performance at iso power (350W). I'm not claiming it's amazing but it's unambiguously more performant than SPR.
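
To make the two readings of the 17% figure concrete, here is a rough perf/watt sketch. The 40-45% power increase is Abwx's estimate and the iso-power framing is H433x0n's; neither is a measured result, and `perf_per_watt_ratio` is just an illustrative helper name.

```python
def perf_per_watt_ratio(perf_gain, power_gain):
    """Relative perf/watt vs. baseline, given fractional gains (0.17 = +17%)."""
    return (1.0 + perf_gain) / (1.0 + power_gain)

# Reading 1 (Abwx's assumption): +17% performance costs +40% to +45% power.
pushed_low = perf_per_watt_ratio(0.17, 0.45)
pushed_high = perf_per_watt_ratio(0.17, 0.40)

# Reading 2 (H433x0n's claim): +17% performance at iso power (350W).
iso_power = perf_per_watt_ratio(0.17, 0.0)

print(f"pushed SPR: {pushed_low:.2f}x to {pushed_high:.2f}x perf/watt")
print(f"iso power:  {iso_power:.2f}x perf/watt")
```

Under the first reading perf/watt falls to roughly 0.81-0.84x of baseline; under the second it rises to 1.17x, which is why the two posts reach opposite conclusions from the same 17%.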

Hitman928 · Sep 22, 2023

Geddagod said:
While literally less than a week ago I was arguing with someone on Reddit that ST perf isn't all that important in server products, I wouldn't say it's worth nothing. Higher 1T was the quoted reason that Jensen said they went and paired SPR with their H100 systems- and even if that was just a "fake" reason so that they would have an excuse not to use AMD, there's plenty of benches in websites such as Phoronix that care mostly about ST performance- and that whole website is dedicated to testing workloads that are representative of common server use cases.
AMD even recognize this market as, even if niche, somewhat of an official market- which is why they sell server skus that are frequency optimized (AMD with their -F skus).
Intel's self quoted perf improvement with EMR was 17% improved perf/watt btw

There's a difference between 1T performance and per core performance. No one cares about 1T performance on server CPUs, but many customers do care about how each core performs once the CPU is loaded.

Geddagod · Sep 22, 2023

Hitman928 said:
There's a difference between 1T performance and per core performance. No one cares about 1T performance on server CPUs, but many customers do care about how each core performs once the CPU is loaded.

I finally understand the "ummm akshually" meme now lol

H433x0n · Sep 22, 2023

Geddagod said:
While literally less than a week ago I was arguing with someone on Reddit that ST perf isn't all that important in server products, I wouldn't say it's worth nothing. Higher 1T was the quoted reason that Jensen said they went and paired SPR with their H100 systems- and even if that was just a "fake" reason so that they would have an excuse not to use AMD, there's plenty of benches in websites such as Phoronix that care mostly about ST performance- and that whole website is dedicated to testing workloads that are representative of common server use cases.
AMD even recognize this market as, even if niche, somewhat of an official market- which is why they sell server skus that are frequency optimized (AMD with their -F skus).
Intel's self quoted perf improvement with EMR was 17% improved perf/watt btw

Everybody always has an issue when I link the Phoronix test results. Going by their results, SPR is within 25% of Genoa in their geometric mean of all results and outperforms Milan. I genuinely don't know how to interpret their findings though.

Geddagod · Sep 22, 2023

H433x0n said:
Everybody always has an issue when I link the Phoronix test results. Going by their results, SPR is within 25% of Genoa in their geometric mean of all results and outperforms Milan. I genuinely don't know how to interpret their findings though.

The geometric mean isn't all that useful. They mix in a bunch of ST, MT, and then AI based workloads (where Intel obviously dominates bcuz of accelerators) and call it a day. Using the individual "section" averages is much more useful.

H433x0n · Sep 22, 2023

Geddagod said:
The geometric mean isn't all that useful. They mix in a bunch of ST, MT, and then AI based workloads (where Intel obviously dominates bcuz of accelerators) and call it a day. Using the individual "section" averages is much more useful.

In that case then they're beating Milan in every metric? All of these results are the exact opposite of what I've heard online, where Milan beats SPR and Genoa nearly doubles SPR performance.

It may seem that I’m trolling but I’m not. If those are legitimate / representative benchmarks then the tech enthusiast internet is a crazy place.

dullard · Sep 22, 2023

Geddagod said:
The geometric mean isn't all that useful. They mix in a bunch of ST, MT, and then AI based workloads (where Intel obviously dominates bcuz of accelerators) and call it a day. Using the individual "section" averages is much more useful.

The mathematically correct way to compare multiple benchmarks is a geometric mean, not just a simple average. That said, you are correct that a geometric mean of a mixture of irrelevant tests is not useful. If you care about MT, then do a geometric mean of just the MT tests. If you care about AI, then do a geometric mean of just the AI tests.
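
As a minimal sketch of that point, the snippet below computes an overall geometric mean and per-category geometric means over relative scores. The categories, test names, and ratios are made up for illustration (they are not Phoronix data), and `geomean_by_category` is a hypothetical helper.

```python
from statistics import geometric_mean

# Hypothetical relative scores (CPU A / CPU B); NOT real benchmark data.
# Each row: (category, test name, ratio), where ratio > 1 means CPU A wins.
results = [
    ("MT", "compile", 0.80),
    ("MT", "render", 0.90),
    ("AI", "inference", 2.50),
    ("AI", "training", 1.60),
]

def geomean_by_category(rows):
    """Group ratios by category and take the geometric mean of each group."""
    groups = {}
    for category, _name, ratio in rows:
        groups.setdefault(category, []).append(ratio)
    return {cat: geometric_mean(vals) for cat, vals in groups.items()}

overall = geometric_mean([ratio for _, _, ratio in results])
per_category = geomean_by_category(results)

print(f"overall: {overall:.2f}")
for cat, value in per_category.items():
    print(f"{cat}: {value:.2f}")
```

With these invented numbers the overall mean comes out above 1 even though CPU A loses every MT test, which is the mixing problem being described: the per-category means tell a buyer who only cares about MT something the combined figure hides.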

cebri1 · Sep 22, 2023

H433x0n said:
No, it's 17% more general compute performance at iso power (350W). I'm not claiming it's amazing but it's unambiguously more performant than SPR.

Yep. We'll see if it translates into real world workloads.

nicalandia · Sep 22, 2023

H433x0n said:
In that case then they're beating Milan in every metric?

No, only on AVX-512 and specific accelerators, otherwise Milan will outperform SPR.

Epyc Milan vs Sapphire Rapids at ISO Core OpenBenchmarking Results

H433x0n · Sep 22, 2023

nicalandia said:
No, only on AVX-512 and specific accelerators, otherwise Milan will either match or beat SPR.

… AVX512 and accelerators are part of SPR though? If AVX512 and AMX provide a big enough uplift allowing it to take the win in Phoronix’s test suite for HPC, Rendering, Code Compilation, etc that’s an accurate representation of what a buyer would receive if they ordered SPR.

nicalandia · Sep 22, 2023

H433x0n said:
… AVX512 and accelerators are part of SPR though? If AVX512 and AMX provide a big enough uplift allowing it to take the win in Phoronix’s test suite for HPC, Rendering, Code Compilation, etc that’s an accurate representation of what a buyer would receive if they ordered SPR.

But AVX512/AMX is not making any difference in many of those tests (HPC, Rendering and Code Compilation) because the developers have yet to implement it; heck, even AVX-512 has hardly been implemented.

Those are at ISO core count, so apples to apples. Perhaps with EMR they will be able to beat Milan?

H433x0n · Sep 22, 2023

nicalandia said:
But AVX512/AMX is not making any difference in many of those tests (HPC, Rendering and Code Compilation) because the developers have yet to implement it; heck, even AVX-512 has hardly been implemented.

Those are at ISO core count, so apples to apples. Perhaps with EMR they will be able to beat Milan?

According to Phoronix they’re beating Milan. When you’re comparing the entire processor and not just cherry picking a slice of 8 cores rented out.

Markfw · Sep 22, 2023

H433x0n said:
According to Phoronix they’re beating Milan. When you’re comparing the entire processor and not just cherry picking a slice of 8 cores rented out.

But what happens when it is pitted against its competition, which is not Milan ?

nicalandia · Sep 22, 2023

Markfw said:
But what happens when it is pitted against its competition, which is not Milan ?

Don't. You will get a few people upset here. We all know how weak SPR looks against Genoa.

H433x0n · Sep 22, 2023

Markfw said:
But what happens when it is pitted against its competition, which is not Milan ?

It loses.

I’m not exactly an SPR cheerleader, I’m not advocating anybody to buy it.

nicalandia said:
Don't. You will get a few people upset here. We all know how weak SPR looks against Genoa.

I honestly don’t care. I find a lot of the data center stuff boring and it’s not exactly competitive at the moment between Intel and AMD.

nicalandia · Sep 22, 2023

H433x0n said:
I honestly don’t care. I find a lot of the data center stuff boring and it’s not exactly competitive at the moment between Intel and AMD.

Well, Data Center is where they are "More" Competitive due to AVX/AMX, because it gets worse at the desktop where their niche accelerators don't help https://www.pugetsystems.com/labs/articles/intel-xeon-w-3400-content-creation-review/

At stock it can't even beat Milan and it's barely ahead of Rome based Threadrippers.

H433x0n · Sep 22, 2023

nicalandia said:
Well, Data Center is where they are "More" Competitive due to AVX/AMX, because it gets worse at the desktop where their niche accelerators don't help https://www.pugetsystems.com/labs/articles/intel-xeon-w-3400-content-creation-review/

At stock it can't even beat Milan and it's barely ahead of Rome based Threadrippers.

To me, that’s essentially a Data Center comparison. Not exactly unexpected results either. To reiterate - I’m not white knighting for SPR.

I was just making a comment that the online discourse about this is wildly overstating the reality of the situation.

JoeRambo · Sep 23, 2023

H433x0n said:
According to Phoronix they’re beating Milan. When you’re comparing the entire processor and not just cherry picking a slice of 8 cores rented out.

And that's the problem for Intel currently: such a slice on AMD is cherry picked, but it's pretty much impossible to make it perform on Intel.

1) Without on-socket NUMA, such a tenant of 8C size does not have enough on-core resources, and any memory traffic is basically an L3 cache miss and a looong trip to memory.
2) With quadrant mode, such a tenant of 8C is awkwardly sized for a 14C slice and has an even more minuscule L3, with memory bandwidth cut to 2 channels. Not a good place to be, and memory subsystem performance is still hilariously bad for 2023.

Compare to AMD, where such a tenant has a full chiplet just for themselves, backed by an excellent L3 slice of 32MB, or 96MB with X3D. And all L3 misses share the same IOD with 12 channels.

EMR is fixing most of it by providing a larger slice and way more L3 for "half" mode. So Intel's fortunes on the performance side, core vs core, will improve quite a bit.
They won't be able to touch AMD's perf peaks obviously, but there are the so-called "licensing bastards", where software vendor shenanigans force orgs to buy certain-sized servers anyway, like 32C. Intel will be much more competitive in those if the price is right.

Henry swagger · Sep 24, 2023

Intel Meteor Lake & Arrow Lake CPUs Might Feature Similar AI-Dedicated VPU Capabilities

Intel's Meteor Lake and Arrow Lake might feature a similar VPU for AI capabilities, suggesting that Intel plans for early-stage adoption.

wccftech.com

Arrow Lake desktop CPUs will get a VPU also ✍️

lightisgood · Sep 25, 2023

TSMC Moving Towards "Aggressive" Expansion of CoWoS Packaging Facilities For AMD & NVIDIA AI Chips

TSMC has bumped up orders for advanced CoWoS packaging equipment amid the influx of huge AI demand from tech companies such as NVIDIA & AMD.

wccftech.com

Is Gaudi3 going to use EMIB rather than CoWoS?
I'm surprised at EMIB's performance, because Gaudi3 must be equipped with HBM3e.

AMDK11 · Sep 26, 2023

Doesn't this slide say up to 288 cores on the 2S?

nicalandia · Sep 26, 2023

AMDK11 said:
Doesn't this slide say up to 288 cores on the 2S?

Yes. 2S = 2P so two socket. 2S SF 288c/288T vs 2S Bergamo 256c/512T.
A few here got too carried away by "OMG 288 Core Sierra Forest". Well, it's like the previous leaks: only 144 per socket and... get this: 36 quad clusters, each with 3 MiB of L3. Same internal ring per cluster. Good luck with that.

SlimFan · Sep 26, 2023

nicalandia said:
Yes. 2S = 2P so two socket. 2S SF 288c/288T vs 2S Bergamo 256c/512T.
A few here got too carried away by "OMG 288 Core Sierra Forest". Well, it's like the previous leaks: only 144 per socket and... get this: 36 quad clusters, each with 3 MiB of L3. Same internal ring per cluster. Good luck with that.

That's SP (2 sockets times 144). Then there is AP. This was the Hot Chips presentation. Then the CEO showed off AP, which is clearly a different package that had two dies together. That makes 288 within a package.

It's really pretty straightforward.

So if AP supports 2 socket...
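
For anyone following the core-count arithmetic, here is a small sketch using only the figures quoted in the posts above (36 quad clusters and 3 MiB of L3 per cluster per SP socket, 288 cores per AP package, 2S Bergamo at 256c/512T). The `Config` class is a hypothetical helper, and the 2S AP row is an assumption, since the thread leaves open whether AP supports two sockets.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Config:
    name: str
    sockets: int
    cores_per_socket: int
    threads_per_core: int

    @property
    def cores(self):
        return self.sockets * self.cores_per_socket

    @property
    def threads(self):
        return self.cores * self.threads_per_core

# Per-socket figures for Sierra Forest SP, as quoted in the thread.
CLUSTERS_PER_SOCKET = 36
CORES_PER_CLUSTER = 4
L3_MIB_PER_CLUSTER = 3

sp_cores_per_socket = CLUSTERS_PER_SOCKET * CORES_PER_CLUSTER   # 144
sp_l3_mib_per_socket = CLUSTERS_PER_SOCKET * L3_MIB_PER_CLUSTER  # 108

configs = [
    Config("2S Sierra Forest SP", 2, sp_cores_per_socket, 1),
    Config("2S Bergamo", 2, 128, 2),
    Config("1S Sierra Forest AP", 1, 288, 1),
    # Assumption: only valid if AP turns out to support two sockets.
    Config("2S Sierra Forest AP (hypothetical)", 2, 288, 1),
]

for c in configs:
    print(f"{c.name}: {c.cores}c/{c.threads}T")
```

So the 288 on the slide and the 288 in the AP package are two different things: the first is two SP sockets, the second is one package.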
