Discussion Intel current and future Lakes & Rapids thread

João Bortolace · May 14, 2021

I think an ADL-P with HBM2 would be interesting, something like what Apple did with the M1.

repoman27 · May 14, 2021

Timmah! said:
You think they won't try to compete with 64-core TR, when they can? I mean, if only to have the halo product, to be able to claim they have the fastest CPU and whatnot. Surely they won't be able to say that about a 2-tile version.
Regarding HBM, I guess you are right. But do you assume then that it will be stacked on top of those tiles and is not part of them?

HBM2E stacks will be adjacent to the tiles and connected via EMIB. 64 GB would be four 16 GB stacks, probably one per tile. I'm guessing each tile will also have a pair of dual-channel DDR5 memory controllers, although a maximum of 8 channels will be exposed per module/package.

edit: I should also point out that Intel is shooting for a Q4'21 PRQ for Sapphire Rapids. That means starting the volume ramp in December for a Q2'22 launch. So I wouldn't expect retail availability of workstation/HEDT parts until the latter half of 2022.

jpiniero · May 14, 2021

Timmah! said:
You think they won't try to compete with 64-core TR, when they can?

The tiles are ridiculously large. There's a reason there's been rumors that Intel is keeping Ice Lake Server on the market and SPR is only going to be the high end product.

Timmah! · May 14, 2021

repoman27 said:
HBM2E stacks will be adjacent to the tiles and connected via EMIB. 64 GB would be four 16 GB stacks, probably one per tile. I'm guessing each tile will also have a pair of dual-channel DDR5 memory controllers, although a maximum of 8 channels will be exposed per module/package.

edit: I should also point out that Intel is shooting for a Q4'21 PRQ for Sapphire Rapids. That means starting the volume ramp in December for a Q2'22 launch. So I wouldn't expect retail availability of workstation/HEDT parts until the latter half of 2022.

https://cdn.wccftech.com/wp-content/uploads/2021/04/Intel-Sapphire-Rapids-Xeon-CPU-4-Chiplet-MCM-Design-With-Up-To-80-Cores-_4.jpg

Where do you reckon those adjacent HBM stacks would be in there?

Timmah! · May 14, 2021

jpiniero said:
The tiles are ridiculously large. There's a reason there's been rumors that Intel is keeping Ice Lake Server on the market and SPR is only going to be the high end product.

The tiles seem to have a 4x5 configuration. Same as my HCC Skylake, which is like 4 years old. I know it's a different node, but surely by now that's not considered "ridiculously large" - what is the monolithic 40-core Ice Lake then? I thought the whole point of this tile approach was to lower the costs thanks to better yields. If it's still considered large and expensive, it kind of misses its purpose, does it not? And why are they not making them smaller, 8-core or so, like AMD does, then?

jpiniero · May 14, 2021

Timmah! said:
If it's still considered large and expensive, it kind of misses its purpose, does it not? And why are they not making them smaller, 8-core or so, like AMD does, then?

I think it's like the original Epyc in how it's designed with the only difference being that they are connected via EMIB. So you always need 4 of them, and looking at the leaked picture it looks like at least 250 mm2 each.

repoman27 · May 14, 2021

Timmah! said:
The tiles seem to have a 4x5 configuration. Same as my HCC Skylake, which is like 4 years old. I know it's a different node, but surely by now that's not considered "ridiculously large" - what is the monolithic 40-core Ice Lake then? I thought the whole point of this tile approach was to lower the costs thanks to better yields. If it's still considered large and expensive, it kind of misses its purpose, does it not? And why are they not making them smaller, 8-core or so, like AMD does, then?

jpiniero is "focused" on yields. The SPR tiles are around 420 mm², whereas ICL-SP XCC 40C is 19.5 mm x 32 mm = 624 mm². We actually don't know the mesh configuration at this juncture. People were looking at the micro-bump fields which are on top of the metal layers and somehow thought they were seeing something that related to FEOL features.

Timmah! said:
https://cdn.wccftech.com/wp-content/uploads/2021/04/Intel-Sapphire-Rapids-Xeon-CPU-4-Chiplet-MCM-Design-With-Up-To-80-Cores-_4.jpg

Where do you reckon those adjacent HBM stacks would be in there?

Looks like there's plenty of room on that substrate to me. They would directly abut the compute tiles and only measure 10 mm x 11 mm. For reference, the compute tiles are ~20.5 mm / side, and the package is 77.6 mm x 54 mm.

edit: Correction, the LGA4677 package size is apparently the same as LGA4189 at 77.5 mm x 56.5 mm.
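
The area figures quoted in this exchange are easy to sanity-check. The snippet below is only a rough sketch using the dimensions posted above (all of them estimates from the thread, not official numbers); the variable names are made up for illustration, and comparing raw area ignores placement, EMIB routing, and keep-out zones.

```python
# Dimensions quoted in the thread (mm); all values are the posters' estimates.
TILE_SIDE = 20.5          # SPR compute tile, roughly square
ICL_XCC = (19.5, 32.0)    # Ice Lake-SP XCC 40C die
HBM_STACK = (10.0, 11.0)  # HBM2E stack footprint
PACKAGE = (77.5, 56.5)    # LGA4677 package (corrected figure)


def area(width_mm, height_mm):
    """Rectangle area in mm^2."""
    return width_mm * height_mm


tile_area = area(TILE_SIDE, TILE_SIDE)
icl_area = area(*ICL_XCC)
hbm_area = area(*HBM_STACK)
package_area = area(*PACKAGE)

# Four tiles plus four HBM2E stacks, one stack per tile as guessed above.
silicon_total = 4 * tile_area + 4 * hbm_area

print(f"SPR tile:          {tile_area:.2f} mm^2")
print(f"ICL-SP XCC:        {icl_area:.2f} mm^2")
print(f"4 tiles:           {4 * tile_area:.2f} mm^2")
print(f"4 tiles + 4 HBM2E: {silicon_total:.2f} mm^2")
print(f"Package:           {package_area:.2f} mm^2")
print(f"Share of package:  {silicon_total / package_area:.0%}")
```

On these numbers, four tiles plus four stacks come to a bit over 2,100 mm^2 against roughly 4,380 mm^2 of package, so raw area alone does not rule out the HBM fitting.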

Timmah! · May 14, 2021

jpiniero said:
I think it's like the original Epyc in how it's designed with the only difference being that they are connected via EMIB. So you always need 4 of them, and looking at the leaked picture it looks like at least 250 mm2 each.

250 mm2 would be less than Rocket Lake. Skylake-X is 484, almost 2x as big. It's not something never seen before.

jpiniero · May 14, 2021

Timmah! said:
250 mm2 would be less than Rocket Lake. Skylake-X is 484, almost 2x as big. It's not something never seen before.

That's on 14 nm though. If the tiles are closer to 420 mm2, that's over 1600 mm2 of 10 nm for one chip.

tomatosummit · May 14, 2021

Timmah! said:
Where do you reckon those adjacent HBM stacks would be in there?

They certainly aren't going on that package. I think HBM Sapphire Rapids is going to be the high-end stuff only, finally a real reason for Intel to charge an arm and a leg for Platinum CPUs.
HBM2 modules are less than half the size of the revealed chiplets, so fitting them on a package of that size might not be insurmountable, going by a rough estimate of ~300 mm^2 for the chiplets and ~125 mm^2 for the memory.
Costs for extra EMIB on package might mean it makes financial sense for Intel to have a wholly separate package for HBM SKUs, or even a separate socket for HPC customers who're getting HBM.

Am I imagining it, or was there a recent Dell workstation roadmap leak that had the various Ice Lake and Sapphire Rapids CPU lines on it?

eek2121 · May 14, 2021

Gideon said:
@mikk, @eek2121

BTW to reiterate where I stand on Alder Lake. My estimates are:

It will be faster (in all likelihood 10-20% faster) than Zen 3 in ST or lightly threaded productivity apps

A top-of-the-line 8 + 8 Alder Lake will give the Ryzen 5900X a run for its money in MT performance, not sure how much faster/slower but being all-around in the same range. This means:

8 + 8 will be in a very similar ballpark, while the 8 + 0 config will certainly win in heavily parallel AVX-512 enabled workloads, but not in AVX2-or-below workloads that utilize more than 8 threads.

It will not outperform a 5950X on average in MT loads. At least not in actual reviewer benchmark suites (say Tom's Hardware, AnandTech, etc.)

In gaming performance it should eke out a slight win over Rocket Lake and Zen 3, but the difference will not be large (up to 5% on average), as gaming is mostly sensitive to cache and memory latency and cache size, and only to a lesser extent IPC. The titles where Zen 3 is doing better than Rocket Lake (due to cache) will see the biggest gain going from 16MB -> 24MB of L3. All of this will only happen if all of the issues listed below are solved:

There the gain can be considerably larger than what's stated above. They will need to fix the cache-latency issues that plague Rocket Lake for that to be the case, but that's most likely not a problem as the latter was hampered by a backport to 14nm.

Windows Scheduler will also need updates to not raise any issues in games that scale to > 8 threads. This can be fixed by running in 8 + 0 config.

For clear gaming wins, either DDR4-based mobos (for best OC latency in the beginning) or very good early DDR5 XMP module availability is required.

Let's revisit this after reviews to see how terribly AMD-biased an Intel hater I end up being in the end.

I still remember people being angry at me for claiming that Rocket Lake might not significantly outperform Zen 3 in gaming (due to aforementioned cache and memory limitations, which certainly ended up being true)

We will see. For the record, I never had faith in Rocket Lake. The reason I think your thinking is flawed is that you aren’t taking into consideration the fact that ADL-S will likely be able to sustain much higher nT boost clocks (compared to Rocket Lake) thanks to lower power consumption and improved thermals. I personally expect multicore workloads to be drastically improved over Rocket Lake. Also, DDR5 will double memory bandwidth, which will also show itself in interesting ways (assuming it launches with DDR5).

I suspect it will beat the 5950X by a good 10-15%.

moinmoin · May 14, 2021

jpiniero said:
I think it's like the original Epyc in how it's designed with the only difference being that they are connected via EMIB. So you always need 4 of them, and looking at the leaked picture it looks like at least 250 mm2 each.

The only reason Epyc 1 always consisted of 4 dies is due to AMD positioning the SP3 socket as a platform that always offers 8 memory channels regardless whether it's a top or bottom end chip used (and every die has 2 channels). Intel on the other hand never has been shy to introduce segmentation within a platform.

repoman27 · May 14, 2021

jpiniero said:
That's on 14 nm though. If the tiles are closer to 420 mm2, that's over 1600 mm2 of 10 nm for one chip.

Yeah, it's bananas, when you think about it. When people talk about HBM2E being expensive, I don't think they're fully considering how expensive these chips are going to be. A 16 GB HBM2E stack costs what, $160?

jpiniero · May 14, 2021

moinmoin said:
The only reason Epyc 1 always consisted of 4 dies is due to AMD positioning the SP3 socket as a platform that always offers 8 memory channels regardless whether it's a top or bottom end chip used (and every die has 2 channels). Intel on the other hand never has been shy to introduce segmentation within a platform.

While Intel loves to segment, the number of memory channels isn't something they segment on. All of Ice Lake has 8, but the lower end has lower memory speed.

Timmah! · May 14, 2021

repoman27 said:
Yeah, it's bananas, when you think about it. When people talk about HBM2E being expensive, I don't think they're fully considering how expensive these chips are going to be. A 16 GB HBM2E stack costs what, $160?

That's why I wondered whether the HBM will be part of the HEDT line or not.
Anyway, let's assume it will not and, as jpiniero says, it will top out at 2 tiles - so it will probably have 28/32/36 cores overall. Do you think Intel will price it competitively against the 32c TR, which will probably match it most closely performance-wise?

uzzi38 · May 14, 2021

eek2121 said:
We will see. For the record, I never had faith in Rocket Lake. The reason I think your thinking is flawed is that you aren’t taking into consideration the fact that ADL-S will likely be able to sustain much higher nT boost clocks (compared to Rocket Lake) thanks to lower power consumption and improved thermals. I personally expect multicore workloads to be drastically improved over Rocket Lake. Also, DDR5 will double memory bandwidth, which will also show itself in interesting ways (assuming it launches with DDR5).

I suspect it will beat the 5950X by a good 10-15%.

I'm assuming you mean you expect ADL-S to hold higher nT clocks than Rocket Lake when PL1 kicks in, because Rocket Lake tends to hold its max 4.7GHz nT boost when allowed the full PL2 (and sustains 5.1GHz all-core when Adaptive Boost Technology is active alongside MCE).

Still, I wonder about that. Tiger Lake-H actually scores very similarly to Rocket Lake-S with both limited to 35-45W according to both sets of scores from early Chinese reviews we saw earlier, so they're obviously holding similar clocks. In one review RKL-S wins, in the other TGL-H wins, both by small margins though (of course, they're not the same tests, R15 vs R20).

Now consider that ADL-S will likely have higher uncore power (PCIe Gen 5) and, more importantly, two 4c Gracemont clusters to power as well. Golden Cove is also a much wider core than Willow Cove. Personally, I don't expect higher clocks on each GLC core within limited power budgets.

As for with unlimited power? Who knows, but 4.7GHz all core is a high bar to beat.

Cardyak · May 14, 2021

uzzi38 said:
And it's also a much wider core than each Willow Cove core as well. Personally, I don't expect higher clocks on each GLC core within limited power budgets.

As for with unlimited power? Who knows, but 4.7GHz all core is a high bar to beat.

Agreed. For Alder Lake I’m banking on a ~4.5 GHz all-core boost (for higher-end SKUs), but with a 20% IPC boost.

dullard · May 14, 2021

Cardyak said:
Agreed. For Alder Lake I’m banking on a ~4.5 GHz all-core boost (for higher-end SKUs), but with a 20% IPC boost.

Intel generally doesn't lower all-core boost much from generation to generation. I think they'll end up in the 4.6 GHz to 4.7 GHz range for the top Alder Lake all-core turbo. The TDP is based on base clocks, and from that I think the top Alder Lake will have base clocks in the ~3.0 GHz region, maybe a hair higher. I'd be happy with a 20% IPC boost, but I'll plan more on 15%.

The bigger wild card to me is the little cores. If done properly, and Windows puts all the background threads onto the little cores, then that opens a new opportunity. I have nothing but Chrome with one tab open right now and I have over 2500 threads running, taking up over 4% of my CPU. Putting that base load onto the small cores means that the big cores are cold and do not need to switch repeatedly between 2500 threads. That alone could give another 5%+ performance boost (or at least significantly more time that the all-core turbo can be enabled) that isn't measured in frequency or IPC statistics. Or Windows can butcher it. Who knows.

IntelUser2000 · May 14, 2021

Dayman1225 said:
God, your messages are awfully depressing now. I get it, Intel has been mis-executing and not living up to expectations, but that doesn’t mean they ‘shouldn’t be invited’. Intel’s Hot Chips program looks to be exciting, it’s their latest and greatest. You want them to show Tiger Lake again? 🤷‍♂️

What's your point?

Mine is that they should leave Hot Chips for something actually meaningful. Do you seriously not remember Tiger Lake at Hot Chips? They should have given us the official die pic, or the breakdown, or how it clocked higher (aside from that it just does), or the transistor count, or details on the caches. Do I need to say more?

Timmah! said:
- Is that HBM meant to serve as some kind of L4 cache, or is it supposed to serve as actual system RAM? 64GB of system RAM for servers seems fairly low... and pricing-wise I can't see that much RAM being part of a consumer-grade product... so will the HEDT part not have it? Anyway, it would surely provide a massive performance boost by itself, would it not?

If they decide to use HBM on HEDT parts, which I have some doubts about, it won't be cache.

HBM will simply act as fast memory. Go look up Knights Landing.

uzzi38 said:
Still, I wonder about that. Tiger Lake-H actually scores very similarly to Rocket Lake-S with both limited to 35-45W according to both sets of scores from early Chinese reviews we saw earlier, so they're obviously holding similar clocks. In one review RKL-S wins, in the other TGL-H wins, both by small margins though (of course, they're not the same tests, R15 vs R20).

What's your opinion about Notebookcheck's coverage?

Alder Lake: I don't think Alder Lake will do more than beat the 5900X slightly. It might have a chance to be good in notebooks. Intel will need more Golden Cove or Gracemont cores to beat AMD's top line.

They'll potentially be in a far better situation than today.

uzzi38 · May 14, 2021

IntelUser2000 said:
What's your opinion about Notebookcheck's coverage?

Interesting, as both of their results are vastly higher than both reviews seen thus far.

Review 1 (R15): 11800H 35W cTDP isn't even listed, but NotebookCheck's 11800H that's supposedly set to 35W cTDP scores about the same as 90W according to the chart provided.

R15 is an extremely short test that does complete before PL2 expires, so I do think it's possible that it's boosting above this 35W cTDP throughout the duration of the test. But it's impossible to tell without seeing a freq/time plot.

But I don't think PL2s are controlled here looking at the 11980HK score. Average 10980HK on their site is listed as scoring 1738pts. Provided 11800H scores 15% above that. The 11980HK scores 31% above that. Those are almost certainly out of line vs what Intel claimed for the 11980HK vs the 10980HK which is 19% improvement at locked 45W PL1/PL2.

Same goes for their 5900HS/HX as well. Both are definitely above 35W sustained during R15 and R20 from what I can tell, making Notebookcheck's results difficult to compare. We just don't know the power consumption during any of the tests despite being told the 11800H was locked to 35W cTDP.

Same thing applies to the R20 scores as well. R20 can also complete within PL2, so it's impossible to make efficiency comparisons.

The R23 score is extremely surprising, though. The 11800H is the 35W cTDP part, and it's keeping up with the 5900HS in the G15 in NBC's testing. This is the one test that isn't affected as much by cTDP, being a 10-minute test, and by all reasoning it should actually favour the AMD system due to the way Ryzen boosts by default. Given the fact that R15 and R20 were both not controlled in terms of power limit, I'm rather surprised to see that R23 shows such similar numbers for the 11800H and 5900HS. This also doesn't line up with expectations. So either TGL-H is vastly better than what both Intel and those early Chinese reviews portrayed, or we need to wait for more benchmarks with more controlled or clearer power limits.

ondma · May 15, 2021

IntelUser2000 said:
What's your point?

Mine is that they should leave Hot Chips for something actually meaningful. Do you seriously not remember Tiger Lake at Hot Chips? They should have given us the official die pic, or the breakdown, or how it clocked higher (aside from that it just does), or the transistor count, or details on the caches. Do I need to say more?

If they decide to use HBM on HEDT parts, which I have some doubts about, it won't be cache.

HBM will simply act as fast memory. Go look up Knights Landing.

What's your opinion about Notebookcheck's coverage?

Alder Lake: I don't think Alder Lake will do more than beat the 5900X slightly. It might have a chance to be good in notebooks. Intel will need more Golden Cove or Gracemont cores to beat AMD's top line.

They'll potentially be in a far better situation than today.

I will consider it a very good product if 8+8 AL can beat Ryzen in single thread and match or exceed the 5900X in multi-threaded. I know this forum is all about performance, but really I consider the 5900X the top of the line "mainstream" AMD product. The 5950X is more of a halo product. Hopefully, top of the line AL will also be significantly cheaper than the $800 (if you can find one at MSRP) 5950X.

Hulk · May 15, 2021

Alder Lake is such a mystery at this point. I think we all can agree at this point that Big core for core it will be faster than Rocket Lake. But the effectiveness of the little cores is a huge question mark. It seems reasonable that 8+8 could equal the 5900X but could it somehow equal the 5950X? Performance of Big/Little on the desktop, clockspeeds, Golden Cove/Gracemont IPC, power consumption? All unknowns at this point.

Intel seems to be betting the farm on it so we can only assume they think they have a winner. But then again they bet the farm on Netburst as well so who knows? Or perhaps they're going all in on mobile with Alder Lake and are surrendering high end desktop to AMD?

Finally I know I am wrong on this last point but I just can't wrap my head around the fact that Intel and AMD are going to pull another 20 or 30% throughput increase out of their next gen architecture! I can see maybe another 5% out of DDR5, and maybe another 5% by going wider, smarter/larger internal structures, but they have been tweaking these parts forever. I just don't see what's left on the table to continue to extract more instruction level parallelism out of these designs. I mean how much out-of-order processing can be done before you hit the proverbial wall?

Redfire · May 15, 2021

Warning, there's a lot of assumptions I'm going to be making here:

Assumptions:
Zen 2 IPC = Skylake IPC
Zen 3 IPC = 1.20 Zen 2 IPC
Sunny Cove IPC = 1.20 Skylake IPC
Willow Cove IPC = 1.05 Sunny Cove IPC
Golden Cove IPC = 1.20 Willow Cove IPC
Gracemont IPC = Skylake IPC

Golden Cove (ADL) All-core = 4.6GHz
Gracemont (ADL) All-core = 3.5 GHz
Zen 3 (5900X) All-core = 4.1 GHz
Zen 3 (5950X) All-core = 3.75 GHz

Alder Lake has a hardware scheduler / the Windows scheduler is fixed and optimised for heterogeneous cores. The scheduler does well with balancing Golden Cove and Gracemont Atom cores. (Biggest assumption right now in my opinion)

Single-threading:
ADL 12900K: 1.20 × 1.05 × 1.00 = 1.260
Zen 3 5900X: 1.20 × 0.96 = 1.152
Zen 3 5950X: 1.20 × 0.98 = 1.176

Multi-threaded:
ADL 12900K: 1.20 × 1.05 × 0.92 × 8 + 0.7 × 8 = 14.87
Zen 3 5900X: 1.20 × 0.82 × 12 = 11.81
Zen 3 5950X: 1.20 × 0.75 × 16 = 14.40

All calculated numbers are relative to a fictional Zen 2/Skylake core at 5 GHz.

By these very crude estimates, Alder Lake might be able to beat Vermeer in both single threaded and multi threaded workloads/benchmarks.
Of course, Zen 4 will be out in 2022 eventually, but that's competing with Raptor Lake, not Alder Lake.
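
For anyone who wants to play with the inputs, the estimate above can be reproduced in a few lines. This is just a sketch of the arithmetic as posted: every score is IPC factor × (clock / 5 GHz) × core count, relative to the fictional 5 GHz Zen 2/Skylake core. The function and constant names are invented for illustration, the single-thread clocks (5.0, 4.8, 4.9 GHz) are simply what the 1.00 / 0.96 / 0.98 factors imply, and the Golden Cove factor is 1.20 × 1.05 because that is what the arithmetic uses, rather than chaining all three listed IPC steps.

```python
REF_GHZ = 5.0  # fictional Zen 2/Skylake reference clock from the post

# IPC factors relative to Skylake, as used in the post's arithmetic.
GOLDEN_COVE_FACTOR = 1.20 * 1.05
GRACEMONT_FACTOR = 1.00
ZEN3_FACTOR = 1.20


def score(ipc_factor, clock_ghz, cores=1):
    """Relative throughput: IPC x clock (normalised to 5 GHz) x core count."""
    return ipc_factor * (clock_ghz / REF_GHZ) * cores


# Single-threaded (clocks implied by the 1.00 / 0.96 / 0.98 factors).
st = {
    "ADL 12900K": score(GOLDEN_COVE_FACTOR, 5.0),
    "Zen 3 5900X": score(ZEN3_FACTOR, 4.8),
    "Zen 3 5950X": score(ZEN3_FACTOR, 4.9),
}

# Multi-threaded, using the assumed all-core clocks.
mt = {
    "ADL 12900K": score(GOLDEN_COVE_FACTOR, 4.6, 8) + score(GRACEMONT_FACTOR, 3.5, 8),
    "Zen 3 5900X": score(ZEN3_FACTOR, 4.1, 12),
    "Zen 3 5950X": score(ZEN3_FACTOR, 3.75, 16),
}

for name in st:
    print(f"{name}: ST {st[name]:.3f}  MT {mt[name]:.2f}")
```

Changing any one assumption (say, Gracemont at 3.0 GHz, or a 15% Golden Cove gain) shows quickly how sensitive the 12900K vs 5950X result is to the inputs.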

uzzi38 · May 15, 2021

Redfire said:
Warning, there's a lot of assumptions I'm going to be making here:

Assumptions:
Zen 2 IPC = Skylake IPC
Zen 3 IPC = 1.20 Zen 2 IPC
Sunny Cove IPC = 1.20 Skylake IPC
Willow Cove IPC = 1.05 Sunny Cove IPC
Golden Cove IPC = 1.20 Willow Cove IPC
Gracemont IPC = Skylake IPC

Golden Cove (ADL) All-core = 4.6GHz
Gracemont (ADL) All-core = 3.5 GHz
Zen 3 (5900X) All-core = 4.1 GHz
Zen 3 (5950X) All-core = 3.75 GHz

Alder Lake has a hardware scheduler / the Windows scheduler is fixed and optimised for heterogeneous cores. The scheduler does well with balancing Golden Cove and Gracemont Atom cores. (Biggest assumption right now in my opinion)

Single-threading:
ADL 12900K: 1.20 × 1.05 × 1.00 = 1.260
Zen 3 5900X: 1.20 × 0.96 = 1.152
Zen 3 5950X: 1.20 × 0.98 = 1.176

Multi-threaded:
ADL 12900K: 1.20 × 1.05 × 0.92 × 8 + 0.7 × 8 = 14.87
Zen 3 5900X: 1.20 × 0.82 × 12 = 11.81
Zen 3 5950X: 1.20 × 0.75 × 16 = 14.40

All calculated numbers are relative to a fictional Zen 2/Skylake core at 5 GHz.

By these very crude estimates, Alder Lake might be able to beat Vermeer in both single threaded and multi threaded workloads/benchmarks.
Of course, Zen 4 will be out in 2022 eventually, but that's competing with Raptor Lake, not Alder Lake.

I take it you're assuming sustained PL2 for Alder Lake and stock 142W PPT for the 5950X there and using those numbers to compare? Because you certainly won't see Alder Lake clock at 4.6GHz all core at its 125W PL1.

Also I don't think you've factored Gracemont's lack of SMT into the equation here.

Also, Tiger Lake doesn't seem to show a 5% IPC advantage over Ice Lake. Intel claim the same 18% as Ice Lake, and Anandtech found a small regression in performance per clock.

Redfire · May 15, 2021

uzzi38 said:
I take it you're assuming sustained PL2 for Alder Lake and stock 142W PPT for the 5950X there and using those numbers to compare? Because you certainly won't see Alder Lake clock at 4.6GHz all core at its 125W PL1.

Also I don't think you've factored Gracemont's lack of SMT into the equation here.

Also, Tiger Lake doesn't seem to show a 5% IPC advantage over Ice Lake. Intel claim the same 18% as Ice Lake, and Anandtech found a small regression in performance per clock.

I completely forgot about the PL1/PL2 and the lack of SMT. Let me see if I can account for those.
Tiger Lake IPC is an interesting one, but I'll keep it at +5% for now.

Single-threaded:
ADL 12900K: 1.20 × 1.05 × 1.00 = 1.260
Zen 3 5900X: 1.20 × 0.96 = 1.152
Zen 3 5950X: 1.20 × 0.98 = 1.176

Multi-threaded:
ADL 12900K: 1.20 × 1.05 × 0.74 × 8 + 0.68 × 8 / 1.18 = 12.07
Zen 3 5900X: 1.20 × 0.82 × 12 = 11.81
Zen 3 5950X: 1.20 × 0.75 × 16 = 14.40

Assumptions:
Hyperthreading provides an 18% uplift
12900K's Golden Cove (PL1) all-core boost is 3.7 GHz (based on the 11900K's 3.9 GHz at PL1 vs 4.8 GHz at PL2)
12900K's Gracemont (PL1) All-core boost remains the same at 3.5 GHz

12900K still beats the 5900X, although by a much smaller extent, in multi-threading.
12900K loses to the 5950X, but that's to be expected, given the fewer cores.
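
The revised multi-threaded figure can be sketched the same way. The snippet below keeps the factors exactly as written in the revised line (0.74 for Golden Cove, 0.68 for Gracemont, and the 18% Hyperthreading uplift divided out of the Gracemont term); the names are hypothetical. Note that 0.68 corresponds to 3.4 GHz on the 5 GHz reference, slightly below the stated 3.5 GHz, and the PL1 clock comes from scaling 4.6 GHz by the 11900K's 3.9/4.8 ratio, which gives about 3.74 GHz before rounding down to 3.7.

```python
HT_UPLIFT = 1.18                  # assumed Hyperthreading gain from the post
GOLDEN_COVE_FACTOR = 1.20 * 1.05  # IPC factor used in the post's arithmetic
ZEN3_FACTOR = 1.20


def pl1_clock(pl2_all_core_ghz, ref_pl1_ghz=3.9, ref_pl2_ghz=4.8):
    """Scale a PL2 all-core clock by the 11900K's PL1/PL2 clock ratio."""
    return pl2_all_core_ghz * ref_pl1_ghz / ref_pl2_ghz


# Clock factors as written in the revised multi-threaded line.
BIG_CLOCK_FACTOR = 0.74    # 3.7 GHz / 5 GHz
SMALL_CLOCK_FACTOR = 0.68  # as posted; 3.5 GHz / 5 GHz would be 0.70

big_cores = GOLDEN_COVE_FACTOR * BIG_CLOCK_FACTOR * 8
# Gracemont has no SMT, so its contribution is discounted by the HT uplift.
small_cores = SMALL_CLOCK_FACTOR * 8 / HT_UPLIFT
adl_pl1 = big_cores + small_cores

zen3_5900x = ZEN3_FACTOR * 0.82 * 12
zen3_5950x = ZEN3_FACTOR * 0.75 * 16

print(f"Scaled Golden Cove PL1 clock: {pl1_clock(4.6):.2f} GHz")
print(f"ADL 12900K (PL1): {adl_pl1:.2f}")
print(f"Zen 3 5900X:      {zen3_5900x:.2f}")
print(f"Zen 3 5950X:      {zen3_5950x:.2f}")
```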
