Discussion Intel current and future Lakes & Rapids thread

Gideon · May 15, 2021

Redfire said:
Hyperthreading provides an 18% uplift
12900K's Golden Cove (PL1) All-core boost is 3.7 GHz (based on 3.9 GHz PL1 vs 4.8 GHz PL2 11900K)
12900K's Gracemont (PL1) All-core boost remains the same at 3.5 GHz

12900K still beats the 5900X, although by a much smaller extent, in multi-threading.
12900K loses to the 5950X, but that's to be expected, given the fewer cores.

A couple of corrections

1. SMT gives at least 30% sometimes up to 40% in rendering benchmarks for both Intel and AMD. Obviously less in less parallel workloads, but these usually don't scale to >16 threads anyway
2. AMD has about 10% MT perf in store should they ignore power similarly to Intel. If they are worried about losing ground they can play the game too (by releasing some higher-TDP models)
3. There are mixed messages regarding Zen 3 6nm refresh. Initially it was rumored as cancelled, then some refuted it. If it comes and is indeed on TSMC 6nm then it will not move the needle much in ST but should give ~10% of extra MT perf from less power draw ( == higher clocks for MT loads).

All in all I can't imagine AMD just sitting still for a year till late 2022 for Zen 4 when they still have some gas in the tank.

andermans · May 15, 2021

Gideon said:
1. SMT gives at least 30% sometimes up to 40% in rendering benchmarks for both Intel and AMD. Obviously less in less parallel workloads, but these usually don't scale to >16 threads anyway

I am curious what the benchmarking landscape will look like and what the Gracemont cores will be good at.

At least on mobile and server we see a significant increase in the reliance on accelerators to get the most out of the power budget, and at some point I wonder if we are really going to see the effect on desktop/workstation as well. Hardware video encode/decode has generally gotten closer to the solid software encoders, and the introduction of GPU hardware raytracing seems like it could be a trigger for a significant shift of rendering workloads to the GPU (though it isn't entirely without issues, as e.g. VRAM size is often a limitation; however, that seems like a big opportunity for beefy integrated GPUs. I don't think we can quite expect that with Alder Lake on desktop though; it needs a bit more time in that market segment).

So if we see rendering as decreasingly relevant for high core count CPUs on desktop/workstations, what is going to replace it as the dominant source of benchmarks?

Application development is mostly seeing big growth in more dynamic languages that don't need a lot of compute to compile things and developing on the cloud is getting increasingly common. So while that is the other thing besides rendering I'm familiar with that currently drives high core counts, I'm not sure this would really drive the segment.

So anyone else having good ideas on what will drive demand for strong MT performance? Am I too optimistic about this shift happening?

edit: just for illustration it seems like Cinebench R20/R23 use the Embree library for raytracing. The library is currently CPU only, but it is developed by Intel and I've already seen the first hints that Intel will release a version that supports GPUs as part of their DG2 efforts. Similarly it seems like a lot of the renderer plugins for Cinema4D (which shared the 3D engine with Cinebench) have GPU support already.

Hulk · May 15, 2021

Redfire said:
Warning, there's a lot of assumptions I'm going to be making here:

Assumptions:
Zen 2 IPC = Skylake IPC
Zen 3 IPC = 1.20 Zen 2 IPC
Sunny Cove IPC = 1.20 Skylake IPC
Willow Cove IPC = 1.05 Sunny Cove IPC
Golden Cove IPC = 1.20 Willow Cove IPC
Gracemont IPC = Skylake IPC

Golden Cove (ADL) All-core = 4.6GHz
Gracemont (ADL) All-core = 3.5 GHz
Zen 3 (5900X) All-core = 4.1 GHz
Zen 3 (5950X) All-core = 3.75 GHz

Alder Lake has a hardware scheduler/the Windows scheduler is fixed and optimised for heterogeneous cores. The scheduler does well with balancing Golden Cove and Gracemont Atom cores. (Biggest assumption right now in my opinion)

Single-threading:
ADL 12900K: 1.20 × 1.05 × 1.00 = 1.260
Zen 3 5900X: 1.20 × 0.96 = 1.152
Zen 3 5950X: 1.20 × 0.98 = 1.176

Multi-threaded:
ADL 12900K: 1.20 × 1.05 × 0.92 × 8 + 0.7 × 8 = 14.87
Zen 3 5900X: 1.20 × 0.82 × 12 = 11.81
Zen 3 5950X: 1.20 × 0.75 × 16 = 14.40

All calculated numbers are relative to a fictional Zen 2/Skylake core at 5 GHz.

By these very crude estimates, Alder Lake might be able to beat Vermeer in both single threaded and multi threaded workloads/benchmarks.
Of course, Zen 4 will be out in 2022 eventually, but that's competing with Raptor Lake, not Alder Lake.
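The multi-threaded figures in the quoted estimate can be reproduced with a few lines of Python. This is only a sketch of the arithmetic as posted: the function name `mt_score` is made up for illustration, and the Golden Cove factor is taken as 1.20 × 1.05, which is what the quoted sums use (compounding all three listed uplifts would give a larger number).

```python
# Inputs copied from the quoted post; scores are relative to one
# fictional Zen 2/Skylake core running at 5 GHz.
BASE_CLOCK_GHZ = 5.0


def mt_score(ipc, all_core_ghz, cores):
    """Relative multi-threaded score: IPC x (clock / 5 GHz) x core count."""
    return ipc * (all_core_ghz / BASE_CLOCK_GHZ) * cores


# The post uses 1.20 x 1.05 as the Golden Cove IPC factor and a flat 0.7
# per Gracemont core (Skylake IPC at 3.5 GHz out of 5 GHz).
golden_cove = mt_score(1.20 * 1.05, 4.6, 8)
gracemont = mt_score(1.00, 3.5, 8)

scores = {
    "12900K": golden_cove + gracemont,
    "5900X": mt_score(1.20, 4.1, 12),
    "5950X": mt_score(1.20, 3.75, 16),
}

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Changing any single input (for example the Gracemont all-core clock) shows how sensitive the ranking is to the assumptions.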

I like your analysis.
I'll add my own.

I'm assuming a Zen 3 core = 1.0 performance

Golden Cove will be 10% faster than Zen 3, so 1.1

Gracemont will be 70% of Zen 3, or 0.7 (clock taken into account)

So relative to 5950X scoring a "16" 12900 ADL would score 8x1.1 + 8x0.7 = 14.4

So I'm predicting faster than 5900X but slower than 5950X.
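As a rough sketch, the estimate above is just a weighted core count. The helper `hybrid_score` is a hypothetical name, and scoring the 5900X as a flat 12 is an added assumption (it ignores that chip's higher all-core clock); only the 1.1, 0.7 and "16" figures come from the post.

```python
def hybrid_score(big_cores, big_perf, small_cores=0, small_perf=0.0):
    """Sum of per-core throughput, with one Zen 3 core defined as 1.0."""
    return big_cores * big_perf + small_cores * small_perf


# Per-core factors from the post: Golden Cove = 1.1, Gracemont = 0.7.
adl = hybrid_score(8, 1.1, 8, 0.7)

# Assumption for illustration: every Zen 3 core counts as exactly 1.0.
zen3_5900x = hybrid_score(12, 1.0)
zen3_5950x = hybrid_score(16, 1.0)

print(f"8+8 ADL: {adl:.1f}")
print(f"5900X:   {zen3_5900x:.1f}")
print(f"5950X:   {zen3_5950x:.1f}")
```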

ondma · May 15, 2021

Hulk said:
I like your analysis.
I'll add my own.

I'm assuming a Zen 3 core = 1.0 performance

Golden Cove will be 10% faster than Zen 3, so 1.1

Gracemont will be 70% of Zen 3, or 0.7 (clock taken into account)

So relative to 5950X scoring a "16" 12900 ADL would score 8x1.1 + 8x0.7 = 14.4

So I'm predicting faster than 5900X but slower than 5950X.

Sounds reasonable. I would speculate GC to be a bit more than 1.1 x Zen 3, but Gracemont to be less powerful than you estimated. So I would agree, but I think it might be closer to 5900. As I said though, I still consider that a very compelling product, if the price is in the 500.00 range.

coercitiv · May 15, 2021

Hulk said:
Gracemont will be 70% of Zen 3, or 0.7 (clock taken into account)

Seems like you also forgot about the lack of SMT on Gracemont.

Hulk · May 15, 2021

coercitiv said:
Seems like you also forgot about the lack of SMT on Gracemont.

Yes you're right. Probably more like 1 Gracemont = 0.6 Zen 3 for a new total of 8+8 ADL of 13.6.
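One way to see how much the result hinges on the Gracemont guess is to solve for the per-core factor needed to reach a given total. A minimal sketch, assuming the 1.1 Golden Cove factor from earlier in the thread and a 5900X scored as 12 Zen 3 cores; `break_even_small_perf` is a hypothetical helper name.

```python
def break_even_small_perf(target, big_cores=8, big_perf=1.1, small_cores=8):
    """Per-small-core factor at which big + small cores reach `target`."""
    return (target - big_cores * big_perf) / small_cores


# Assumption: 5900X = 12.0 and 5950X = 16.0 on the "Zen 3 core = 1.0" scale.
for label, target in (("5900X", 12.0), ("13.6 total", 13.6), ("5950X", 16.0)):
    factor = break_even_small_perf(target)
    print(f"{label}: each Gracemont core must be worth {factor:.2f} Zen 3 cores")
```

Under these assumptions the 8+8 part only falls behind the 5900X if a Gracemont core is worth less than about 0.4 of a Zen 3 core.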

I think the interesting point here is that it looks like 8+8 ADL will probably perform around the 5900X, perhaps a bit faster. If power consumption is really good it could be a huge win for mobile and very good for the desktop if you don't need the 5950X, assuming pricing is right. But if Zen 4 is right around the corner the game begins anew.

I know there is a wide discrepancy of opinions here about the value of the Gracemont cores for the desktop. Many people believe they will be useless to the point of it being better to disable them and others, myself included, think that with a fine-tuned scheduler they could be very helpful in moving certain apps that have an abundance of low compute threads that could be causing parallelism bottlenecks.

For example, take a 5800X and clock it at 4GHz and then take a 5950X and lock the clock at 2.0GHz, or better yet disable half the cores on the 5950X and clock at 4.0GHz. So 16 cores at 2GHz vs 8 cores at 4GHz. Theoretically about the same compute available. I guess if possible disable SMT on half of cores on the 16 core machine.

Run a bunch of multithreaded tests and see what happens. I think that would give us a lot of additional insight.

IntelUser2000 · May 15, 2021

Hulk said:
Finally I know I am wrong on this last point but I just can't wrap my head around the fact that Intel and AMD are going to pull another 20 or 30% throughput increase out of their next gen architecture!

I know eh? We've come a long way.

Actually, after the Athlon 64 was introduced, many people thought we were close to the limits. The latest architectures are more than 2x as fast per clock, and even Gracemont will be somewhere around the 2x level.

@andermans The prediction that it'll rely more on cloud and dedicated accelerators has been made for more than a decade now. General purpose CPUs will continue to find use.

Despite 3D engines seemingly moving more to graphics, the importance of having a high end GPU matched with a high end CPU is as true as ever.

Hulk said:
I know there is a wide discrepancy of opinions here about the value of the Gracemont cores for the desktop.

Lakefield can't even combine the two different cores together, so Alderlake will be the first implementation where supposedly all can work. The value in desktops is questionable compared to laptop.

Either 10 ring hops are really the limit thus the choice of 8+8, or Alderlake desktop was a rushed port from mobile versions.

I don't believe we can add Golden Cove + Gracemont without having some sort of overhead. Maybe it's done well and it's a few percent, but it'll exist. This is assuming the application supports the configuration well.

jpiniero · May 15, 2021

IntelUser2000 said:
Either 10 ring hops are really the limit thus the choice of 8+8, or Alderlake desktop was a rushed port from mobile versions.

12 but the IGP takes one (and probably the Thunderbolt on mobile)

coercitiv · May 16, 2021

Hulk said:
I think the interesting point here is that it looks like 8+8 ADL will probably perform around the 5900X, perhaps a bit faster. If power consumption is really good it could be a huge win for mobile and very good for the desktop if you don't need the 5950X, assuming pricing is right. But if Zen 4 is right around the corner the game begins anew.

I don't think it's interesting at all, the die area allocated for 8+8 is equivalent to 10 GC cores. With a 20% IPC advantage and similar clocks the 10 GC chip would obviously match 5900X. The 8+8 not matching 5900X would be a big problem for the first gen desktop hybrid.

Maybe it's time people start comparing 8+8 versus 10+0 of the same kind of cores, and see where exactly you think there's gains to be had for hybrids, because so far all I see is power consumption, which is pure irony considering Intel's attitude in the last few years (squeeze performance no matter the power cost).

B-Riz · May 16, 2021

Hulk said:
Finally I know I am wrong on this last point but I just can't wrap my head around the fact that Intel and AMD are going to pull another 20 or 30% throughput increase out of their next gen architecture! I can see maybe another 5% out of DDR5, and maybe another 5% by going wider, smarter/larger internal structures, but they have been tweaking these parts forever. I just don't see what's left on the table to continue to extract more instruction level parallelism out of these designs. I mean how much out-of-order processing can be done before you hit the proverbial wall?

Remember back when Core 2 Duo desktop came out, and blew AMD out of the water and Intel dominated really until Zen2 dropped?

That feeling is happening every year now that AMD has an engineer as a CEO and a laser focus on x86 CPU performance.

It is kinda sad that the structure that allowed Intel to develop two internally competing CPUs and pick the best one to release (how we got Core / Core 2) just stopped being a thing.

And now, this thread is full of hope on Alder Lake being competitive.

I think we will be impressed and disappointed by it, and, it will still take a few years for Intel to get back to having a Core 2 Duo like release.

Shivansps · May 16, 2021

I don't think Alder Lake was ever designed to go against AMD's Zen CPUs; I think it is designed to put a full stop to ARM's attempts to get into notebooks. As a bonus it puts AMD in a weird position: if AMD were to do the same, where are they going to get a little core from? Intel has spent several years improving their little core for this.

SAAA · May 16, 2021

coercitiv said:
I don't think it's interesting at all, the die area allocated for 8+8 is equivalent to 10 GC cores. With a 20% IPC advantage and similar clocks the 10 GC chip would obviously match 5900X. The 8+8 not matching 5900X would be a big problem for the first gen desktop hybrid.

Maybe it's time people start comparing 8+8 versus 10+0 of the same kind of cores, and see where exactly you think there's gains to be had for hybrids, because so far all I see is power consumption, which is pure irony considering Intel's attitude in the last few years (squeeze performance no matter the power cost).

16, but even 10 big cores start to get moot: we all know Amdahl's law and diminishing returns. Unless all software can magically be made parallel having a few big cores paired with 8, 16 or even 32 small (actually medium) cores is the best plan going forward, from a performance/area point of view other than efficiency.

Why 8 big + 32 small cores? Using 4 times the area of the small ones ought to speed up single thread somewhat considerably… looking at Golden Cove vs Gracemont the amount is about 40-50% faster per thread, plus HT for another 20-30%, that's 1.45x the single and 0.45x multi thread performance total in a similar area.
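The 1.45x / 0.45x figures appear to come from comparing one big core against the four small cores that fit in the same area. A small sketch of that arithmetic, assuming the midpoints of the quoted ranges (45% per-thread advantage, 25% from HT); the constant names are made up for illustration.

```python
# Midpoints of the ranges quoted in the post (assumed for illustration).
BIG_ST_GAIN = 1.45   # big core per-thread speed vs one small core (40-50%)
HT_GAIN = 1.25       # extra throughput from Hyper-Threading (20-30%)
SMALL_PER_BIG = 4    # small cores that fit in one big core's area

# One big core with both hardware threads busy.
big_throughput = BIG_ST_GAIN * HT_GAIN

# Four small cores, each defined as 1.0, in the same area.
small_throughput = SMALL_PER_BIG * 1.0

ratio = big_throughput / small_throughput

print(f"single thread, big vs small: {BIG_ST_GAIN:.2f}x")
print(f"multi thread, same area:     {ratio:.2f}x")
```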

Maybe an even better solution would be to have unified front end between them, like 4 small and a big one merged. Something in between SMT and CMT so to speak, but with asymmetrical cores, not to save power but only to get maximum performance of course.
It would be hellish to make current software work with it, but much better going forward into 2025-2030 as I don't see 64 big cores going mainstream as a good idea... also at that point they aren't really "big" cores anymore.
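For reference, the diminishing returns mentioned above can be put into numbers with the classic form of Amdahl's law. This is a minimal sketch for identical cores only; it does not model a mix of big and small cores, and the parallel fractions below are arbitrary examples rather than measurements.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup over one core when only `parallel_fraction` of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Example parallel fractions (assumed, not taken from any benchmark).
for p in (0.50, 0.90, 0.99):
    row = ", ".join(
        f"{n} cores: {amdahl_speedup(p, n):.2f}x" for n in (8, 10, 16, 32)
    )
    print(f"{p:.0%} parallel -> {row}")
```

With 90% of the work parallel, going from 10 to 16 cores moves the speedup from roughly 5.3x to 6.4x, which is the kind of flattening the argument relies on.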

eek2121 · May 16, 2021

SAAA said:
16, but even 10 big cores start to get moot: we all know Amdahl's law and diminishing returns. Unless all software can magically be made parallel having a few big cores paired with 8, 16 or even 32 small (actually medium) cores is the best plan going forward, from a performance/area point of view other than efficiency.

Why 8 big + 32 small cores? Using 4 times the area of the small ones ought to speed up single thread somewhat considerably… looking at Golden Cove vs Gracemont the amount is about 40-50% faster per thread, plus HT for another 20-30%, that's 1.45x the single and 0.45x multi thread performance total in a similar area.

Maybe an even better solution would be to have unified front end between them, like 4 small and a big one merged. Something in between SMT and CMT so to speak, but with asymmetrical cores, not to save power but only to get maximum performance of course.
It would be hellish to make current software work with it, but much better going forward into 2025-2030 as I don't see 64 big cores going mainstream as a good idea... also at that point they aren't really "big" cores anymore.

My desktop has over 2000 active threads and hundreds of active processes at any given point. Sure, many of them run in the background, but for many workloads, we will always benefit from more cores.

Alder Lake has quite a few unknowns that go beyond the 20% “big core” improvement. Those tiny little gracemont cores with no hyperthreading are going to sip power, so ask yourself why ADL-S currently has a PL1 of 125W and a PL2 of 228W (without AVX-512 no less). I am willing to bet it isn’t because Golden Cove is power hungry (well, it probably will be, but not like rocket lake). We still don’t know how high the final silicon will even boost. We know 10SF can hit higher clocks at a given voltage, and can also hit 5ghz, and 10ESF will offer further improvements beyond that. PCIE5 uncore was mentioned, but I believe DDR5 savings will offset that. Could we see Intel hit multicore 5.3 and a single core 5.4 or 5.5? I don’t know, but anything is possible.

DrMrLordX · May 16, 2021

coercitiv said:
The 8+8 not matching 5900X would be a big problem for the first gen desktop hybrid.

Putting all that effort into matching a CPU released a year prior would be the big problem for Intel. That aside, what exactly are they going to do after Raptor Lake if their projected 7nm wafer output will be 20 kwpm by 2023?

jpiniero · May 16, 2021

DrMrLordX said:
Putting all that effort into matching a CPU released a year prior would be the big problem for Intel. That aside, what exactly are they going to do after Raptor Lake if their projected 7nm wafer output will be 20 kwpm by 2023?

Tiny chiplets and dual sourcing those chiplets at TSMC.

coercitiv · May 16, 2021

SAAA said:
16, but even 10 big cores start to get moot: we all know Amdahl's law and diminishing returns. Unless all software can magically be made parallel having a few big cores paired with 8, 16 or even 32 small (actually medium) cores is the best plan going forward, from a performance/area point of view other than efficiency.

The solution to Amdahl's law diminishing returns on 10-16 big cores is going for even more small cores instead?!

eek2121 said:
My desktop has over 2000 active threads and hundreds of active processes at any given point. Sure, many of them run in the background, but for many workloads, we will always benefit from more cores.

Those 2000+ "active" threads use less than 2% of a modern CPU, in fact over 95% of the time the cores are at sleep.

DrMrLordX · May 16, 2021

jpiniero said:
Tiny chiplets and dual sourcing those chiplets at TSMC.

I don't imagine that TSMC will ever allow Intel enough volume on sufficiently-advanced nodes to threaten any of TSMC's most loyal customers.

SAAA · May 16, 2021

coercitiv said:
The solution to Amdahl's law diminishing returns on 10-16 big cores is going for even more small cores instead?!

Yes, but more of the smaller cores so the parallel workloads get all the speedup while big cores can keep getting larger and faster at single-threaded loads.

coercitiv said:
Those 2000+ "active" threads use less than 2% of a modern CPU, in fact over 95% of the time the cores are at sleep.

Indeed, the point is not only to make them do minimal jobs but keep up with parallel loads that still aren't fit for a GPU.

eek2121 · May 16, 2021

coercitiv said:
The solution to Amdahl's law diminishing returns on 10-16 big cores is going for even more small cores instead?!

Those 2000+ "active" threads use less than 2% of a modern CPU, in fact over 95% of the time the cores are at sleep.

Indeed I mentioned as much. However, 3D rendering, video editing/encoding, compiling large software projects, etc. all benefit from more cores. There are other workloads that also come into play with high core counts.

RTX · May 16, 2021

eek2121 said:
My desktop has over 2000 active threads and hundreds of active processes at any given point. Sure, many of them run in the background, but for many workloads, we will always benefit from more cores.

Alder Lake has quite a few unknowns that go beyond the 20% “big core” improvement. Those tiny little gracemont cores with no hyperthreading are going to sip power, so ask yourself why ADL-S currently has a PL1 of 125W and a PL2 of 228W (without AVX-512 no less). I am willing to bet it isn’t because Golden Cove is power hungry (well, it probably will be, but not like rocket lake). We still don’t know how high the final silicon will even boost. We know 10SF can hit higher clocks at a given voltage, and can also hit 5ghz, and 10ESF will offer further improvements beyond that. PCIE5 uncore was mentioned, but I believe DDR5 savings will offset that. Could we see Intel hit multicore 5.3 and a single core 5.4 or 5.5? I don’t know, but anything is possible.

11900K's TBM 3.0 is only 5.2 ( 5.3 only if under 70C ). It's in a desktop platform with significantly higher thermal headroom. No reason Tigerlake could not hit 5.1 if given the same thermal headroom. The 11900K in a X170KM-G laptop won't boost to 5.2. 10900K maxes out around 4.9-5.0 in a laptop.

naukkis · May 16, 2021

DrMrLordX said:
I don't imagine that TSMC will ever allow Intel enough volume on sufficiently-advanced nodes to threaten any of TSMC's most loyal customers.

Intel is already one of the biggest TSMC customers. And didn't the new CEO describe their future earlier: there will be leadership products for both client and server from TSMC in 2023? Intel doesn't have a choice, as its own production capacity is somewhat limited.

moinmoin · May 16, 2021

naukkis said:
Intel is already one of the biggest TSMC customers.

Not on TSMC's leading edge nodes though which is what Intel will need.

naukkis · May 16, 2021

moinmoin said:
Not on TSMC's leading edge nodes though which is what Intel will need.

Not yet

Highlights of the day: Intel to outsource 3nm chip production to TSMC

Just days after Intel seemingly dismissed speculation about expanding outsourcing to TSMC, sources from Taiwan's supply chain have disclosed that the US chip vendor has plans to have the pure-play foundry fabricate its core CPUs at node in 2022. In the display sector, are set to rise again in...

www.digitimes.com

Digitimes is usually spot on about what happens in Taiwan manufacturing.

moinmoin · May 16, 2021

naukkis said:
Not yet

Highlights of the day: Intel to outsource 3nm chip production to TSMC

Just days after Intel seemingly dismissed speculation about expanding outsourcing to TSMC, sources from Taiwan's supply chain have disclosed that the US chip vendor has plans to have the pure-play foundry fabricate its core CPUs at node in 2022. In the display sector, are set to rise again in...

www.digitimes.com

Digitimes is usually spot on what happens in Taiwan manufacturing.

Will be interesting to see whether Intel will be a first mover on N3 alongside Apple or join somewhat later.

uzzi38 · May 16, 2021

naukkis said:
Not yet

Highlights of the day: Intel to outsource 3nm chip production to TSMC

Just days after Intel seemingly dismissed speculation about expanding outsourcing to TSMC, sources from Taiwan's supply chain have disclosed that the US chip vendor has plans to have the pure-play foundry fabricate its core CPUs at node in 2022. In the display sector, are set to rise again in...

www.digitimes.com

Digitimes is usually spot on what happens in Taiwan manufacturing.

Yes, I'm sure they are. Man, gotta love those 5nm Zen 3s!
