Discussion Intel current and future Lakes & Rapids thread

eek2121 · Sep 5, 2021

JasonLD said:
Then they would be able to do 8+8 in a 90W package. I still think focusing ST performance on 8 big cores (possibly increasing to 10 if needed) and extending MT performance with small cores is the better long-term solution for non-HEDT desktops.

JasonLD said:
If the OS scheduler works properly, I don't see the downside of using many small cores for MT and background tasks while keeping ST performance on the big cores for most tasks. Even for desktop/workstation, I would think 8+32 will perform better than something like 24 big cores, and the chip is going to be smaller and use less power.

Don’t forget Apple does this and it works well.

I am not opposed to hybrid designs as long as the performance SKUs aren’t artificially segmented behind HEDT or workstation parts.

I can’t wait to see what the future brings from both Intel and AMD, especially since AMD figured out how to solve the issue of “small” cores not supporting the same instructions as “big” cores.

Mopetar · Sep 5, 2021

DrMrLordX said:
The 5950X isn't an HEDT CPU.

That's splitting hairs. It's not part of the HEDT platform, but AMD sells Threadripper CPUs that have both 12 and 16 cores. Are those not HEDT CPUs either?

It's an option for people who might need additional lifting power from more cores but don't want to invest in an HEDT system. Most desktop users won't get much added benefit of using a 5950X over a 5800X.

RTX2080 · Sep 5, 2021

JoeRambo said:
"More R's to choose from" ------------- could reference the memory dividers 100 / 133; there might be more, and they might actually play a role in stability and performance?

It's a riddle and I don't know the details of RKL's memory options, so this is all guesstimation..... some guys guess 'more R's' means more CPU multiplier (ratio) options can be chosen on K models (0.25x or 0.5x ratio steps?)

BTW I mistyped BCLK as FCLK 😛 But I think BCLK could already be OCed on some older Intel platforms, IIRC?

Ajay · Sep 5, 2021

Mopetar said:
That's splitting hairs. It's not part of the HEDT platform, but AMD sells Threadripper CPUs that have both 12 and 16 cores. Are those not HEDT CPUs either?

It's an option for people who might need additional lifting power from more cores but don't want to invest in an HEDT system. Most desktop users won't get much added benefit of using a 5950X over a 5800X.

I think we've covered this before, but HEDT has pretty much evolved into having more memory channels and I/O than desktop, not necessarily more cores (which was the 'old' standard).

JasonLD · Sep 5, 2021

DrMrLordX said:
Highly doubtful. Intel's competition can sell 16c with an effective cap on power of 142W without making any compromises on SIMD extensions. Their server CPUs can scale up to 64c and stay under 300W. The only reason why Intel is even thinking about heterogeneous core situations is that they can't fit 16c Golden Cove in one consumer package without making huge clockspeed sacrifices in the process. We're seeing an inversion where Intel is throwing in the towel on Core and embracing Atom out of necessity. The transition will be awkward, but eventually, you may just see Atom.

AMD could do better if they had a heterogeneous core setup right now. The current situation, where Intel is late on its node transition and has huge big cores, might have forced it to jump on heterogeneous-core CPUs early, but I believe this is where all future CPUs are heading. AMD will probably follow suit in maybe 3-4 years' time.

Zucker2k · Sep 5, 2021

Mopetar said:
That's splitting hairs. It's not part of the HEDT platform, but AMD sells Threadripper CPUs that have both 12 and 16 cores. Are those not HEDT CPUs either?

It's an option for people who might need additional lifting power from more cores but don't want to invest in an HEDT system. Most desktop users won't get much added benefit of using a 5950X over a 5800X.

Don't forget the 5950x's immediate predecessors, the 1950x and the 2950x. Yes, the 5950x is limited to dual-channel memory support and gimped (desktop-level I/O), but its release was effectively a doubling of cores twice in three years on the desktop. Not a bad thing, mind you.

moinmoin · Sep 5, 2021

JasonLD said:
AMD could do better if they had a heterogeneous core setup right now. The current situation, where Intel is late on its node transition and has huge big cores, might have forced it to jump on heterogeneous-core CPUs early, but I believe this is where all future CPUs are heading. AMD will probably follow suit in maybe 3-4 years' time.

Going by rumors and patents, it seems AMD will approach this by mixing Zen generations as well as introducing some form of minimal core that can handle the most basic code and redirects everything else to the normal cores.

DrMrLordX · Sep 5, 2021

Mopetar said:
That's splitting hairs. It's not part of the HEDT platform, but AMD sells Threadripper CPUs that have both 12 and 16 cores. Are those not HEDT CPUs either?

@Ajay has it right. HEDT is more about the platform than the # of cores.

JasonLD said:
AMD could do better if they had a heterogeneous core setup right now. The current situation, where Intel is late on its node transition and has huge big cores, might have forced it to jump on heterogeneous-core CPUs early, but I believe this is where all future CPUs are heading. AMD will probably follow suit in maybe 3-4 years' time.

. . . maybe? Right now I own a 12c CPU which is more than "what I need". What it really means is that I can run two fairly-intensive applications at the same time and not need to worry about running out of CPU resources (my 16GB of RAM would be more of a bottleneck). It also makes streaming a lot easier. I'm probably going to stick to 12c or 16c CPUs from here on out since I've gotten used to the extra computational resources. If anyone offered me 8c + 32c in the future, I would turn my nose up at this option. Why do I need 32 "little" cores? I can not with confidence move a latency/performance-sensitive task to those cores, nor do I want my CPU to have to constantly bounce threads between the "big" and "small" core complexes as I switch focus on applications. It's so much easier for me to rely on my CPU to handle every workload I throw at it with equal performance, even when I load it up with applications that I would only have run individually in the past on my older systems.

Adding 32 "little" cores might help me win some MT benchmarks over a future 12c system, but in terms of real-world use, those underpowered things would really do me no good. I don't have enough background OS tasks available to justify the use of silicon.

Hulk · Sep 5, 2021

When a new CPU architecture release is on the horizon, it generates a lot of discussion on these boards. But the added unknowns of the Alder Lake heterogeneous design are taking the normal discussion up a few notches.

Some of the opinions I've noted here, from most pessimistic to most optimistic:

1. The little cores are a complete waste of resources and the context switching is likely going to decrease performance in many situations, so I'm going to disable them if possible.

2. The little cores might be kind of helpful now and then, but Intel just put them in there to compete on a per-core basis with AMD. I could do without them and would rather just have more big cores.

3. The little cores will offload many background tasks from the big cores, allowing them to work at their full potential on latency-critical threads. They will greatly enhance performance and efficiency.

4. The little cores aren't actually so little if they are comparable to Skylake, and Alder Lake is going to be a great performer. In fact I'm looking forward to 8+16 or 8+24 versions.

I can't wait for testing of the actual parts to put some of this debate to rest. Of course we'll continue the debate, but some questions will be answered at least.

LightningZ71 · Sep 5, 2021

While 8+8 will certainly be interesting, I think that 8+16 with HT turned off and the Cove cores given every drop of power they can take will be something interesting. It's already the case that HT second threads get only a fraction of the single-thread performance of each core. Why bother with it at all if you have 16 Mont cores to assign threads to?

DrMrLordX · Sep 6, 2021

LightningZ71 said:
Why bother with it at all if you have 16 Mont cores to assign threads to?

Intercore latency will potentially be much higher if you have to bounce threads between p and e cores. If you can keep everything on an HT-generated logical core or physical p core then you don't eat that penalty. You're also losing maybe 20% of the p cores' execution resources by turning off HT unless it's a well-crafted AVX2 workload.

Zucker2k · Sep 6, 2021

DrMrLordX said:
Intercore latency will potentially be much higher if you have to bounce threads between p and e cores. If you can keep everything on an HT-generated logical core or physical p core then you don't eat that penalty. You're also losing maybe 20% of the p cores' execution resources by turning off HT unless it's a well-crafted AVX2 workload.

What 20% resources? It's about efficient use of power. With that many "little" cores, you're better off spending that power on real cores than virtual cores. "Real men go for real cores." Hehe.

DrMrLordX · Sep 6, 2021

Zucker2k said:
What 20% resources?

Intel's implementation of HT is usually good for a 20% increase in MT performance in "real world" applications; e.g. not Linpack or similar.

It's about efficient use of power. With that many "little" cores, you're better off spending that power on real cores than virtual cores. "Real men go for real cores." Hehe.

That remains to be seen. If you're going to engage a Golden Cove core at all, it's probably best to exhaust its pipeline and maybe lower its clockspeed to an optimal point in the v/f curve rather than leave some pipeline stages idle and move a thread to a different core cluster with Gracemont cores. If not, then Golden Cove must be a true albatross of a core.

Zucker2k · Sep 6, 2021

DrMrLordX said:
Intel's implementation of HT is usually good for a 20% increase in MT performance in "real world" applications; e.g. not Linpack or similar.

That remains to be seen. If you're going to engage a Golden Cove core at all, it's probably best to exhaust its pipeline and maybe lower its clockspeed to an optimal point in the v/f curve rather than leave some pipeline stages idle and move a thread to a different core cluster with Gracemont cores. If not, then Golden Cove must be a true albatross of a core.

Golden Cove has HT. Golden Cove also operates in the 5GHz range, along with its synced virtual cores. So I don't know why Intel should be sacrificing speed in search of a more palatable v/f curve because of 20%. It'll be better to save that energy and put it on a real core for more than double the performance.

DrMrLordX · Sep 6, 2021

Zucker2k said:
Golden Cove has HT. Golden Cove also operates in the 5GHz range, along with its synced virtual cores. So I don't know why Intel should be sacrificing speed in search of a more palatable v/f curve because of 20%.

It is not going to be "sacrificing speed" if all 8 Golden Cove cores are fully loaded. They will not run 5 GHz simultaneously @ 125W. Not even at 225W or 241W. The boost algo will already lower clockspeed when that many cores are engaged.

Zucker2k said:
It'll be better to save that energy and put it on a real core for more than double the performance.

How will engaging an "e" core more than double the performance? Is Golden Cove really that bad?

Zucker2k · Sep 6, 2021

DrMrLordX said:
How will engaging an "e" core more than double the performance? Is Golden Cove really that bad?

I'm talking about the energy saved from sacrificing the HT-20% and investing it in the e cores.

NTMBK · Sep 6, 2021

DrMrLordX said:
You would think so, but that's not always the case. There are, for example, some software encoders that won't scale up to 16 cores. They usually crap out at around 8-10 cores.

And then there are games, and those often don't even scale past 8 cores. Unless you're streaming . . . no idea how well adding Gracemont cores will fit that workload.

It's really down to which applications you intend to run, and how many applications you intend to run simultaneously.

I mean it's not even a comparison between 10 and 16 cores, it's between 20 threads and 24 threads. That's a tiny difference in thread count.

DrMrLordX · Sep 6, 2021

Zucker2k said:
I'm talking about the energy saved from sacrificing the HT-20% and investing it in the e cores.

I was aware of that. The question still stands. Committing a thread to a Gracemont core as opposed to the logical core of Golden Cove might result in better performance if there is no inter-core traffic between core complexes, but I do not think that there is much power saving to be had there (in terms of perf/watt).

NTMBK said:
I mean it's not even a comparison between 10 and 16 cores, it's between 20 threads and 24 threads. That's a tiny difference in thread count.

That's a bit misleading. Hyperthreading only adds a little performance for a little power draw. When dealing with a workload that will only scale to some indeterminate thread count between 10-20, overall performance gains moving from 8c to 10c Golden Cove should be better than adding 16c Gracemont. It is 100% guaranteed that you will get significant performance scaling from adding two large Golden Cove cores at high clockspeeds in that scenario, while adding 16c Gracemont presents the very real possibility that many of those cores will be underutilized.

We won't really know how "real world" workloads will react to 8+16c Alder Lake until it arrives, but based on the way we've seen software behave in benchmarks in the past, there may be a significant number of applications that simply do not respond well to the addition of Gracemont (scheduler issues aside).
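To make the thread-count argument above concrete, here is a rough Python sketch. It is a toy model, not a prediction: the 20% HT uplift is the figure quoted in this thread, while the relative Gracemont throughput (`e_rel=0.5`), the ideal scheduler, and the absence of power limits or inter-cluster penalties are all assumptions made purely for illustration. The function name `throughput` is hypothetical.

```python
def throughput(n_threads, p_cores, e_cores, smt_uplift=0.20, e_rel=0.50):
    """Toy estimate of total throughput, in units of one P-core thread.

    Assumptions (not measured data):
      - the first thread on a P core contributes 1.0
      - an HT sibling on a busy P core adds `smt_uplift`
      - a thread on an E core contributes `e_rel`
      - an ideal scheduler always fills the most productive slot first
    """
    slots = [1.0] * p_cores + [e_rel] * e_cores + [smt_uplift] * p_cores
    slots.sort(reverse=True)          # best slots get used first
    return sum(slots[:n_threads])     # workload only spawns n_threads threads


if __name__ == "__main__":
    # Compare a hypothetical 10P+0E part against a hypothetical 8P+16E part
    # for workloads that stop scaling at different thread counts.
    for n in (10, 12, 16, 20):
        ten_big = throughput(n, p_cores=10, e_cores=0)
        hybrid = throughput(n, p_cores=8, e_cores=16)
        print(f"{n:2d} threads: 10P = {ten_big:.1f}, 8P+16E = {hybrid:.1f}")
```

Under these made-up numbers the 10P part wins when the workload stops scaling near 10-12 threads and the hybrid part wins beyond that, so the outcome depends almost entirely on the assumed `e_rel` and on where the workload stops scaling.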

dacostafilipe · Sep 6, 2021

DrMrLordX said:
We won't really know how "real world" workloads will react to 8+16c Alder Lake until it arrives, but based on the way we've seen software behave in benchmarks in the past, there may be a significant number of applications that simply do not respond well to the addition of Gracemont (scheduler issues aside).

But what about games? Has anyone seen a review of gaming on "non-uniform-performance" cores? Maybe by down-clocking some of the cores?

Games today seem to spread the load evenly across all the cores, and this could lead to some issues. Even when prioritising the p-cores I worry about frame pacing :/

DrMrLordX · Sep 6, 2021

NeoLuxembourg said:
But what about games? Has anyone seen a review of gaming on "non-uniform-performance" cores? Maybe by down-clocking some of the cores?

Games today seem to spread the load evenly across all the cores, and this could lead to some issues. Even when prioritising the p-cores I worry about frame pacing :/

Hopefully the scheduler will prevent that from becoming a major problem. Expect Intel to send liaisons to major engine developers (Unreal Engine, idTech, Frostbite, etc.) to coordinate with them.

LightningZ71 · Sep 6, 2021

My point is fairly simple: a Mont core should always be faster than 20% of a Cove core. If we have a bunch of light-duty tasks, they can always be better served on the E cores. If we have a perfectly scaling high-thread-count task, having 100% of an E core is still faster than 20% of a P core. If the task is that hamstrung by inter-core communication, then the ring bus is likely your biggest headache there anyway. About the only case where any number of threads is better on a P core is when you have precisely 2 (for SMT2) threads that share resources that can fit in the L2 AND have instructions that can be issued simultaneously without constantly blocking for back-end resources.

I propose that Intel would be better served, on a FUTURE node, to develop a Cove core that sacrifices no resources for HT and is instead optimized for single-thread performance, and then just offer a few more Mont clusters. The number of cases where this hurts would be much smaller than the number of cases where it helps.
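The per-thread comparison in the paragraph above can be written down as a tiny Python sketch. The 20% HT figure comes from this thread; the relative E-core throughput (`e_core_rel_perf`) is an assumed placeholder, and the model deliberately ignores thread-migration cost, cache sharing, and power, which are exactly the objections raised earlier. The function names are hypothetical.

```python
def marginal_gains(smt_uplift=0.20, e_core_rel_perf=0.50):
    """Throughput added by one extra thread, in units of one P-core thread.

    smt_uplift:       gain from running the thread on an HT sibling of a
                      busy P core (20% is the figure quoted in the thread)
    e_core_rel_perf:  assumed throughput of a whole E core relative to a
                      P core (placeholder value, not a measurement)
    """
    return {"smt_sibling": smt_uplift, "e_core": e_core_rel_perf}


def better_slot(smt_uplift=0.20, e_core_rel_perf=0.50):
    """Return which placement adds more throughput under this toy model."""
    gains = marginal_gains(smt_uplift, e_core_rel_perf)
    return max(gains, key=gains.get)


if __name__ == "__main__":
    print(better_slot())                      # with the placeholder defaults
    print(better_slot(e_core_rel_perf=0.10))  # a deliberately weak E core
```

In this model the E core wins whenever `e_core_rel_perf` exceeds `smt_uplift`, which is the claim being made; whether that survives real inter-cluster latency is the open question.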

IntelUser2000 · Sep 6, 2021

Gentlemen,

You know it's not as simple as that. There are many parameters involved.

When the first Hyperthreading CPUs came out in the form of Xeon, the performance was uneven and there was a loss. It was pretty bad. Then when they introduced it on the consumer parts, it became much better.

But it was with Nehalem that they fixed nearly all the regressions and it became a "free" performance feature.

Details, details, details. We don't know them. It's not just single thread vs. embarrassingly parallel. There are hundreds of in-betweens. It's not Hyperthreading benefitting MT versus Gracemont. Hyperthreading is potentially a zero gain in some MT cases because the architecture itself can already be fully utilized and HT just causes contention. In those cases, Gracemont will be an adder. Of course, adding Gracemont means there will be overhead due to asymmetrical cores. Potentially they are saying a cluster of 4 Gracemonts is better than two Golden Cove cores, so the maximum performance gain is far above what Hyperthreading can offer, since it rarely exceeds 30%.

By the way, overhead exists for adding extra cores. That's why scaling is not ideal. The Xeons, the EPYCs, and the POWERs add a whole bunch of extra circuitry just to transfer data between the cores, basically to minimize the overhead.

We're living in an increasingly complicated world. Look at security. There's a never-ending war between people who hack systems and those who secure them. That's why we went from cracking the simplest passwords to basically hacking a CPU.

Single cores -- Multiple cores -- CPU + Accelerators -- Asymmetrical cores + Accelerators

Sometimes an idea dies not because it's a wrong one but because it arrived before its time. We'll see where the hybrid approach falls.

This is what the future may hold:

Presentation by then-Intel CTO Justin Rattner at Intel IDF in 2003.

The ideal for a hybrid CPU is to use few ridiculously large cores for maximum ST and low thread count performance and many efficient and tiny cores for maximum multi-threaded performance.
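One common way to reason about that trade-off is an Amdahl's-law-style estimate in which the serial part of a program runs on one big core and the parallel part is spread over every core. The Python sketch below is only such a toy model: the function name `hybrid_speedup`, the relative small-core throughput (`small_perf=0.5`), and the perfect scaling of the parallel part are all assumptions, not anything Intel has published.

```python
def hybrid_speedup(parallel_fraction, n_big, n_small,
                   big_perf=1.0, small_perf=0.5):
    """Amdahl-style speedup relative to one core of performance 1.0.

    Assumptions (illustrative only):
      - the serial part runs on a single big core
      - the parallel part scales perfectly across all big and small cores
      - small_perf is a small core's throughput relative to big_perf
    """
    serial_time = (1.0 - parallel_fraction) / big_perf
    total_capacity = n_big * big_perf + n_small * small_perf
    parallel_time = parallel_fraction / total_capacity
    return 1.0 / (serial_time + parallel_time)


if __name__ == "__main__":
    # Sweep the parallel fraction for two hypothetical configurations.
    for f in (0.50, 0.90, 0.99):
        few_big = hybrid_speedup(f, n_big=8, n_small=0)
        hybrid = hybrid_speedup(f, n_big=8, n_small=32)
        print(f"parallel fraction {f:.2f}: 8+0 = {few_big:.2f}x, "
              f"8+32 = {hybrid:.2f}x")
```

The model shows why the small cores only pay off when most of the work is parallel: with a low parallel fraction the serial term dominates and both configurations land close together.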

We might see the CPUs split like in server where some "High speed" SKUs exist to maximize low-thread count performance, but others exist with many -mont cores alongside them.

RTX2080 · Sep 6, 2021

This benchmark suite, Ludashi, is infamous, but it still showed a linear uplift on both sides (ST: 5950X > 5900X > 5800X > 5600X, Intel i9 > i7 > i5 > i3) and relies heavily on the TDP limit. Maybe this bench makes some sense, but it all depends on how you look at it. 12900K: ST (88535), MT (891683)

I still suspect those bench suites lack big+little design optimization, though.....

EDIT: the 12900K is wrongly recognized as a 12C/24T CPU, so in my view the MT score doesn't make sense.

EDIT2: the odd 'overall' score, like 859709 for the 12900K, is not an MT score. I guess it is just a combined sum of ST, MT, gaming, memory, etc., so just ignore this odd score ranking.

https://twitter.com/x/status/1434491440390836231

Abwx · Sep 6, 2021

cortexa99 said:
This benchmark suite, Ludashi, is infamous, but it still showed a linear uplift on both sides (ST: 5950X > 5900X > 5800X > 5600X, Intel i9 > i7 > i5 > i3) and relies heavily on the TDP limit. Maybe this bench makes some sense, but it all depends on how you look at it. 12900K: ST (88535), MT (891683)

I still suspect those bench suites lack big+little design optimization, though.....

https://twitter.com/x/status/1434491440390836231

It doesn't need optimisation since it seems to load at least 24 threads, and possibly makes use of AVX512.

You can compare with the computerbase scores; there's the 10980XE as well as the 9980/7980XE and 7960X.

Intel Core i9-11900K & i5-11600K "Rocket Lake-S" review: application benchmarks (www.computerbase.de)

So half relevant on the whole, and since RKL, among others, is ahead of the 5800X, it's quite favourable to Intel.

Cardyak · Sep 6, 2021

LightningZ71 said:
I propose that Intel would be better served, on a FUTURE node, to develop a Cove core that sacrifices no resources for HT and is instead optimized for single-thread performance, and then just offer a few more Mont clusters. The number of cases where this hurts would be much smaller than the number of cases where it helps.

I said as much earlier on in this thread: I fully expect a complete reset of the Cove designs, starting from scratch with a clean slate and a new design largely inherited from the Atom team.

I'd estimate that not only will HT eventually be dropped, but other ideas from Tremont/Gracemont will also be brought over, such as:

- Fewer execution units per port, with more ports added instead to reduce backend contention
- Decode units organised into multiple clusters
- Pre-decode length caching
