
Discussion Intel current and future Lakes & Rapids thread


Timmah!

Senior member
Jul 24, 2010
761
65
91
I have a question about these big cores: how exactly are they better than the smaller ones (besides the obvious thing of having HT)? Are they better because they have more logic, making them more universal for various compute situations, or are they generally faster at pretty much everything?
Say a highly parallel task like Vray rendering or some video encoding - could the smaller core be as fast as the big core (without HT) in such a task, or no chance?
 

dullard

Elite Member
May 21, 2001
22,972
1,228
126
I have a question about these big cores: how exactly are they better than the smaller ones (besides the obvious thing of having HT)? Are they better because they have more logic, making them more universal for various compute situations, or are they generally faster at pretty much everything?
Say a highly parallel task like Vray rendering or some video encoding - could the smaller core be as fast as the big core (without HT) in such a task, or no chance?
Two main differences:

1) They are optimized for different performance/power levels. Take a look at this image. If you tried to operate at a very low power (10%) then the bigger cores just can't operate. If you go up a bit to say 15% power, then the bigger cores give ~33% performance and the smaller cores give ~38% performance. That is a fairly substantial performance gain to use the smaller cores instead of the bigger cores at low power levels. Continue bumping up the power to ~27% and both cores give you the exact same performance. Keep cranking up the power to 50% and the small cores lag way behind in performance (~64% compared to ~77%). Even more power and the smaller cores just can't handle it, you need the bigger cores.

As we shift to more and more cores onto the same chip, then each core will have less and less power available (assuming that you keep the total power levels similar). So, as core counts go up, we will move further and further to the bottom left on that graph. Meaning at high core counts, there isn't as much need for the bigger cores.

2) They have different instruction sets. In non-hybrid chips the bigger cores can perform certain niche tasks with far, far better performance levels than the small cores due to specialized instructions. This is the debatable part about Alder Lake though. Can Intel and Microsoft figure out how to actually use the instructions that are only in the bigger cores in a hybrid situation?
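For anyone who wants to play with the trade-off in point 1, here is a rough Python sketch. The curve shapes and constants are invented purely to mimic the anchor points quoted from the Lakefield-style graph (small core ahead at ~15% power, near-tie around ~27%, big core ahead at 50%+) - they are not measured data for any real CPU.

```python
# Toy performance/power curves for a hypothetical big and small core.
# Constants are fitted only to the rough anchor points quoted above;
# they are illustrative, not measurements.

def big_core_perf(power_pct):
    """Hypothetical big-core performance (%) at a given power (%)."""
    if power_pct < 12:           # big core can't operate below a minimum power
        return 0.0
    return min(100.0, 33.0 * (power_pct / 15.0) ** 0.70)

def small_core_perf(power_pct):
    """Hypothetical small-core performance (%): stronger at low power, tops out early."""
    return min(64.0, 38.0 * (power_pct / 15.0) ** 0.46)

for p in (10, 15, 27, 50):
    print(f"power {p:3d}%: big {big_core_perf(p):5.1f}%  small {small_core_perf(p):5.1f}%")
```

Running it shows the small core ahead at 15%, a near-tie around 27%, and the big core pulling away at 50%.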
 
Last edited:

Timmah!

Senior member
Jul 24, 2010
761
65
91
Two main differences:

1) They are optimized for different performance/power levels. Take a look at this image. If you tried to operate at a very low power (10%) then the bigger cores just can't operate. If you go up a bit to say 15% power, then the bigger cores give ~33% performance and the smaller cores give ~38% performance. That is a fairly substantial performance gain to use the smaller cores instead of the bigger cores at low power levels. Continue bumping up the power to ~27% and both cores give you the exact same performance. Keep cranking up the power to 50% and the small cores lag way behind in performance (~64% compared to ~77%). Even more power and the smaller cores just can't handle it, you need the bigger cores.

2) They have different instruction sets. In non-hybrid chips the bigger cores can perform certain niche tasks with far, far better performance levels than the small cores due to specialized instructions. This is the debatable part about Alder Lake though. Can Intel and Microsoft figure out how to actually use the instructions that are only in the bigger cores in a hybrid situation?
Thanks!

Does that different performance/power level mean that Vray rendering might in fact use only the big cores, but not the small ones?
And regarding instruction sets, if GRT is pretty much Skylake, shouldn't it already have every instruction needed for something like Vray? I don't think Vray uses AVX512, nor does it need any of those specialized new AI-related instructions... just the regular stuff old Skylake cores had.
 

Bouowmx

Golden Member
Nov 13, 2016
1,115
510
146
I have a question about these big cores: how exactly are they better than the smaller ones (besides the obvious thing of having HT)? Are they better because they have more logic, making them more universal for various compute situations, or are they generally faster at pretty much everything?
Say a highly parallel task like Vray rendering or some video encoding - could the smaller core be as fast as the big core (without HT) in such a task, or no chance?
A big core is big because it has more area dedicated to finding instruction-level parallelism, i.e. to being faster: wider superscalar, deeper out-of-order execution.
 
  • Like
Reactions: Tlh97 and 2blzd

Timmah!

Senior member
Jul 24, 2010
761
65
91
A big core is big because it has more area dedicated to finding instruction-level parallelism, i.e. to being faster: wider superscalar, deeper out-of-order execution.
Is this additional area what makes these Cove architectures advanced compared to the Skylake arch? Does AMD improve their Zen cores the same way between generations? Is there any other way to make computing faster that's not about making things wider and bigger and whatnot? I guess something like quantum computing, except not as outlandish?
 

SAAA

Senior member
May 14, 2014
517
113
116
We discussed this earlier... Marketing likes the idea of having more cores, and it would do better in fully loaded MT benchmarks. Adding another 2 slots is going to make the ring pretty long.
Maybe they'll double the amount of small cores per cluster instead. Or have two rings, one for the small cores and another for the big cores, connected to each other… that would be better for anything running only on the big cores, reducing latency compared to Alder Lake; once more MT grunt is needed, the 16 extra little cores easily compensate for the slightly higher latency.
 

Bouowmx

Golden Member
Nov 13, 2016
1,115
510
146
Is this additional area what makes these Cove architectures advanced compared to the Skylake arch? Does AMD improve their Zen cores the same way between generations? Is there any other way to make computing faster that's not about making things wider and bigger and whatnot? I guess something like quantum computing, except not as outlandish?
Yes and yes. Better ILP is how any manufacturer makes a CPU core faster, combined with clock frequency.
Another way is parallel (multi-core) processing. Taken to the extreme, instead of a few complex cores, have many simpler cores, sacrificing single-thread performance (like a GPU). Finally, meet in the middle with a heterogeneous architecture: combine the single-thread perf of a complex core with the multi-thread throughput of many simple cores.
 
  • Like
Reactions: Tlh97

coercitiv

Diamond Member
Jan 24, 2014
4,518
6,141
136
Or have two rings, one for the small cores and another for the big cores, connected to each other… that would be better for anything running only on the big cores, reducing latency compared to Alder Lake; once more MT grunt is needed, the 16 extra little cores easily compensate for the slightly higher latency.
That sounds good for a professional machine, yet this is consumer desktop & mobile we're talking about. Both the double ring and the 24c/32t ideas make little sense for consumers, unless HEDT is going the way of the dodo.

Food for thought: how does one reconcile the rumored "improved cache for gaming" (as per the leaked Intel slide) with the increased reliance on small cores (as per the MLID leak claim)? To me these seem to require diametrically opposed design decisions; your idea of a double ring bus shows just how much a design can change depending on workload priorities.
 

DrMrLordX

Lifer
Apr 27, 2000
17,664
6,657
136
Intel has publicly stated that they are going towards a mix/match to meet your needs long-term vision. One rumor of one combination does not imply that it is the only combination of cores that will be produced.
1). It's the only "leak" on Raptor Lake I've seen to date
2). It's unclear how a consumer chip like Raptor Lake would truly benefit from doubling the Gracemont core count

If Golden Cove is such a bad/dead end core that Gracemont is starting to look attractive in comparison, then Intel may be in more trouble than we know.

We discussed this earlier... Marketing likes the idea of having more cores, and it would do better in fully loaded MT benchmarks. Adding another 2 slots is going to make the ring pretty long.
10c did work for Comet Lake. Mostly. There were some weird issues with the 8c Comet Lake derived from 10c dice, but nothing major that really affected end-user performance (that I can recall) vs. 8c Coffee Lake.
 
  • Like
Reactions: Tlh97

DrMrLordX

Lifer
Apr 27, 2000
17,664
6,657
136
@jpiniero

I see what you're trying to say, but Alder Lake isn't going to be a workstation or HEDT chip. Unless Intel is tacitly admitting that they won't have a workstation/HEDT chip. Otherwise, that kind of marketing just isn't going to work for them. I honestly don't think the 24c/32t approach will beat whatever happens to be their competitor's top desktop-socket multicore CPU, much less their HEDT offerings, in raw MT performance. For workloads that struggle to utilize 16c chips like the 5950X, adding more Gracemont cores will make 0 difference but will chew up die area anyway.

Intel needs to focus on the 6c-12c range for Golden Cove, assuming 10SFE or ESF or whatever it's really called permits them to do so in whatever timeframe Raptor Lake actually launches. Which I guess would be Q4 2022. They're never going to need more than 8c Gracemont.
 

jpiniero

Diamond Member
Oct 1, 2010
9,939
2,279
136
You also have to factor in that the Alder Lake die is already going to be gigantic. If they do go to 8-core small clusters, that might be a cheap way, area-wise, to improve the MT performance.
 

dullard

Elite Member
May 21, 2001
22,972
1,228
126
2). It's unclear how a consumer chip like Raptor Lake would truly benefit from doubling the Gracemont core count

If Golden Cove is such a bad/dead end core that Gracemont is starting to look attractive in comparison, then Intel may be in more trouble than we know.
Let's go back to the Lakefield performance graph and think about it.
Note 1: This was preliminary data to begin with
Note 2: I'm ignoring process improvements with Gracemont and Golden Cove.
Note 3: I'm assuming constant power limits.
Making a major extrapolation (with process change and core count changes) and assuming that there are no surprises, then you get the hypothetical graph below. Please take it with a grain of salt, it is for discussion purposes only.

Suppose you took the same amount of power for two chips (A and B), but used double the number of cores on chip B. Suddenly each core on chip B gets half the power as the cores on chip A. Cores on chip B operate at a different place on the performance/power curve than cores on chip A. In the case of Lakefield, going to half the power still meant that Sunny Cove was the higher performing core than Tremont. But, what if you made chip C with 4 times the cores of chip A? Suddenly dividing the power up 4 ways means that the Tremont cores now begin to perform slightly better than Sunny Cove cores. Now with chip D that has 8 times the number of cores than chip A, each core is starved for power. You are suddenly in the position that Tremont performance dominates Sunny Cove performance.

If Gracemont/Golden Cove behave similarly to Tremont/Sunny Cove, then you find that once you go to 16 Intel cores or more, it is better to have all smaller cores. At high power levels, yes, Golden Cove should dominate. But once you divide your available power up into 16, 24, 32+ small slices, you find that Gracemont actually is the better core when each core only gets a small power envelope.
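This power-division argument can be sketched numerically. The curves below are invented (constants chosen only to echo the Tremont/Sunny Cove behavior described above, and the big core's minimum-power floor is ignored for simplicity); the point is just to show the crossover as a fixed budget is split across more cores.

```python
# Fix a total power budget and split it evenly across more and more cores.
# The curves are made up; constants only echo the behavior described above
# (big core wins at high per-core power, small core wins at low per-core power).

TOTAL_POWER = 100.0  # arbitrary budget, in "percent" units

def big_perf(p):
    """Hypothetical big-core performance at per-core power p."""
    return min(100.0, 33.0 * (p / 15.0) ** 0.70)

def small_perf(p):
    """Hypothetical small-core performance: ahead at low power, tops out early."""
    return min(64.0, 38.0 * (p / 15.0) ** 0.46)

for n_cores in (2, 4, 8, 16, 32):
    p = TOTAL_POWER / n_cores
    winner = "big" if big_perf(p) > small_perf(p) else "small"
    print(f"{n_cores:2d} cores -> {p:6.2f}% per core: "
          f"big {big_perf(p):5.1f}  small {small_perf(p):5.1f}  ({winner} core wins)")
```

With these made-up curves the big core wins per-core at 2 cores (50% power each), and the small core takes over from 4 cores onward - the crossover described above.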

[attached graph: hypothetical performance/power extrapolation]
 
  • Like
Reactions: SAAA

dullard

Elite Member
May 21, 2001
22,972
1,228
126
@jpiniero

I see what you're trying to say, but Alder Lake isn't going to be a workstation or HEDT chip. Unless Intel is tacitly admitting that they won't have a workstation/HEDT chip.
Mobile < Hybrid < Desktop < HEDT < Server

Alder Lake is Intel's attempt to slip in a new category between Mobile and Desktop. I do not think that it is really intended for HEDT. I thought that the rumors claim Intel's HEDT chips come from Sapphire Rapids.
 
Last edited:
  • Like
Reactions: Magic Carpet

DrMrLordX

Lifer
Apr 27, 2000
17,664
6,657
136
Alder Lake is Intel's attempt to slip in a new category between Mobile and Desktop. I do not think that it is really intended for HEDT.
The problem with your graph is that, undoubtedly, you're discussing MT throughput vs relative power. If Intel is trying to straddle the mobile and desktop market, the sweet spot for now and maybe the next 1-2 years is going to be 8c/16t "big" cores like Golden Cove. Maybe a generation or two down the road, if Core keeps advancing at 10-15% IPC per generation (or less) while Atom continues advancing at 20% or more per generation, Atom will eventually make more sense than Core except in certain niche, high-power situations. For now, adding more and more Gracemont for consumer SoCs is so much pissing in the wind. Pardon me for being redundant (because I am), and I DO recognize that Intel has a perf/watt problem with Core right now that may not improve with Golden Cove. Which means that going for higher core counts with Golden Cove could be problematic, especially in their target market. But their competition is able to add up to 16c in consumer CPUs and get good perf/watt doing so. Intel needs to be able to do the same.

You also have to factor in that the Alder Lake die is already going to be gigantic. If they do go to 8-core small clusters, that might be a cheap way, area-wise, to improve the MT performance.
Intel may not have enough die real estate to add anything else.
 
Last edited:

IntelUser2000

Elite Member
Oct 14, 2003
7,586
2,437
136
Is this additional area what makes these Cove architectures advanced compared to the Skylake arch? Does AMD improve their Zen cores the same way between generations? Is there any other way to make computing faster that's not about making things wider and bigger and whatnot? I guess something like quantum computing, except not as outlandish?
Adding to the already nice explanations made by others,

Simply put, if you have more execution units like load/store units, ALUs, FPUs, the CPU will in general perform better. And such a chip is naturally larger/more power hungry because each unit takes many transistors.

Then there are small details like reservation stations, reorder buffers, branch predictors which all take transistors and thus extra die space and power use.

There are smarter ways to do things, and Out of Order(OoO) execution is one of them. Enhancing branch prediction is another. But ideas are much harder to implement than just straight out expanding whatever you have.

Gracemont is a very different core from Skylake, yet it's a far more compact and more efficient core.
 
  • Like
Reactions: Tlh97

Thala

Golden Member
Nov 12, 2014
1,263
572
136
The problem with your graph is that, undoubtedly, you're discussing MT throughput vs relative power. If Intel is trying to straddle the mobile and desktop market, the sweet spot for now and maybe the next 1-2 years is going to be 8c/16t "big" cores like Golden Cove. Maybe a generation or two down the road, if Core keeps advancing at 10-15% IPC per generation (or less) while Atom continues advancing at 20% or more per generation, Atom will eventually make more sense than Core except in certain niche, high-power situations. For now, adding more and more Gracemont for consumer SoCs is so much pissing in the wind. Pardon me for being redundant (because I am), and I DO recognize that Intel has a perf/watt problem with Core right now that may not improve with Golden Cove. Which means that going for higher core counts with Golden Cove could be problematic, especially in their target market. But their competition is able to add up to 16c in consumer CPUs and get good perf/watt doing so. Intel needs to be able to do the same.
Absolutely agree. Originally big.LITTLE was conceptually invented for extreme low-utilization situations, where small cores have an advantage both in dynamic power and particularly in leakage - the little cores were never supposed to be the driving factor for max MT performance. As a consequence, you will see fewer little cores when you scale up from mobile to HEDT or server, where you won't have little cores at all. Now enter Intel, where the little cores are used to boost peak MT performance, just because they are not competitive with their big cores.
 
Last edited:

IntelUser2000

Elite Member
Oct 14, 2003
7,586
2,437
136
Food for thought: how does one reconcile the rumored "improved cache for gaming" (as per the leaked Intel slide) with the increased reliance on small cores (as per the MLID leak claim)? To me it seems these require diametrically opposed design decisions, your idea of double ring bus shows just how much a design can change depending on workload priorities.
I'll say, MLID has been ok on Intel leaks.

But a few things go against that bit about Raptor Lake. First, the ring is the problem in scaling to more cores. Second, it said on that Intel slide: "New Hybrid CPU core changes for improved performance".

Does "improved" really mean more cores, either Golden Cove or Gracemont? Or is it improved because the algorithm has improved, or the way the cores communicate has changed?
 
  • Like
Reactions: Tlh97 and RanFodar

Exist50

Senior member
Aug 18, 2016
319
341
136
I'll say, MLID has been ok on Intel leaks.

But a few things go against that bit about Raptor Lake. First, the ring is the problem in scaling to more cores. Second, it said on that Intel slide: "New Hybrid CPU core changes for improved performance".

Does "improved" really mean more cores, either Golden Cove or Gracemont? Or is it improved because the algorithm has improved, or the way the cores communicate has changed?
If we accept that Raptor Lake is meant to be a low-effort stopgap for Meteor Lake/7nm delays, then it does make sense. 12 ring stops might not be ideal, but it's been done before with Broadwell-EP, and is certainly lower effort than splitting into two rings. Likewise, minor improvements to Golden Cove would also fit for such a project.
 

IntelUser2000

Elite Member
Oct 14, 2003
7,586
2,437
136
If we accept that Raptor Lake is meant to be a low-effort stopgap for Meteor Lake/7nm delays, then it does make sense.
I wouldn't say low-effort so much as something that fills in the weaknesses. Even if they don't officially do Tick-Tock, the idea has always existed, even if in a much subtler way.

Nehalem generation only covered certain market segments. It was really about the server.

Sandy Bridge covered all, and client gains were huge because that became their focus.

Core 2 was a client focus as well.

Icelake was a good start, but Tigerlake fleshed out the weaknesses, even though the core is essentially the same. Alderlake might get the hybrid methodology correct most of the time, but Raptorlake might get most of the rest. Alderlake, despite what Intel claims, is Yonah-on-desktop. Raptorlake might see Intel actually focus on desktop.

As a consequence, you will see fewer little cores when you scale up from mobile to HEDT or server, where you won't have little cores at all. Now enter Intel, where the little cores are used to boost peak MT performance, just because they are not competitive with their big cores.
Actually, while the explicit meaning of big.LITTLE may be what you are saying, the original idea behind the heterogeneous core is to circumvent the whole square-root law when it comes to processors.

On paper, if you double the potential performance per clock, the area and the power requirement quadruple. So at some point, and for embarrassingly parallel applications, having a core that delivers half the performance at 1/4 the power and area is useful. So you can combine the two approaches: the big cores accelerate responsiveness and low-thread applications, and the sea of small cores accelerates parallel tasks.
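The "square root law" being referenced here is usually called Pollack's rule: single-thread performance scales roughly with the square root of core area (equivalently, doubling performance costs roughly 4x the area/power). A tiny illustration in Python, with made-up area units:

```python
import math

# Pollack's rule (illustrative): single-thread perf ~ sqrt(core area).
# Area units are arbitrary; this shows the textbook form of the rule,
# not a measurement of any specific core.

def perf_from_area(area):
    return math.sqrt(area)

BIG_AREA = 4.0     # one big core: 4 area units
SMALL_AREA = 1.0   # one small core: 1 area unit

big_st = perf_from_area(BIG_AREA)      # 2x the single-thread perf of a small core
small_st = perf_from_area(SMALL_AREA)

# Same silicon budget: 1 big core vs. 4 small cores.
mt_big = 1 * big_st       # aggregate throughput with the big core
mt_small = 4 * small_st   # aggregate throughput with 4 small cores

print(f"single-thread: big {big_st:.1f} vs small {small_st:.1f}")
print(f"throughput per area budget: big {mt_big:.1f} vs small {mt_small:.1f}")
```

So for embarrassingly parallel work the four small cores give 2x the throughput in the same area, while the big core keeps the 2x single-thread advantage - exactly the trade-off a hybrid design tries to straddle.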

Of course it's much harder to make it work. Hence I believe it's why both Apple and Intel focus on getting the so-called "little" cores to be much more performant. Because the worst-case scenario isn't so bad with a high performance smaller core.

So the idea might not be a silver bullet to an efficient, high performance CPU for all scenarios, but a useful and necessary step in getting there.
 
Last edited:

Exist50

Senior member
Aug 18, 2016
319
341
136
I wouldn't say low-effort so much as something that fills in the weaknesses. Even if they don't officially do Tick-Tock, the idea has always existed, even if in a much subtler way.
I don't think this was planned so well. If Intel truly thought they'd have 7nm ready by the end of '21, then it stands to reason that they'd have products (MTL) scheduled for sometime in '22. My guess is that either 7nm or MTL execution began to slip, and they needed to figure out a stopgap. Raptor Lake is like Coffee Lake or Comet Lake in that regard, but with maybe a little time for spit and polish.

On paper, if you double the potential performance per clock, the area and the power requirement quadruple.
This is a trend of sorts, but is by no means a law. Comparing Apple's cores to Intel's shows as much, but even AMD is more PPA efficient. If Core is to have a future, Intel needs to get it under control.
 
  • Like
Reactions: Tlh97 and uzzi38

eek2121

Golden Member
Aug 2, 2005
1,228
1,285
136
Intel needs to adopt a chiplet based design. Having 8 “big” cores on one chiplet and 16-32 cores on another would be pretty cool.
 

dullard

Elite Member
May 21, 2001
22,972
1,228
126
Intel needs to adopt a chiplet based design. Having 8 “big” cores on one chiplet and 16-32 cores on another would be pretty cool.
I've been around long enough to remember the first Intel chiplet processor. The internet reviews and forums turned Intel into a laughingstock, calling Intel's chips "glued together" (see https://www.anandtech.com/show/1656/2 for one example). It is so odd for me to now see internet forums call for Intel to go back to chiplets (and of course people now think it was Intel that was making fun of AMD for gluing chips).

Your concept is very much what Intel is planning on doing:
 
