
Discussion Intel current and future Lakes & Rapids thread

Page 373

Exist50

Senior member
Aug 18, 2016
205
254
136
Or theory #4: Intel wants to keep kernel developers on their toes!
Never mind the scheduler changes for Zen and its NUMA layout - moving and scheduling threads across totally different cores is going to be way harder.
Yes, ARM's big.LITTLE pioneered a lot of this, but didn't ARM always maintain the same ISA and instruction sets between the cores used for big.LITTLE?
While an unsupported instruction could cause an exception and either be emulated or force the thread onto the other type of core, this sounds like a lot more work.
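For illustration only, here's a toy model of that trap-and-migrate idea in Python (the instruction names and ISA sets are invented for the sketch, not Intel's actual mechanism):

```python
# Toy model: run a thread's instruction trace on a little core; when an
# instruction isn't in the little core's ISA, "trap" and migrate the
# thread to a big core for the rest of its run.

LITTLE_ISA = {"add", "mul", "load", "store"}   # hypothetical subset
BIG_ISA = LITTLE_ISA | {"avx512_fma"}          # big cores add wide SIMD

def run_trace(trace, start_core="little"):
    """Return (final_core, migrations) after executing the trace."""
    core, migrations = start_core, 0
    for insn in trace:
        isa = LITTLE_ISA if core == "little" else BIG_ISA
        if insn not in isa:
            if core == "little" and insn in BIG_ISA:
                core = "big"        # fault -> OS re-schedules on a big core
                migrations += 1
            else:
                raise RuntimeError(f"illegal instruction: {insn}")
    return core, migrations

print(run_trace(["add", "mul", "avx512_fma", "add"]))  # ('big', 1)
```

Even in this toy version you can see the cost: every migration means a fault, a scheduler trip, and a cold cache on the other core.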
Plus, whatever happened to AVX-512 taking over the world?
The cores are limited to the least common denominator, at least in Alder Lake.
 

yuri69

Member
Jul 16, 2013
111
123
116
Theory #1 is correct - the power efficiency would be very nice for loads like web browsing, video playback, or office work. Of course, this requires the scheduler to put those processes on the LITTLE cores and not schedule anything on the big cores, letting them reach deep sleep.

The general strategy is to keep all those "hungry" cores in deep sleep and wake them only when it makes sense. This way it's possible to run high-priority CPU-intensive threads on 1-2 big cores while the low-priority threads occupy the LITTLE cores and the rest of the big cores stay in deep sleep.
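That placement policy can be sketched in a few lines (purely illustrative: the priority threshold, the 8+8 core numbering, and the two-big-core cap are assumptions, and real schedulers weigh far more state):

```python
# Hypothetical 8+8 topology: cores 0-7 are big, 8-15 are LITTLE.
BIG, LITTLE = set(range(8)), set(range(8, 16))

def place(threads, max_awake_big=2):
    """Map each (name, priority) thread to a core set; big cores that
    get nothing assigned can then be left in deep sleep."""
    placement, awake_big = {}, 0
    for name, prio in sorted(threads, key=lambda t: -t[1]):
        if prio >= 10 and awake_big < max_awake_big:
            placement[name] = BIG     # high-prio, CPU-intensive work
            awake_big += 1
        else:
            placement[name] = LITTLE  # background / low-prio work
    return placement

jobs = [("render", 15), ("compile", 12), ("indexer", 3), ("updater", 1)]
print(place(jobs))
```

The point of the cap is exactly the deep-sleep argument: once 1-2 big cores are busy, everything else spills onto the LITTLE cluster rather than waking more "hungry" cores.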
 

jpiniero

Diamond Member
Oct 1, 2010
9,042
1,742
126
It's really a combination of Post 9300 and marketing liking the idea of calling it 16 cores.
 

Ajay

Diamond Member
Jan 8, 2001
8,320
3,203
136
The cores are limited to the least common denominator, at least in Alder Lake.
Gracemont has been updated quite a bit, so the LCD is pretty good, if true. Good luck to MS in developing an improved scheduler that makes proper use of both sets of cores. At least they have examples from the Linux/Android world as a reference, e.g. HMP.
 

moonbogg

Lifer
Jan 8, 2011
10,014
1,801
126
Yeah, I still don't get it. I can't imagine any reason to put laptop cores in a desktop chip. It doesn't matter if they are as powerful as last gen stuff. Using chip area for weaker cores doesn't make sense to me when you can instead use the area for more of the newer and better cores. It sounds like it's a money saving strategy where they don't have to design a middle chip for consumer desktop, but instead just use the laptop design as someone else mentioned here. Maybe HEDT will make a return with high-end chips that have all real cores and no laptop cores in them. Then once again, Intel will charge $1800 for a 10 core CPU? One can only hope...
 

Hulk

Platinum Member
Oct 9, 1999
2,973
406
126
It really depends on how software utilizes cores. Do they all need to be big to run the app quickly, or is 8 big + 8 little sometimes better than 12 big?
 

lobz

Golden Member
Feb 10, 2017
1,620
2,083
136
It really depends on how software utilizes cores. Do they all need to be big to run the app quickly, or is 8 big + 8 little sometimes better than 12 big?
Yeah, of all the niche use cases, this is just really out there in a desktop environment. I think.
 

majord

Senior member
Jul 26, 2015
366
356
136
No matter how you spin it, it's pointless for desktop. Not even NUC-level machines will see any real-world benefit from the 1-2W savings. That's big for battery life, but nothing else. These modern HP cores are already not bad efficiency-wise when not pegged at 5GHz.
 

Hulk

Platinum Member
Oct 9, 1999
2,973
406
126
No matter how you spin it, it's pointless for desktop. Not even NUC-level machines will see any real-world benefit from the 1-2W savings. That's big for battery life, but nothing else. These modern HP cores are already not bad efficiency-wise when not pegged at 5GHz.
I'm not trying to spin it; I'm trying to think about it with an open mind, which I know can be dangerous on the internet.

I found an interesting portable program called ProcessThreadsView that allows you to see exactly what an application is doing thread-by-thread. I'm still fooling around with it.

If I'm understanding you correctly, you're saying there is no use for small cores on the desktop: in every application, at all times, for a given die area, big fast cores will always beat a mix of big and small cores, right?

I don't claim to know the answer and am not challenging your assertion, but I would like to know what tests, simulations, or other knowledge led you to this conclusion. I'm truly here to learn. I freely admit that when it comes to microprocessor design I'm a clueless mechanical engineer who enjoys following the microprocessor industry.
 

dr1337

Member
May 25, 2020
120
178
76
These modern HP cores are already not bad efficiency wise
Exactly, that's the whole point. It's rumored that the little cores are ~40% slower than the big cores IPC-wise, aka Skylake level. If Golden Cove is the big chonker Intel says it is, it should be decently more power hungry than Tiger Lake, let alone Skylake, even on 10nm. Also, I don't think Intel of all companies could possibly overstate the importance of efficiency on the desktop. They just had to lop two cores off their incoming desktop flagship because the IPC gains increased power consumption so much.

Imo I don't see the little cores being all that bad for desktop as long as scheduling is done intelligently. 8 Golden Cove cores at 5GHz would be very compelling as they are; the 8 extra small-ish cores are just icing on the cake. And it's definitely the right move for Intel, as being able to sell '16' cores on mobile guarantees they at least don't fall behind Apple in marketing. And I'm really not convinced that the extra investment in a desktop-specific chip would actually be worth it. I don't think a dedicated 12-big-core chip would necessarily sell any better than 8 big + 8 little.
 
  • Like
Reactions: Tlh97 and Hulk

LightningZ71

Senior member
Mar 10, 2017
711
637
136
Something else to consider is that having separate, power optimized cores allows them to also use far more aggressive power management on them, having isolated voltage planes, spinning them down more quickly, etc. Even in some of the most heavily single threaded situations, OSes have stuff to do behind the scenes. It's far better to shuffle that stuff off to low power, low thermal cores to avoid interrupting the big cores and to better manage thermals.

Obviously, we're well within the territory of every few percent counts.
 

majord

Senior member
Jul 26, 2015
366
356
136
I'm not trying to spin it; I'm trying to think about it with an open mind, which I know can be dangerous on the internet.

I found an interesting portable program called ProcessThreadsView that allows you to see exactly what an application is doing thread-by-thread. I'm still fooling around with it.

If I'm understanding you correctly, you're saying there is no use for small cores on the desktop: in every application, at all times, for a given die area, big fast cores will always beat a mix of big and small cores, right?

I don't claim to know the answer and am not challenging your assertion, but I would like to know what tests, simulations, or other knowledge led you to this conclusion. I'm truly here to learn. I freely admit that when it comes to microprocessor design I'm a clueless mechanical engineer who enjoys following the microprocessor industry.
No, I'm not saying under ALL circumstances. Even the worst ideas have had circumstances where they made sense, but overall, yes: for the desktop/workstation/enthusiast, I don't see any benefits worth the trouble.

Ultimately there are two areas where it makes sense:

1. Low-power mobile, with a moderate number of cores, where battery life can be greatly affected by saving 1-2W; even sub-watt savings are a big deal.

2. Many-core server/enterprise, where a large number of low-power, high-efficiency cores can actually be used en masse by workloads that scale to many cores, but where a small number of high-ST-performance cores is needed concurrently (and there are many such cases).

The desktop market, however, sits in a void between these two.

On the one hand, desktop is power sensitive too - but not down to this level, and not at low power states. Particularly in the, say, 35-125W market, where other system components are so power hungry, any savings from this sort of scheme virtually fade into the background at the system level. Desktop is more about peak power consumption, and designing cooling capability and the power supply around that - not what's happening at idle or while browsing the web.

On the other hand, desktop workloads can scale to high thread counts, so you could argue the advantages there. The problem is, I think we all agree, that desktop workloads don't scale high enough thread-wise to see the benefits. Not that that's of any relevance with this iteration, since it's only an 8+8, 24-thread config - lower than a 5950X consisting of all high-performance cores.

It will be interesting to see the outcome, but there's a good chance a Zen 3 core with SMT will have similar throughput and power consumption to 2x Gracemont, particularly if Gracemont is clocked high to maximize performance. Probably more to the point: how many Gracemont cores will it take to equal Golden Cove's throughput, which should be higher again than Zen 3's? (I only mention Zen 3 incidentally because it's by far the perf/watt benchmark for high-performance x86 cores at the moment; ditto perf/area.)
 
Last edited:

Exist50

Senior member
Aug 18, 2016
205
254
136
Isn't ADL-S BGA for laptops too?
I suppose so, but we're talking ~55W "desktop replacement" devices. It's quite a small niche. Fundamentally, the design point is still for desktops.

Plus, I would expect at least some lag between ADL-S showing up in desktops and ADL-S BGA showing up in laptops. We might very well see ADL-P laptops first.
 

coercitiv

Diamond Member
Jan 24, 2014
4,256
5,379
136
Probably more to the point, how many gracemont will it take to equal Golden cove's throughput?
Gracemont cluster will look good when compared against Golden Cove when:
  • workload scales well to more threads
  • Golden Cove clocks are down in the all-core turbo range
The problem with hyping Gracemont via the "Skylake IPC" tag is that nobody knows how high it will clock. From a purely subjective point of view the "Skylake IPC" reference point looks very good... until you start asking questions about absolute performance. Will it clock close to 5GHz? Does it make any sense at all to make it capable of 5GHz when its main purpose is increased throughput under heavy workloads? And finally, does Skylake IPC still sound good when Golden Cove may end up being 70-100% faster? (40% IPC advantage, 20-45% clock advantage)
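For reference, that 70-100% figure is just the two advantages compounding multiplicatively; a quick sanity check (the 40% and 20-45% inputs are the rumored numbers above, not confirmed specs):

```python
def combined_speedup(ipc_adv, clock_adv):
    """Performance = IPC x clock, so the advantages compound."""
    return (1 + ipc_adv) * (1 + clock_adv) - 1

low = combined_speedup(0.40, 0.20)   # 1.40 * 1.20 - 1 = 0.68
high = combined_speedup(0.40, 0.45)  # 1.40 * 1.45 - 1 = 1.03
print(f"{low:.0%} to {high:.0%}")    # 68% to 103%
```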

Here's a big question for some of the people in this thread, what would you choose between the following:
  • Comet Lake 10+0 , 10c/20t Skylake cores
  • Cosmic Lake 8+8, 8c/16t Skylake cores + 8c/8t Nehalem cores
 

lobz

Golden Member
Feb 10, 2017
1,620
2,083
136
Something else to consider is that having separate, power optimized cores allows them to also use far more aggressive power management on them, having isolated voltage planes, spinning them down more quickly, etc. Even in some of the most heavily single threaded situations, OSes have stuff to do behind the scenes. It's far better to shuffle that stuff off to low power, low thermal cores to avoid interrupting the big cores and to better manage thermals.

Obviously, we're well within the territory of every few percent counts.
I don't think that single-threaded cores with Skylake IPC are an efficient solution to any combination of 'behind the scenes OS stuff'.
 

lobz

Golden Member
Feb 10, 2017
1,620
2,083
136
Gracemont cluster will look good when compared against Golden Cove when:
  • workload scales well to more threads
  • Golden Cove clocks are down in the all-core turbo range
The problem with hyping Gracemont via the "Skylake IPC" tag is that nobody knows how high it will clock. From a purely subjective point of view the "Skylake IPC" reference point looks very good... until you start asking questions about absolute performance. Will it clock close to 5GHz? Does it make any sense at all to make it capable of 5GHz when its main purpose is increased throughput under heavy workloads? And finally, does Skylake IPC still sound good when Golden Cove may end up being 70-100% faster? (40% IPC advantage, 20-45% clock advantage)

Here's a big question for some of the people in this thread, what would you choose between the following:
  • Comet Lake 10+0 , 10c/20t Skylake cores
  • Cosmic Lake 8+8, 8c/16t Skylake cores + 8c/8t Nehalem cores
The second one, just for the name. I finally wanna see a cosmic lake!!
 
  • Like
Reactions: Tlh97

LightningZ71

Senior member
Mar 10, 2017
711
637
136
I'm also of the belief that they won't be clocking the small cores especially high. If they REALLY are supposed to be all about energy efficiency, why would they allow them to go beyond the inflection point in the power-draw/clock-speed curve, where each additional 100MHz starts taking a lot more power? I tend to think the small cores will be limited to the 3.5-3.8GHz range. Why? I think the i3-1125G4 is informative here, as it is based on the most similar process tech currently in production. It seems to me that the i3-1125G4 is an otherwise largely functional Tiger Lake die that misses the power targets needed to be sold as an i5/i7 product. They clock it down deep in the power/performance curve so it clears the target wattage, but have to sell it as an i3. I'm assuming 10ESF won't be dramatically more efficient in that area, so a clock speed in the 3.5-3.8GHz range seems to be where power efficiency starts taking a big hit for extra performance. If that's the target for the cores, they won't be pipeline-optimized for much higher clock speeds. If there really were something that would stress them so badly that they needed to go faster, it's better run on the big cores to begin with.

So, working off those assumptions (~Skylake IPC, clocks around 3.7GHz), what we're looking at is a section of the processor that's going to behave a lot like an i7-6900K or a pair of i7-6700/6700T processors, but run at 15/35W. That's not a particularly bad place to be.

Again, that's as much a S.W.A.G. as anything.
 
  • Like
Reactions: Tlh97 and misuspita

mikk

Diamond Member
May 15, 2012
3,085
911
136
I'm also of the belief that they won't be clocking the small cores especially high. If they REALLY are supposed to be all about energy efficiency, why would they allow them to go beyond the inflection point in the power-draw/clock-speed curve, where each additional 100MHz starts taking a lot more power? I tend to think the small cores will be limited to the 3.5-3.8GHz range.

I'm expecting higher than this, for several reasons (if you're referring to the desktop). First of all, back in 2018 Intel highlighted a frequency increase for Gracemont, unlike Tremont or Golden Cove: https://images.anandtech.com/doci/13699/1-Roadmap.jpg

Furthermore, Jasper Lake SKUs (10W) can boost up to 3.3GHz even on the poor Ice Lake 10nm process. The difference from 10nm to SuperFin is big, and to enhanced SuperFin even bigger; ADL will use enhanced SuperFin. Also, the fastest desktop SKUs are not energy-optimized by nature - a performance-optimized 125W SKU won't run at the most efficient clock speeds.
 

Hulk

Platinum Member
Oct 9, 1999
2,973
406
126
Let's review. I don't think there is much debate that the Big/Little concept is useful for mobile when efficiency is of the utmost importance. We don't have to look further than phones to see that since that is the norm these days it must be advantageous.

I think most of us would agree that Intel is going with a Big/Little design for Alder Lake because mobile is their first priority. The question to be asked is as follows:

Is the hybrid design a detriment to desktop performance, with Intel just doing the best they can with it?
Or have they found a way that, for a given transistor budget, they can achieve higher performance with big/little?

While there is little doubt Intel will stay in the optimum efficiency range of the shmoo plot for mobile, they have demonstrated that if enormous power is required for desktop to be competitive, they will use it. So I don't put much stock in the theory that for the desktop they will limit big or little core speed for power reasons, especially at the top of the stack where competition with AMD is paramount.

It seems to me that the elephant in the room is this: for a given application, do all threads need the same amount of compute? I don't know. The easiest way to design is to assume they do and throw as many powerful cores at the application as you can, i.e. Threadripper.

A more elegant solution might be to have two "buckets" of cores, big and little, and assign them based on the compute needs of the various threads of the running process. Of course this can be tailored to a specific application, but can it work across the board? As I wrote above, I found a neat little app called ProcessThreadsView that shows the threads created by a specific application, the rate of context switching, and the user/kernel time for each thread.

I have only started to check it out, but it seems the rate of context switching could be meaningful. As we know, every time the context is switched for a thread there is inefficiency, as one thread is flushed from the CPU and another one started. The less context switching the better.
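ProcessThreadsView is Windows-only; for anyone on Linux, the same per-thread counters can be pulled straight from /proc. A minimal sketch, assuming a Linux-style /proc filesystem:

```python
import os

def thread_ctx_switches(pid):
    """Return {tid: (voluntary, nonvoluntary)} context-switch counts for
    every thread of a process, read from /proc/<pid>/task/*/status."""
    out = {}
    for tid in os.listdir(f"/proc/{pid}/task"):
        counts = {}
        with open(f"/proc/{pid}/task/{tid}/status") as f:
            for line in f:
                if line.startswith(("voluntary_ctxt_switches",
                                    "nonvoluntary_ctxt_switches")):
                    key, val = line.split(":")
                    counts[key] = int(val)
        out[int(tid)] = (counts["voluntary_ctxt_switches"],
                         counts["nonvoluntary_ctxt_switches"])
    return out

# Sampling this twice and diffing the counts gives the switches/second
# rate that ProcessThreadsView displays.
print(thread_ctx_switches(os.getpid()))
```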

Here's a screenshot of Handbrake running the benchmark from this forum. Lots of threads are spawned, and the first 12 are doing quite a bit of context switching. This is from my 4770K. It would be interesting to see what this looks like with 8c/16t, 12c/24t, and 16c/32t. Seems like 7 or 8 threads are getting hammered pretty hard by the context-switch rate, which I believe is in switches/second. So those "main" threads are frequently being cut away from to service other threads.

Handbrake Context Switches.jpg

Here is a screenshot of Presonus Studio One playing back a relatively lightly loaded multitrack song. Here we have 7 threads with high context switching and 8 or so other threads which seem to be lightly loaded and serviced during the context switches from the main threads.

Presonus Studio One context switches.jpg

Honestly, I don't know what the heck this all really means, but it could be that these applications would do well with 8 dedicated big cores and a bunch of little cores to service the less heavily loaded threads. Or it could mean nothing, because the context-switching performance hit is negligible.

One more interesting observation: the user time for Studio One is focused on 8 threads. For Handbrake it's 7 threads, plus one less-loaded one. Is this because my 4770K has 8 logical processors, or because these apps were coded with 4-core/8-thread processors in mind? If someone ran the Handbrake test on an 8-core rig we'd at least know that, right?

But I thought it was interesting enough to post.
 
  • Like
Reactions: Tlh97 and KompuKare

jpiniero

Diamond Member
Oct 1, 2010
9,042
1,742
126
I'm expecting higher than this, for several reasons (if you're referring to the desktop). First of all, back in 2018 Intel highlighted a frequency increase for Gracemont, unlike Tremont or Golden Cove: https://images.anandtech.com/doci/13699/1-Roadmap.jpg

Furthermore, Jasper Lake SKUs (10W) can boost up to 3.3GHz even on the poor Ice Lake 10nm process. The difference from 10nm to SuperFin is big, and to enhanced SuperFin even bigger; ADL will use enhanced SuperFin. Also, the fastest desktop SKUs are not energy-optimized by nature - a performance-optimized 125W SKU won't run at the most efficient clock speeds.
They don't need to go after every last Mhz on the small cores. 3.3 - 3.8 max feels right for locked.
 
