
Discussion Intel current and future Lakes & Rapids thread

Page 373

Exist50

Senior member
Aug 18, 2016
205
254
136
Or theory #4: Intel wants to keep kernel developers on their toes!
Never mind the scheduler changes for Zen and its NUMA layout - moving and scheduling threads across totally different cores is going to be way harder.
Yes, ARM's big.LITTLE pioneered a lot of this, but didn't ARM always maintain the same ISA and instruction sets between the cores used for big.LITTLE?
While an unsupported instruction could cause an exception and either be emulated or force the thread onto the other type of core, this sounds like a lot more work.
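For illustration only, here's a toy model of that trap-and-migrate idea in Python (the instruction names and ISA sets are invented for the sketch, not Intel's actual mechanism):

```python
# Toy model: run a thread's instruction trace on a little core; when an
# instruction isn't in the little core's ISA, "trap" and migrate the
# thread to a big core for the rest of its run.

LITTLE_ISA = {"add", "mul", "load", "store"}   # hypothetical subset
BIG_ISA = LITTLE_ISA | {"avx512_fma"}          # big cores add wide SIMD

def run_trace(trace, start_core="little"):
    """Return (final_core, migrations) after executing the trace."""
    core, migrations = start_core, 0
    for insn in trace:
        isa = LITTLE_ISA if core == "little" else BIG_ISA
        if insn not in isa:
            if core == "little" and insn in BIG_ISA:
                core = "big"        # fault -> OS re-schedules on a big core
                migrations += 1
            else:
                raise RuntimeError(f"illegal instruction: {insn}")
    return core, migrations

print(run_trace(["add", "mul", "avx512_fma", "add"]))  # ('big', 1)
```

Even in this toy version you can see the cost: every migration means a fault, a scheduler trip, and a cold cache on the other core.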
Plus, whatever happened to AVX-512 taking over the world?
The cores are limited to the least common denominator, at least in Alder Lake.
 

yuri69

Member
Jul 16, 2013
111
123
116
Theory #1 is correct - the power efficiency would be very nice for loads like web browsing, video playback, or office work. Of course, this requires the scheduler to put those processes on the LITTLE cores and not schedule anything on the big cores, letting them reach deep sleep.

The general strategy is to keep all those "hungry" cores in deep sleep and wake them only when it makes sense. This way it's possible to run high-priority CPU-intensive threads on 1-2 big cores while the low-priority threads occupy the LITTLE cores and the rest of the big cores stay in deep sleep.
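That placement policy can be sketched in a few lines (purely illustrative: the priority threshold, the 8+8 core numbering, and the two-big-core cap are assumptions, and real schedulers weigh far more state):

```python
# Hypothetical 8+8 topology: cores 0-7 are big, 8-15 are LITTLE.
BIG, LITTLE = set(range(8)), set(range(8, 16))

def place(threads, max_awake_big=2):
    """Map each (name, priority) thread to a core set; big cores that
    get nothing assigned can then be left in deep sleep."""
    placement, awake_big = {}, 0
    for name, prio in sorted(threads, key=lambda t: -t[1]):
        if prio >= 10 and awake_big < max_awake_big:
            placement[name] = BIG     # high-prio, CPU-intensive work
            awake_big += 1
        else:
            placement[name] = LITTLE  # background / low-prio work
    return placement

jobs = [("render", 15), ("compile", 12), ("indexer", 3), ("updater", 1)]
print(place(jobs))
```

The point of the cap is exactly the deep-sleep argument: once 1-2 big cores are busy, everything else spills onto the LITTLE cluster rather than waking more "hungry" cores.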
 

jpiniero

Diamond Member
Oct 1, 2010
9,042
1,742
126
It's really a combination of Post 9300 and marketing liking the idea of calling it 16 cores.
 

Ajay

Diamond Member
Jan 8, 2001
8,320
3,203
136
The cores are limited to the least common denominator, at least in Alder Lake.
Gracemont has been updated quite a bit, so the LCD is pretty good, if true. Good luck to MS in developing an improved scheduler that makes proper use of both sets of cores. At least they have examples from the Linux/Android world as a reference, e.g. HMP.
 

moonbogg

Lifer
Jan 8, 2011
10,014
1,801
126
Yeah, I still don't get it. I can't imagine any reason to put laptop cores in a desktop chip. It doesn't matter if they are as powerful as last gen stuff. Using chip area for weaker cores doesn't make sense to me when you can instead use the area for more of the newer and better cores. It sounds like it's a money saving strategy where they don't have to design a middle chip for consumer desktop, but instead just use the laptop design as someone else mentioned here. Maybe HEDT will make a return with high-end chips that have all real cores and no laptop cores in them. Then once again, Intel will charge $1800 for a 10 core CPU? One can only hope...
 

Hulk

Platinum Member
Oct 9, 1999
2,973
406
126
It really depends on how software utilizes cores. Do they all need to be big to run the app quickly, or is 8 big + 8 little sometimes better than 12 big?
 

lobz

Golden Member
Feb 10, 2017
1,620
2,083
136
It really depends on how software utilizes cores. Do they all need to be big to run the app quickly, or is 8 big + 8 little sometimes better than 12 big?
Yeah, of all the niche use cases, this is just really out there in a desktop environment. I think.
 

majord

Senior member
Jul 26, 2015
366
356
136
No matter how you spin it, it's pointless for desktop. Not even NUC-level machines will see any real-world benefit from the 1-2W savings. That's big for battery life, but nothing else. These modern HP cores are already not bad efficiency-wise when not pegged at 5GHz.
 

Hulk

Platinum Member
Oct 9, 1999
2,973
406
126
No matter how you spin it, it's pointless for desktop. Not even NUC-level machines will see any real-world benefit from the 1-2W savings. That's big for battery life, but nothing else. These modern HP cores are already not bad efficiency-wise when not pegged at 5GHz.
I'm not trying to spin it; I'm trying to think about it with an open mind, which I know can be dangerous on the internet.

I found an interesting portable program called ProcessThreadsView that allows you to see exactly what an application is doing thread-by-thread. I'm still fooling around with it.

If I'm understanding you correctly, you're saying there is no use for small cores on the desktop: in every application, at all times, for a given die area, big fast cores will always beat a mix of big and small cores, right?

I don't claim to know the answer and am not challenging your assertion, but I would like to know what tests, simulations, or other knowledge led you to this conclusion. I'm truly here to learn. I freely admit that when it comes to microprocessor design I'm a clueless mechanical engineer who enjoys following the microprocessor industry.
 

dr1337

Member
May 25, 2020
120
178
76
These modern HP cores are already not bad efficiency wise
Exactly, that's the whole point. It's rumored that the little cores are ~40% slower than the big cores IPC-wise, aka Skylake level. If Golden Cove is the big chonker Intel says it is, it should be decently more power hungry than Tiger Lake, let alone Skylake, even on 10nm. Also, I don't think Intel of all companies could possibly overstate the importance of efficiency on the desktop. They just had to lop two cores off their incoming desktop flagship because the IPC gains increased power consumption so much.

Imo I don't see the little cores being all that bad for desktop as long as scheduling is done intelligently. 8 Golden Cove cores at 5GHz would be very compelling as they are; the 8 extra small-ish cores are just icing on the cake. And it's definitely the right move for Intel, as being able to sell '16' cores on mobile guarantees they at least don't fall behind Apple in marketing. And I'm really not convinced that the extra investment in a desktop-specific chip would actually be worth it. I don't think a dedicated 12-big-core chip would necessarily sell any better than 8 big + 8 little.
 
  • Like
Reactions: Tlh97 and Hulk

LightningZ71

Senior member
Mar 10, 2017
711
637
136
Something else to consider is that having separate, power optimized cores allows them to also use far more aggressive power management on them, having isolated voltage planes, spinning them down more quickly, etc. Even in some of the most heavily single threaded situations, OSes have stuff to do behind the scenes. It's far better to shuffle that stuff off to low power, low thermal cores to avoid interrupting the big cores and to better manage thermals.

Obviously, we're well within the territory of every few percent counts.
 

majord

Senior member
Jul 26, 2015
366
356
136
I'm not trying to spin it; I'm trying to think about it with an open mind, which I know can be dangerous on the internet.

I found an interesting portable program called ProcessThreadsView that allows you to see exactly what an application is doing thread-by-thread. I'm still fooling around with it.

If I'm understanding you correctly, you're saying there is no use for small cores on the desktop: in every application, at all times, for a given die area, big fast cores will always beat a mix of big and small cores, right?

I don't claim to know the answer and am not challenging your assertion, but I would like to know what tests, simulations, or other knowledge led you to this conclusion. I'm truly here to learn. I freely admit that when it comes to microprocessor design I'm a clueless mechanical engineer who enjoys following the microprocessor industry.
No, I'm not saying under ALL circumstances. Even the worst ideas have had circumstances where they made sense, but overall, yes: for the desktop/workstation/enthusiast, I don't see any benefits worth the trouble.

Ultimately there are two areas where it makes sense:

1. Low-power mobile, with a moderate number of cores, where battery life can be greatly affected by saving 1-2W; even sub-watt savings are a big deal.

2. Many-core server/enterprise, where a large number of low-power, high-efficiency cores can actually be used en masse by workloads that scale to many cores, but where a small number of high-ST-performance cores is needed concurrently (and there are many such cases).

The desktop market, however, sits in a void between these two.

On the one hand, desktop is power sensitive too - but not down to this level, and not at low power states. Particularly in the, say, 35-125W market, where other system components are so power hungry, any savings from this sort of scheme virtually fade into the background at the system level. Desktop is more about peak power consumption, and designing cooling capability and the power supply around that - not what's happening at idle or while browsing the web.

On the other hand, desktop workloads can scale to high thread counts, so you could argue the advantages there. The problem is, I think we all agree, that desktop workloads don't scale high enough thread-wise to see the benefits. Not that that's of any relevance with this iteration, since it's only an 8+8, 24-thread config - lower than a 5950X consisting of all high-performance cores.

It will be interesting to see the outcome, but there's a good chance a Zen 3 core with SMT will have similar throughput and power consumption to 2x Gracemont, particularly if Gracemont is clocked high to maximize performance. Probably more to the point: how many Gracemont cores will it take to equal Golden Cove's throughput, which should be higher again than Zen 3's? (I only mention Zen 3 incidentally because it's by far the perf/watt benchmark for high-performance x86 cores at the moment; ditto perf/area.)
 
Last edited:

Exist50

Senior member
Aug 18, 2016
205
254
136
Isn't ADL-S BGA for laptops too?
I suppose so, but we're talking ~55W "desktop replacement" devices. It's quite a small niche. Fundamentally, the design point is still for desktops.

Plus, I would expect at least some lag between ADL-S showing up in desktops and ADL-S BGA showing up in laptops. We might very well see ADL-P laptops first.
 

coercitiv

Diamond Member
Jan 24, 2014
4,256
5,379
136
Probably more to the point, how many gracemont will it take to equal Golden cove's throughput?
Gracemont cluster will look good when compared against Golden Cove when:
  • workload scales well to more threads
  • Golden Cove clocks are down in the all-core turbo range
The problem with hyping Gracemont via the "Skylake IPC" tag is that nobody knows how high it will clock. From a purely subjective point of view the "Skylake IPC" reference point looks very good... until you start asking questions about absolute performance. Will it clock close to 5GHz? Does it make any sense at all to make it capable of 5GHz when its main purpose is increased throughput under heavy workloads? And finally, does Skylake IPC still sound good when Golden Cove may end up being 70-100% faster? (40% IPC advantage, 20-45% clock advantage)
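For reference, that 70-100% figure is just the two advantages compounding multiplicatively; a quick sanity check (the 40% and 20-45% inputs are the rumored numbers above, not confirmed specs):

```python
def combined_speedup(ipc_adv, clock_adv):
    """Performance = IPC x clock, so the advantages compound."""
    return (1 + ipc_adv) * (1 + clock_adv) - 1

low = combined_speedup(0.40, 0.20)   # 1.40 * 1.20 - 1 = 0.68
high = combined_speedup(0.40, 0.45)  # 1.40 * 1.45 - 1 = 1.03
print(f"{low:.0%} to {high:.0%}")    # 68% to 103%
```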

Here's a big question for some of the people in this thread, what would you choose between the following:
  • Comet Lake 10+0 , 10c/20t Skylake cores
  • Cosmic Lake 8+8, 8c/16t Skylake cores + 8c/8t Nehalem cores
 

lobz

Golden Member
Feb 10, 2017
1,620
2,083
136
Something else to consider is that having separate, power optimized cores allows them to also use far more aggressive power management on them, having isolated voltage planes, spinning them down more quickly, etc. Even in some of the most heavily single threaded situations, OSes have stuff to do behind the scenes. It's far better to shuffle that stuff off to low power, low thermal cores to avoid interrupting the big cores and to better manage thermals.

Obviously, we're well within the territory of every few percent counts.
I don't think that single-threaded cores with Skylake IPC are an efficient solution to any combination of 'behind the scenes OS stuff'.
 

lobz

Golden Member
Feb 10, 2017
1,620
2,083
136
Gracemont cluster will look good when compared against Golden Cove when:
  • workload scales well to more threads
  • Golden Cove clocks are down in the all-core turbo range
The problem with hyping Gracemont via the "Skylake IPC" tag is that nobody knows how high it will clock. From a purely subjective point of view the "Skylake IPC" reference point looks very good... until you start asking questions about absolute performance. Will it clock close to 5GHz? Does it make any sense at all to make it capable of 5GHz when its main purpose is increased throughput under heavy workloads? And finally, does Skylake IPC still sound good when Golden Cove may end up being 70-100% faster? (40% IPC advantage, 20-45% clock advantage)

Here's a big question for some of the people in this thread, what would you choose between the following:
  • Comet Lake 10+0 , 10c/20t Skylake cores
  • Cosmic Lake 8+8, 8c/16t Skylake cores + 8c/8t Nehalem cores
The second one, just for the name. I finally wanna see a cosmic lake!!
 
  • Like
Reactions: Tlh97

LightningZ71

Senior member
Mar 10, 2017
711
637
136
I'm also of the belief that they won't be clocking the small cores especially high. If they REALLY are supposed to be all about energy efficiency, why would they allow them to go beyond the inflection point in the power-draw/clock-speed curve, where each additional 100MHz starts taking a lot more power? I tend to think the small cores will be limited to the 3.5-3.8GHz range. Why? I think the i3-1125G4 is informative here, as it is based on the most similar process tech currently in production. It seems to me that the i3-1125G4 is an otherwise largely functional Tiger Lake die that misses the power targets needed to be sold as an i5/i7 product. They clock it down deep in the power/performance curve so it clears the target wattage, but have to sell it as an i3. I'm assuming 10ESF won't be dramatically more efficient in that area, so a clock speed in the 3.5-3.8GHz range seems to be where power efficiency starts taking a big hit for extra performance. If that's the target for the cores, they won't be pipeline-optimized for much higher clock speeds. If there really were something that would stress them so badly that they needed to go faster, it's better run on the big cores to begin with.

So, working off those assumptions (~Skylake IPC, clocks around 3.7GHz), what we're looking at is a section of the processor that's going to behave a lot like an i7-6900K or a pair of i7-6700/6700T processors, but run at 15/35W. That's not a particularly bad place to be.

Again, that's as much a S.W.A.G. as anything.
 
  • Like
Reactions: Tlh97 and misuspita

mikk

Diamond Member
May 15, 2012
3,085
911
136
I'm also of the belief that they won't be clocking the small cores especially high. If they REALLY are supposed to be all about energy efficiency, why would they allow them to go beyond the inflection point in the power-draw/clock-speed curve, where each additional 100MHz starts taking a lot more power? I tend to think the small cores will be limited to the 3.5-3.8GHz range.

I'm expecting higher than this, for several reasons (if you're referring to the desktop). First of all, back in 2018 Intel highlighted a frequency increase for Gracemont, unlike Tremont or Golden Cove: https://images.anandtech.com/doci/13699/1-Roadmap.jpg

Furthermore, Jasper Lake SKUs (10W) can boost up to 3.3GHz even on the poor Ice Lake 10nm process. The difference from 10nm to SuperFin is big, and to enhanced SuperFin even bigger; ADL will use enhanced SuperFin. Also, the fastest desktop SKUs are not energy-optimized by nature - a performance-optimized 125W SKU won't run at the most efficient clock speeds.
 

Hulk

Platinum Member
Oct 9, 1999
2,973
406
126
Let's review. I don't think there is much debate that the Big/Little concept is useful for mobile when efficiency is of the utmost importance. We don't have to look further than phones to see that since that is the norm these days it must be advantageous.

I think most of us would agree that Intel is going with a Big/Little design for Alder Lake because mobile is their first priority. The question to be asked is as follows:

Is the hybrid design a detriment to desktop performance, with Intel just doing the best they can with it?
Or have they found a way that, for a given transistor budget, they can achieve higher performance with big/little?

While there is little doubt Intel will stay in the optimum efficiency range of the shmoo plot for mobile, they have demonstrated that if enormous power is required for desktop to be competitive, they will use it. So I don't put much stock in the theory that for the desktop they will limit big or little core speed for power reasons, especially at the top of the stack where competition with AMD is paramount.

It seems to me that the elephant in the room is this: for a given application, do all threads need the same amount of compute? I don't know. The easiest way to design is to assume they do and throw as many powerful cores at the application as you can, i.e. Threadripper.

A more elegant solution might be to have two "buckets" of cores, big and little, and assign them based on the compute needs of the various threads of the running process. Of course this can be tailored to a specific application, but can it work across the board? As I wrote above, I found a neat little app called ProcessThreadsView that shows the threads created by a specific application, the rate of context switching, and the user/kernel time for each thread.

I have only started to check it out, but it seems the rate of context switching could be meaningful. As we know, every time the context is switched for a thread there is inefficiency, as one thread is flushed from the CPU and another one started. The less context switching the better.
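ProcessThreadsView is Windows-only; for anyone on Linux, the same per-thread counters can be pulled straight from /proc. A minimal sketch, assuming a Linux-style /proc filesystem:

```python
import os

def thread_ctx_switches(pid):
    """Return {tid: (voluntary, nonvoluntary)} context-switch counts for
    every thread of a process, read from /proc/<pid>/task/*/status."""
    out = {}
    for tid in os.listdir(f"/proc/{pid}/task"):
        counts = {}
        with open(f"/proc/{pid}/task/{tid}/status") as f:
            for line in f:
                if line.startswith(("voluntary_ctxt_switches",
                                    "nonvoluntary_ctxt_switches")):
                    key, val = line.split(":")
                    counts[key] = int(val)
        out[int(tid)] = (counts["voluntary_ctxt_switches"],
                         counts["nonvoluntary_ctxt_switches"])
    return out

# Sampling this twice and diffing the counts gives the switches/second
# rate that ProcessThreadsView displays.
print(thread_ctx_switches(os.getpid()))
```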

Here's a screenshot of Handbrake running the benchmark from this forum. Lots of threads are spawned, and the first 12 are doing quite a bit of context switching. This is from my 4770K. It would be interesting to see what this looks like with 8c/16t, 12c/24t, and 16c/32t. Seems like 7 or 8 threads are getting hammered pretty hard by the context-switch rate, which I believe is in switches/second. So those "main" threads are frequently being cut away from to service other threads.

Handbrake Context Switches.jpg

Here is a screenshot of Presonus Studio One playing back a relatively lightly loaded multitrack song. Here we have 7 threads with high context switching and 8 or so other threads which seem to be lightly loaded and serviced during the context switches from the main threads.

Presonus Studio One context switches.jpg

Honestly, I don't know what the heck this all really means, but it could be that these applications would do well with 8 dedicated big cores and a bunch of little cores to service the less heavily loaded threads. Or it could mean nothing, because the context-switching performance hit is negligible.

One more interesting observation: the user time for Studio One is focused on 8 threads. For Handbrake it's 7 threads, plus one less-loaded one. Is this because my 4770K has 8 logical processors, or because these apps were coded with 4-core/8-thread processors in mind? If someone ran the Handbrake test on an 8-core rig we'd at least know that, right?

But I thought it was interesting enough to post.
 
  • Like
Reactions: Tlh97 and KompuKare

jpiniero

Diamond Member
Oct 1, 2010
9,042
1,742
126
I'm expecting higher than this, for several reasons (if you're referring to the desktop). First of all, back in 2018 Intel highlighted a frequency increase for Gracemont, unlike Tremont or Golden Cove: https://images.anandtech.com/doci/13699/1-Roadmap.jpg

Furthermore, Jasper Lake SKUs (10W) can boost up to 3.3GHz even on the poor Ice Lake 10nm process. The difference from 10nm to SuperFin is big, and to enhanced SuperFin even bigger; ADL will use enhanced SuperFin. Also, the fastest desktop SKUs are not energy-optimized by nature - a performance-optimized 125W SKU won't run at the most efficient clock speeds.
They don't need to go after every last Mhz on the small cores. 3.3 - 3.8 max feels right for locked.
 
