
Discussion: Intel current and future Lakes & Rapids thread


HurleyBird

Platinum Member
Apr 22, 2003
2,269
700
136
actually, you're not comparing IPC at all.
At this point, I think we all just need to come to terms with the fact that IPC has become synonymous with performance per clock and no amount of "well, actually" can change that. As long as people realize that they aren't quite using the term in the most literal sense, it's fine.
 

Hulk

Platinum Member
Oct 9, 1999
2,865
251
126
At this point, I think we all just need to come to terms with the fact that IPC has become synonymous with performance per clock and no amount of "well, actually" can change that. As long as people realize that they aren't quite using the term in the most literal sense, it's fine.
Absolutely great point. We have to respect terminology, so I think finding another term for what we are measuring as "IPC" is appropriate for discussion purposes. Modern processors have so many features that defy the traditional IPC definition, such as hyperthreading (SMT) or new ISA instructions suited only to certain programs, that where the wheels meet the road we are really talking about something like "clock efficiency."

I'll put this definition out there for comment, as a real-world, measurable stand-in for IPC, so that we know we are talking about the same thing.

Clock efficiency, or maybe processor efficiency? I don't know, though, as that brings power to mind. Anyway, it would be: at a fixed frequency, the rate of output for a specific input. Example: fix the clock, run the benchmark, measure the time to completion, and you have the rate. Time to completion at that fixed clock serves as the performance metric.

Furthermore, if benchmarks are run while measuring the "average effective clock" using HWiNFO, as we are doing in the Handbrake bench in this forum, we can get a pretty valid result for what I'm calling clock efficiency. This will matter more going forward, with clocks jumping all over the place. HWiNFO polls the clock on each core constantly at a very fine-grained level and then integrates to arrive at the average effective clock across the entire CPU.
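That measurement can be sketched in a few lines. This is a toy illustration with made-up times and clocks, not data from the Handbrake bench; it just shows how normalizing completion time by an HWiNFO-style average effective clock yields a comparable "work rate":

```python
def work_rate(seconds: float, avg_effective_clock_ghz: float) -> float:
    """Work per gigacycle: one benchmark run divided by the cycles spent.

    Only meaningful as a ratio between CPUs running the same workload.
    """
    cycles_giga = seconds * avg_effective_clock_ghz
    return 1.0 / cycles_giga

# Hypothetical completion times and average effective clocks:
old = work_rate(seconds=300.0, avg_effective_clock_ghz=4.0)   # baseline CPU
new = work_rate(seconds=260.0, avg_effective_clock_ghz=3.9)   # newer CPU
uplift = new / old - 1.0
print(f"work-rate uplift: {uplift:.1%}")
```

Because both runs execute the same workload, the absolute unit cancels out and only the ratio between the two CPUs matters.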

The main reason I analyzed the AnandTech data was not only curiosity but also so we could discuss generational performance from the same set of results.

I generally shy away from putting out controversial posts like that for fear of doing a lot of work and then being berated for it. You know the saying: "no good deed goes unpunished." But I've been around here a long time, and I know most people here are like me in that we really enjoy following the processor industry on all fronts. Add to that the fact that I'm always willing to learn, and I try to take posts directed at me in the best light, not the worst. When someone posted "you should use geomean," at first I thought: all that work and I should have used geomean? Then after thinking about it I realized, "yeah, that's an honest and positive contribution to the discussion, I can do that."



I think that should be solved on the OS front, just like multicore CPUs, SMT, or even Bulldozer's module approach (modules were loaded in a poor order until the scheduler treated them as cores 1-2-3-4 plus SMT).

A purely parallel load? Well, the small cores won't be that much slower in the first place; at worst, I believe, half the speed:
Golden Cove: 1.5 IPC x 5 GHz = 7.5 units of work
Gracemont: 1.0 IPC x 3.5 to 4 GHz = 3.5 to 4 units of work
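Spelled out as code (the relative-IPC and clock figures are the quoted guesses, nothing more):

```python
# Throughput in "units of work" = relative IPC x clock in GHz.
big_core = 1.5 * 5.0          # big-core guess: 7.5 units
small_lo = 1.0 * 3.5          # small core at 3.5 GHz: 3.5 units
small_hi = 1.0 * 4.0          # small core at 4.0 GHz: 4.0 units
ratio = (small_lo / big_core, small_hi / big_core)
print(ratio)   # a small core lands at roughly 47-53% of a big core
```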

Making asymmetrical loads scale well might be hard, but if every smartphone can do this (even across 3 different clusters), I don't see Microsoft and Intel failing to pull it off.
A good idea would be to run things like the OS and background apps (think antivirus, mail) on the small cores constantly; every other program would then see only big cores with SMT threads.
Ideally, when the load is light enough, the big cores could be turned off and the workload moved to the small-core clusters alone, just as cores today throttle down to idle clock speeds when they are not doing much.
Good points here. The units of work for Golden Cove and Gracemont... did you pull them from the video of that guy on the previous page? I'm not critiquing, just wondering. His analysis (guesses) did seem reasonable.

Also, to get into even finer-grained "thread proportioning" of work, I wonder if the scheduler mechanism can begin to take thermals into account. For example, you have 8 physical cores available. Within thermal parameters you can run 4 at full speed and 4 at quarter speed, or 2 at full, 2 at 3/4, and 4 at half, etc. A really smart scheduler could control the frequency of the various threads to maximize overall application execution speed. Adding big/little cores just adds more variables to this "thread optimization" formula.

If you have an app coded such that it needs one or two really screaming-fast threads, then the scheduler could crank those up at the expense of the others.

This would be an on-the-fly learning process for the OS, and ideally it would involve AI so that it could "remember" previous runs, or better yet have access to an online database of similar data.
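A toy sketch of that "thread proportioning" idea: hand the hungriest threads the fastest frequency steps, then walk the least-demanding threads down until the plan fits a fixed thermal budget. The linear power model and every number here are invented for illustration; a real scheduler would use measured power curves and core topology.

```python
def proportion_threads(demands, budget_watts, freq_steps, watts_per_step):
    """Assign each thread a frequency step, hungriest first,
    downgrading the least-demanding threads until the plan fits.

    demands: per-thread compute demand (arbitrary units)
    freq_steps: available frequencies, fastest first (GHz)
    watts_per_step: power cost of each step (same order)
    Returns {thread_index: frequency_ghz}.
    """
    order = sorted(range(len(demands)), key=lambda i: -demands[i])
    chosen = {i: 0 for i in order}          # everyone starts at the fastest step

    def total(c):
        return sum(watts_per_step[s] for s in c.values())

    for i in reversed(order):               # least-demanding threads first
        while total(chosen) > budget_watts and chosen[i] < len(freq_steps) - 1:
            chosen[i] += 1                  # step this thread down one notch
    return {i: freq_steps[s] for i, s in chosen.items()}

# 4 threads, 2 heavy and 2 light, 20 W budget (all numbers invented):
plan = proportion_threads(
    demands=[9.0, 8.0, 2.0, 1.0],
    budget_watts=20.0,
    freq_steps=[5.0, 3.5, 1.0],      # GHz
    watts_per_step=[8.0, 4.0, 1.0],  # W
)
print(plan)
```

With this budget the two heavy threads keep the 5 GHz step while the two light ones drop to 1 GHz.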
 

Hulk

Platinum Member
Oct 9, 1999
2,865
251
126
Quick update - I'm going to call this "work rate."

I was very skeptical of the Skylake vs Sunny Cove results due to the thermal issues, so I added an analysis using PCMag data. The geomean total I calculated with the AnandTech data was 20%, and with the PCMag data 20.8%. I'm starting to become a believer that Sunny Cove is the real deal and that RKL will actually show a ~20% work-rate uplift over Skylake at the same frequency. That is quite a significant real-world performance uptick, especially if Intel can maintain high all-core frequencies, which is most likely on the desktop considering the power and cooling available.

Notes:
This is not IPC, as my colleagues have pointed out, but a measure of how quickly a processor completes a workload compared to another at the same frequency.

Only CPUs from within the same testing review are compared.

I tried to maintain 3 significant digits because that is how most results were reported. That said, the precision and accuracy of these benchmarks are unknown.

If the Ivy Bridge results seem higher than expected, keep in mind Ivy Bridge was a "Tick+" and had, as Anand wrote, "a more aggressive turbo" than Sandy Bridge. This could skew results.

Here are the Ivy Bridge core changes:
- Data structures previously statically shared between threads can now be dynamically shared (e.g. DSB queue), improves single threaded performance
- FP/integer divider delivers 2x throughput compared to Sandy Bridge
- MOV instructions no longer occupy an execution port, potential for improved ILP when MOVs are present
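On the geomean vs. average columns in the tables that follow: per-benchmark uplifts are ratios, so the geometric mean is the appropriate aggregate, and the arithmetic average will always come out at least as high. A sketch with invented uplift numbers (not the table's data):

```python
import math

def geomean_uplift(uplifts):
    """Geometric mean of per-benchmark uplifts, treating each
    uplift as a speedup ratio (1 + uplift)."""
    ratios = [1.0 + u for u in uplifts]
    return math.prod(ratios) ** (1.0 / len(ratios)) - 1.0

# Made-up per-benchmark uplifts (fractions):
uplifts = [0.22, 0.18, 0.25, 0.15]
print(f"geomean: {geomean_uplift(uplifts):.1%}")
print(f"average: {sum(uplifts) / len(uplifts):.1%}")  # always >= geomean
```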



Intel Generational "Work Rate" Comparison Results

Transition                    Geomean   Average
P4 to Conroe                   82.7%     83.4%
Conroe to Nehalem              20.2%     22.4%
Nehalem to Sandy Bridge        11.8%     12.2%
Sandy Bridge to Ivy Bridge      6.7%      6.9%
Ivy Bridge to Haswell           8.7%      8.9%
Haswell to Skylake              8.9%      9.5%
Skylake to Sunny Cove          21.0%     21.3%
P4 to Conroe (X6800 @ 2.93 GHz vs Pentium D 930 @ 3.0 GHz); each benchmark lists: X6800 score, Pentium D 930 score, clock-normalized score, work rate
Sysmark 2004
Overall
371​
211​
206​
80.0%​
Internet Content Creation
482​
256​
250​
92.8%​
Office Productivity
285​
174​
170​
67.7%​
3D Content Creation
447​
232​
227​
97.3%​
2D Content Creation
568​
302​
295​
92.6%​
Web Publication
442​
240​
234​
88.6%​
Communication And Networking
202​
142​
139​
45.7%​
Document Creation
380​
206​
201​
88.9%​
Data Analysis
302​
180​
176​
71.8%​
PC WorldBench 5
Overall WorldBench Score
156​
99​
97​
61.3%​
3D Rendering
3dsmax 7
4.11​
2.13​
2​
97.6%​
Cinebench 1CPU
486​
256​
250​
94.4%​
Cinebench XCPU
892​
460​
449​
98.5%​
96.8%​
Video Encoding
Xmpeg 5.03 with DivX 6.1
19.4​
12.2​
12​
62.8%​
Windows Media Encoder WMV9
61.6​
32.4​
32​
94.7%​
QuickTime v7.1 H.264 (seconds)
0.0083​
223.2​
0.0044​
90.8%​
82.8%​
Audio Encoding
iTunes 6 MP3 (seconds)
0.0385​
48​
0.0204​
88.5%​
61.13​
33.47​
83.4%​
P4 to Conroe (Geomean): 82.7%
Conroe to Nehalem (i7-965 @ 3.2 GHz vs QX9770 @ 3.2 GHz); each benchmark lists: Nehalem score, Conroe score, work rate
SYSmark 2007
SysMarks
238​
222​
7.2%​
E-Learning
208​
198​
5.1%​
Video Creation
277​
265​
4.5%​
Productivity
234​
224​
4.5%​
3D
239​
207​
15.5%​
7.3%​
3D Rendering
POV-Ray 3.7 beta 29
4202​
2641​
59.1%​
Cinebench R10 - 1CPU
4475​
3937​
13.7%​
Cinebench R10 - XCPU
18810​
14065​
33.7%​
3dsmax CPU Composite
17.6​
13.1​
34.4%​
35.2%​
Encoding
x264 HD pass 1
85.8​
73.2​
17.2%​
x264 HD pass 2
31.6​
21.3​
48.4%​
DivX (seconds)
0.0305​
0.0236​
29.3%​
Windows Media Encoder (seconds)
0.0417​
0.0345​
20.8%​
iTunes MP3 (seconds)
0.0397​
0.0379​
4.8%​
50.24​
41.80​
24.1%​
22.4%​
Conroe to Nehalem (Geomean)
20.2%​

Nehalem to Sandy Bridge (2600K @ 3.4 GHz vs i7-975 @ 3.33 GHz); each benchmark lists: 2600K score, i7-975 score, clock-normalized score, work rate
Sysmark 2007
Overall
274​
251​
245​
11.8%​
Photoshop CS4
Retouch Artists Benchmark (seconds)
0.0885​
14.4​
0.0714​
23.9%​
File Compression/Decompression
PAR2 Multithreaded (seconds)
0.0578​
23.7​
0.0431​
34.2%​
WinRAR 3.80 Compression (seconds)
0.0168​
74​
0.0138​
21.6%​
7-Zip Benchmark
19744​
20217​
20642​
-4.4%​
7-Zip Compression
4611​
4447​
4540​
1.6%​
13.2%​
3D Rendering
Blender 3D Character Render (seconds)
0.0249​
46.8​
0.0218​
14.3%​
POV-Ray 3.7
4875​
4379​
4471​
9.0%​
3dsmax 9
20.1​
17.7​
18​
11.2%​
Cinebench 10 1CPU
5991​
4651​
4749​
26.2%​
Cinebench XCPU
22875​
20407​
20836​
9.8%​
14.1%​
Video Encoding
Xmpeg+DivX Encode (seconds)
0.0344​
31.3​
0.0326​
5.3%​
Windows Media Encoder WMV9 (seconds)
0.0500​
23​
0.0444​
12.6%​
x264 HD 1st Pass
106.4​
91.7​
94​
13.6%​
x264 HD 2nd Pass
36.3​
33.1​
32​
12.0%​
10.9%​
Flash video/Excel
Visual Studio 2008 (minutes)
0.0538​
21​
0.0486​
10.6%​
Flash video - Sorenson Squeeze (seconds)
0.0138​
90.4​
0.0113​
21.8%​
Excel Math - Monte Carlo Sim (seconds)
0.0901​
12​
0.0851​
5.9%​
Excel Math Operations (seconds)
0.2941​
3.183​
0.3208​
-8.3%​
274.00​
245.14​
12.2%​
Nehalem to Sandy Bridge
11.8%​
Sandy Bridge to Ivy Bridge (3770K @ 3.9 GHz vs 2600K @ 3.8 GHz); each benchmark lists: 3770K score, 2600K score, clock-normalized score, work rate
SYSmark 2012
SysMarks Overall
228​
212.0​
217.6​
4.8%​
Office Productivity
189​
176.0​
180.6​
4.6%​
Media Creation
218​
197.0​
202.2​
7.8%​
Productivity
235​
221.0​
226.8​
3.6%​
Data/Financial Analysis
277​
268.0​
275.1​
0.7%​
3D Modeling
260​
234.0​
240.2​
8.3%​
System Management
200​
187.0​
191.9​
4.2%​
4.9%​
SYSmark 2007
SysMarks Overall
303​
274.0​
281.2​
7.7%​
Productivity
276​
283.0​
290.4​
-5.0%​
E-Learning
308​
244.0​
250.4​
23.0%​
Video Creation
293​
255.0​
261.7​
12.0%​
3D
340​
318.0​
326.4​
4.2%​
8.4%​
3D Rendering
POV-Ray 3.7 beta 29
Cinebench R11.5 - 1CPU
1.66​
1.5​
1.6​
6.4%​
Cinebench R11.5 - XCPU
7.61​
6.9​
7.0​
8.1%​
3dsmax R9
21.8​
20.1​
20.6​
5.7%​
6.7%​
Encoding
x264 HD pass 1
104.2​
94.9​
97.4​
7.0%​
x264 HD pass 2
41​
36.0​
36.9​
11.0%​
9.0%​
Miscellaneous
Build Chromium Product Visual Studio (minutes)
0.0565​
18.6​
0.0552​
2.4%​
Photoshop CS4 - Retouch Artist (seconds)
0.0971​
11.3​
0.0908​
6.9%​
4.6%​
Compression and Encryption
7-zip - 32MB Dictionary
22810​
19744​
20263.6​
12.6%​
AES-128 - True Crypt 7.1
3.7​
3.4​
3.5​
6.0%​
58.14​
54.48​
6.9%​
Sandy Bridge to Ivy Bridge (Geomean)
6.7%​

Ivy Bridge to Haswell (4770K vs 3770K); each benchmark lists: 4770K score, 3770K score, work rate
POV-Ray 3.7
1541.3​
1363.6​
13.0%​
Cinebench R11.5 - 1CPU
1.78​
1.7​
7.2%​
Cinebench R11.5 - XCPU
8.07​
7.6​
6.0%​
7-zip single thread
4807​
4716.0​
1.9%​
7-zip multithreaded
23101​
22810.0​
1.3%​
Kraken Java Script - Chrome (ms)
0.0008​
0.0008​
7.8%​
PCMark-7 Overall
6747​
6268.0​
7.6%​
x264 HD 1st Pass
79.1​
74.8​
5.7%​
x264 HD 2nd Pass
16.5​
14.6​
13.0%​
TrueCrypt AES
4.4​
3.7​
18.9%​
Visual Studio 2012 - Build Firefox (minutes)
0.0498​
0.0433​
14.9%​
26.14​
24.04​
8.9%​
Ivy Bridge to Haswell
8.7%​
Haswell to Skylake (6700K @ 3.4 GHz vs 4770K @ 3.33 GHz); each benchmark lists: 6700K score, 4770K score, clock-normalized score, work rate
WinRAR 5.01 Compression (sec)
0.0204​
54.5​
0.0198​
3.3%​
7-Zip Compression
26370​
24100​
25954​
1.6%​
3D Particle Movement single thread
140.7​
129.37​
139​
1.0%​
3D Particle Movement multithread
803.68​
727.64​
784​
2.6%​
Cinebench 10 1CPU
9052​
7718​
8312​
8.9%​
Cinebench 10 XCPU
36747​
30095​
32410​
13.4%​
x264 HD 1st Pass
133.48​
112.43​
121​
10.2%​
x264 HD 2nd Pass
55.9​
46.7​
50​
11.2%​
Google Octane v2
45345​
32193​
34669​
30.8%​
WebXPRT
2949​
2594​
2794​
5.6%​
Dolphin Emulation (minutes)
0.1546​
7.63​
0.1411​
9.5%​
FastStone Image Viewer 4.9 (seconds)
0.0294​
40​
0.0269​
9.2%​
Sunspider (ms)
0.0079​
121​
0.0089​
-11.5%​
Mozilla Kraken 1.1 (ms)
0.0014​
1091​
0.0010​
37.8%​
31.59​
29.00​
9.5%​
Haswell to Skylake
8.9%​
Skylake to Sunny Cove (1065G7 @ 3.9 GHz vs 8650U @ 4.2 GHz); each benchmark lists: 1065G7 score, 8650U score, clock-normalized score, work rate
PCMark 10 - Essentials
9325​
8413​
7812​
19.4%​
PCMark 10 - Productivity
7008​
6480​
6017​
16.5%​
PCMark 10 - Digital Content Creation
3902​
3035​
2818​
38.5%​
PCMark 10 - Overall
4546​
3875​
3598​
26.3%​
Cinebench R15 single thread
181.14​
170​
158​
14.7%​
Cinebench R15 multithread
826.7​
658.84​
612​
35.1%​
x264 HD 1st Pass
73.72​
68.81​
64​
15.4%​
x264 HD 2nd Pass
14.37​
13.85​
13​
11.7%​
Google Octane v2
40002​
35532​
32994​
21.2%​
WebXPRT 3
223​
208​
193​
15.5%​
WebXPRT 2015
593​
557​
517​
14.7%​
Mozilla Kraken 1.1 (ms)
0.0010​
1123​
0.0008​
26.4%​
316.66​
261.68​
21.3%​
Skylake to Sunny Cove
21.0%​
Skylake to Sunny Cove (1065G7 @ 3.9 GHz vs 8565U @ 4.6 GHz)
PCMag Review; each benchmark lists: 1065G7 score, 8565U score, clock-normalized score, work rate
Cinebench R15
182​
177​
150​
21.3%​
Cinebench R20
454​
443​
375.587​
20.9%​
Handbrake 1.1.1
0.051​
0.050​
0.043​
19.1%​
POV-Ray 3.7 - 15W
0.004​
0.004​
0.004​
12.3%​
POV-Ray 3.7 - 25W
0.005​
0.005​
0.004​
25.4%​
Blender 2.77a (flying squirrel) 15W
0.031​
0.031​
0.026​
17.9%​
7-Zip (32MB Dictionary) 15W
30954​
28143​
23860​
29.7%​
Geomean
1.88​
1.56​
21.0%​
Skylake to Sunny Cove (Geomean)
20.8%​
 

Hulk

Platinum Member
Oct 9, 1999
2,865
251
126
I'm probably late on this, but have you seen this Tom's Hardware Intel interview? I feel bad for the Intel guy having to try to answer these questions in a positive light. He really takes some hard questions, and there is so much repeating himself and swallowing that I find myself cringing through most of the interview. Considering what he has to work with, he does a good job. Probably 40 minutes of his life he'd rather forget as soon as possible.

The one thing I picked up was that Rocket Lake will definitely have 32 EUs on the graphics side.

 

LightningZ71

Senior member
Mar 10, 2017
658
583
106
Maybe it's die recovery for dies that have failed big core complexes? Maybe sold under the Pentium and Celeron line? Maybe it's an intentionally smaller die that just has small cores and no big cores at all?
 

DrMrLordX

Lifer
Apr 27, 2000
16,878
5,841
136
If Intel is selling a product with nothing but Gracemont cores, it would be under the Atom line and be the successor to Tremont (read: not an Alder Lake product).
 

mikk

Diamond Member
May 15, 2012
3,038
838
136
The N line could be a lower-end lineup, because Atom chips were called Nxxxx. Maybe 1 Goldmont + 4/8 Gracemont or something like this.
 

jpiniero

Diamond Member
Oct 1, 2010
8,851
1,617
126
If Intel is selling a product with nothing but Gracemont cores, it would be under the Atom line and be the successor to Tremont (read: not an Alder Lake product).
I think Lightning is right. It's still Alder Lake but the die has no Big Cores or the Big Cores are disabled.
 

podspi

Golden Member
Jan 11, 2011
1,945
41
91
If the new Atom cores have the same approximate performance (scaled for frequency), do they even need the Atom brand anymore? It's kinda like Internet Explorer, nobody hears "Atom" and wants it.

Can you imagine a 4 or 8 core "Atom" processor outperforming not so old HEDT processors? So happy the CPU industry is showing some signs of life.
 

jpiniero

Diamond Member
Oct 1, 2010
8,851
1,617
126
If the new Atom cores have the same approximate performance (scaled for frequency), do they even need the Atom brand anymore? It's kinda like Internet Explorer, nobody hears "Atom" and wants it.
They don't really use the Atom brand now. Mostly Pentium Silver and Celeron. They do use Atom for some server focused parts. I'm sure the clock speeds on Gracemont will not be that high even on the Core branded products. Kind of wonder how unlocked will work with the small cores.
 

Hulk

Platinum Member
Oct 9, 1999
2,865
251
126
If the new Atom cores have the same approximate performance (scaled for frequency), do they even need the Atom brand anymore? It's kinda like Internet Explorer, nobody hears "Atom" and wants it.

Can you imagine a 4 or 8 core "Atom" processor outperforming not so old HEDT processors? So happy the CPU industry is showing some signs of life.
Okay let's step back a minute and think about this. What's going on inside these processors if the little Gracemont cores are going to perform like Skylake and the big Golden Cove +20% on Sunny Cove?

Gracemont is the next iteration of Tremont, which itself is supposed to be 30% better than Goldmont Plus. That would put it near Haswell-level performance. I think Gracemont would need +15% over Tremont to be at Skylake-level work rate.
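As a sanity check on the compounding (both percentages are the guesses above, not confirmed figures), generational uplifts multiply rather than add:

```python
# Chained uplifts multiply: (1 + a) * (1 + b) - 1, not a + b.
tremont_over_goldmont_plus = 0.30   # figure quoted above
gracemont_over_tremont     = 0.15   # the +15% guessed above
combined = (1 + tremont_over_goldmont_plus) * (1 + gracemont_over_tremont) - 1
print(f"{combined:.1%}")   # ~49.5%, a bit more than the naive 45%
```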

As for Golden Cove, I'm thinking that after adding 50% more L1 data cache to the front end, most of the design effort went into the back end, with the additional ports and other larger structures. The 20% uplift they got was huge, leading me to believe the back end was the bottleneck in Skylake. They opened it up to the point where the front end is now the bottleneck (in preparation for Golden Cove), so to see another 20% in performance they are going to have to add decoders or redesign the front end, and tweak the back end as well to keep up.

Sunny Cove has 300M transistors; I'd say Golden Cove will be 350-400M. I have no idea of the transistor count for Tremont. I'm just wondering what the difference in transistor count means for Big/Little.
 

podspi

Golden Member
Jan 11, 2011
1,945
41
91
Okay let's step back a minute and think about this. What's going on inside these processors if the little Gracemont cores are going to perform like Skylake and the big Golden Cove +20% on Sunny Cove.

Gracemont is the next iteration of Tremont, which itself is supposed to be 30% better than Goldmont Plus. That would put it near Haswell-level performance. I think Gracemont would need +15% over Tremont to be at Skylake-level work rate.
Well, Skylake is a pretty good design, right? Slim it down, add more advanced power management, clock it reasonably, and introduce the new 'Atoms' at 10nm.

I think having Big and Little cores allows a lot more flexibility in design. You can have a Skylake-lite design that still provides decent worst-case performance, while implementing power hungry structures in the big cores because you know you can always shut them down if needed for efficiency.

One significant difference between AMD and Intel has been the relative sizes of their FPUs. Intel has always been more aggressive, but now they can be more so if they know the FPU will be gated most of the time (the Skylake system I have spends most of its time idling).

If they're smart, they'll optimize the little cores for web browsing. The big cores only kick in when you need sustained performance, like encoding, gaming, deep learning (supposedly on a CPU), etc.
 

gdansk

Senior member
Feb 8, 2011
590
252
136
If the new Atom cores have the same approximate performance (scaled for frequency), do they even need the Atom brand anymore? It's kinda like Internet Explorer, nobody hears "Atom" and wants it.
Server/industrial/embedded customers are the ones buying "Atom" branded chips these days. The consumer ones are branded Celeron/Pentium. Unfortunately the brand is a bit tarnished there too because of the LPC clock bug. But generally they are buying on spec sheets far more than names.
 

Hulk

Platinum Member
Oct 9, 1999
2,865
251
126
Well, Skylake is a pretty good design, right? Slim it down, add more advanced power management, clock it reasonably, introducing the new 'Atoms' @ 10nm.
Yes, I was thinking the same thing! But then the "mont" in the name makes me think we're talking about an upgraded Tremont core for Gracemont. As an amateur looking at the block diagrams, it seems one of the big differences between Core and Atom is that Atom doesn't include any complex decoders, which I know are power-hungry beasties. Swap out 3 of Tremont's simple decoders for a complex one and you have a front end that *looks* a lot like Haswell's. That could be around 200M transistors per core, about half of what I projected for Golden Cove in my post above. Translating that into area means the Cove big cores would be about 42% larger than the Mont small cores in Alder Lake.

Let's call it 40%. That would be not only the increase in transistors from Monts to Coves but also the approximate performance increase, assuming the Monts perform like Skylake.

Meaning that per unit area, all the compute parts of Alder Lake perform about the same at maximum compute. The difference is where each part is most efficient on its compute-output vs. power-input curve, which I think is Intel's rationale for Alder Lake in the first place.
 

RTX

Junior Member
Nov 5, 2020
17
2
16
Does the IPC change significantly with varying amounts of L2: E8400 vs E6850 vs E5700? 6 MB vs 4 MB vs 2 MB. All three are 3 GHz with an 800 MHz FSB.

Pentium D 925 vs Pentium D 830? 4 MB vs 2 MB. 3 GHz, 800 MHz FSB.
 

Cardyak

Member
Sep 12, 2018
43
64
61
Does the IPC change significantly with varying amounts of L2: E8400 vs E6850 vs E5700? 6 MB vs 4 MB vs 2 MB. All three are 3 GHz with an 800 MHz FSB.

Pentium D 925 vs Pentium D 830? 4 MB vs 2 MB. 3 GHz, 800 MHz FSB.
Yes, it does; a larger cache means fewer trips to memory. As always, the performance increase varies depending on the workload. If the program is particularly memory-sensitive, the IPC may increase by over 10%, whereas code that contains fewer memory operations will see a much smaller increase (if any at all).

On average, from looking at various benchmarks and calculating comparisons, I'd estimate that doubling the size of a cache (regardless of whether it's L1, L2, or L3) normally returns a 4-5% increase in IPC.
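Applying that rule of thumb to the 2 MB vs 6 MB question above (the ~4.5% per doubling is this post's estimate, and compounding it per doubling is my assumption):

```python
import math

def est_ipc_gain(mb_from: float, mb_to: float, per_doubling: float = 0.045) -> float:
    """Rule-of-thumb IPC gain from growing a cache: ~4.5% per doubling,
    compounded over log2(mb_to / mb_from) doublings."""
    doublings = math.log2(mb_to / mb_from)
    return (1 + per_doubling) ** doublings - 1

# E5700 (2 MB L2) -> E8400 (6 MB L2): log2(3), about 1.58 doublings
print(f"{est_ipc_gain(2, 6):.1%}")   # ~7%
```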
 

moonbogg

Diamond Member
Jan 8, 2011
9,929
1,608
126
What are the little cores going to be used for on the desktop? I can't imagine what benefit adding 4 little cores over 2 more legit ones could bring. This sounds like a laptop design to me.
 

Ajay

Diamond Member
Jan 8, 2001
8,158
3,103
136
What are the little cores going to be used for on the desktop? I can't imagine what benefit adding 4 little cores over 2 more legit ones could bring. This sounds like a laptop design to me.
I have the same thought. Sort of wondered if it started out as a laptop CPU architecture.
 

Bouowmx

Golden Member
Nov 13, 2016
1,094
485
146
What are the little cores going to be used for on the desktop? I can't imagine what benefit adding 4 little cores over 2 more legit ones could bring. This sounds like a laptop design to me.
It's mobile-oriented
Small cores for more multi-threaded performance
In Lakefield, 4x Tremont = 1x Sunny Cove in area
Assuming Golden Cove/Gracemont are in the same area ratio, and Golden Cove has 1.4x the IPC of Gracemont/Skylake:
1x Golden Cove: 1 * 1.4 * 4.8 GHz * 1.25 (HT) = 8.4 cow2beef
4x Gracemont: 4 * 1.0 * 3.3 GHz = 13.2 cow2beef
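That back-of-envelope, as code (every factor is an assumption from this post; "cow2beef" is just a joke unit for cores x IPC x clock x SMT):

```python
# Throughput = cores x relative IPC x clock (GHz) x SMT factor.
golden_cove = 1 * 1.4 * 4.8 * 1.25   # one big core with HT: 8.4
gracemont   = 4 * 1.0 * 3.3 * 1.00   # four small cores, no HT: 13.2
print(golden_cove, gracemont, gracemont / golden_cove)
```

Under these assumptions, the same die area spent on four small cores delivers roughly 1.6x the multithreaded throughput of one big core.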
 

Hulk

Platinum Member
Oct 9, 1999
2,865
251
126
What are the little cores going to be used for on the desktop? I can't imagine what benefit adding 4 little cores over 2 more legit ones could bring. This sounds like a laptop design to me.
I have been thinking about this as well. Here are two theories. I'm not implying any of these theories are legit, just topics to perhaps start discussion of this topic. All speculation based on what I've learned so far.

Theory #1
Intel is assuming that anything above the top-of-the-line Alder Lake part is super-HEDT: a very small niche market that can and will be served by another line of their processors. 8 cores is probably "enough" for 99% of the computing world. So Alder Lake is built primarily for mobile and to cover *most* desktop users. If one Gracemont core can run the whole shebang while watching video or surfing the web (a Skylake core could do this), wouldn't that result in some crazy impressive battery life for Intel to advertise? All the while, the Golden Cove cores snooze away.

Theory #2
Perhaps some applications have interdependencies that can be well served by the Big/Little strategy. For example, say a certain application is running 16 threads (let's just consider physical processors). Perhaps only 6 or 7 of those threads are really compute-heavy, but there are 6 more that could cause the big cores to switch inefficiently among threads; the application would run faster if the big cores handled the compute-heavy threads and a bunch of little cores ran the lighter-load threads.

If the little cores are Skylake-level compute, we're not talking about old-school Atom weaklings. Skylake level for the little cores would be pretty impressive. Think about an 8-core Ice Lake running at 5 GHz AND an 8-core non-HT Skylake running nearly as fast. That would be a very potent combination: essentially a 10700 plus 8 Golden Cove cores. And if what I'm thinking above about light/heavy thread loads is true, it could perform equal to or better than the 5950X, since the big cores would be faster than Zen 3.

Theory #3 (okay I'm really reaching on this one)
Some combination of theories #1 and #2, coupled with the possibility that some clever juxtaposition of the Big and Little cores allows for better heat transfer than all big cores. Could the die be laid out Big/Little/Big/Little, etc., so that at full bore the little cores (running slower) absorb and dissipate some of the heat from the big ones? Basically, I'm talking about arranging the die to avoid hot spots.
 

KompuKare

Senior member
Jul 28, 2009
651
174
116
Or theory #4: Intel wants to keep kernel developers on their toes!
Never mind the scheduler changes for Zen and its NUMA layout; moving and scheduling work across totally different cores is going to be way harder.
Yes, ARM's big.LITTLE pioneered a lot of this, but didn't ARM always maintain the same ISA and instruction sets between the cores used for big.LITTLE?
While an unsupported instruction could cause an exception and either be emulated or force the thread onto the other type of core, this sounds like a lot more work.
Plus, whatever happened to AVX-512 taking over the world?
 
