At this point, I think we all just need to come to terms with the fact that IPC has become synonymous with performance per clock and no amount of "well, actually" can change that. As long as people realize that they aren't quite using the term in the most literal sense, it's fine.
Absolutely great point. We do have to respect terminology, so finding another term for what we're actually measuring as "IPC" seems appropriate for discussion purposes. Modern processors have so many features that defy the traditional IPC definition, such as hyperthreading (SMT) or new ISA instructions suited only to certain programs, that when we talk about IPC where the wheels meet the road, we're really talking about something like "clock efficiency."
I'll put this definition out there for comment as a real-world, measurable stand-in for IPC, so that when we discuss it we know we're talking about the same thing.
"Clock efficiency," or maybe "processor efficiency"? I don't know, though, as the latter brings power consumption to mind. Anyway, the definition would be: at a fixed frequency, the rate of output for a specific input. Example: fix the clock, run the benchmark, measure the time to completion, and you have the rate. Time to completion can serve as the performance-efficiency metric in this case.
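To make that concrete, here's a minimal sketch of the metric. All the numbers below (work units, clocks, runtimes) are invented purely for illustration, not real measurements: the point is just that "work per cycle" falls out of dividing work done by (clock × time).

```python
# Hypothetical sketch of "clock efficiency": fix the clock, run the same
# benchmark on two CPUs, time them, and compare work done per cycle.
# All figures are made up for illustration.

def clock_efficiency(work_units, clock_ghz, seconds):
    """Abstract work completed per clock cycle: work / (clock * time)."""
    cycles = clock_ghz * 1e9 * seconds
    return work_units / cycles

# Same benchmark (say, 1e12 abstract work units), both CPUs locked to 4 GHz:
cpu_a = clock_efficiency(1e12, 4.0, 100.0)  # finishes in 100 s
cpu_b = clock_efficiency(1e12, 4.0, 80.0)   # finishes in 80 s

print(cpu_b / cpu_a)  # → 1.25: CPU B does 25% more work per clock
```

Since the clock and the workload are both fixed, the ratio of the two runtimes is all that actually matters, which is why time to completion works as the metric.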
Furthermore, if benchmarks could be run while measuring the "average effective clock" with HWiNFO, as we are doing in the Handbrake bench in this forum, we could get a pretty valid result for what I'm calling clock efficiency. This will only become more important going forward, with clocks jumping all over the place. HWiNFO constantly polls the clock on each core at a very fine-grained interval and then integrates the samples to arrive at the average effective clock for the entire core cluster on the CPU.
The main reason I analyzed the Anandtech data is not only for my curiosity but so we could discuss performance from the same set of results when talking about the generations.
I generally shy away from putting out controversial posts like that for fear of doing a lot of work and then being berated for it. You know the saying: "no good deed goes unpunished." But I've been around here a long time, and I know most people here are like me in that we really enjoy following the processor industry on all fronts. Add to that the fact that I'm always willing to learn, and I try to take posts directed at me in the best light, not the worst. When someone posted "you should use geomean," my first reaction was: all that work, and I should have used geomean? Then after thinking about it I realized, "yeah, that's an honest and positive contribution to the discussion, I can do that."
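For anyone wondering why the geomean suggestion matters: when you average speedup ratios across a suite of different benchmarks, the geometric mean is the standard choice, because it treats a 2x gain and a 0.5x loss symmetrically where the arithmetic mean does not. A quick illustration with made-up numbers:

```python
# Why geomean for benchmark ratios: the arithmetic mean of relative
# speedups is biased upward; the geometric mean is not. Numbers invented.

import math

def geomean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))

speedups = [2.0, 0.5]          # one test doubled in speed, one halved
print(sum(speedups) / 2)       # arithmetic mean: 1.25 (looks like a net win)
print(geomean(speedups))       # geometric mean: 1.0 (correctly a wash)
```

Python 3.8+ also ships this as `statistics.geometric_mean` if you don't want to roll your own.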
I think that should be solved on the OS front, just like multicore CPUs, SMT, or even Bulldozer's module approach (modules were loaded in a poor order until the OS treated them as 1-2-3-4 cores + SMT).
A purely parallel load? Well, the small cores won't be that much slower in the first place; at worst, I believe, half the speed:
Goldmont: 1.5 IPC × 5 GHz = 7.5 units of work
Gracemont: 1.0 IPC × 3.5 to 4 GHz = 3.5 to 4 units of work
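Those figures follow directly from throughput = IPC × clock. A quick check of the arithmetic, keeping in mind the IPC values above are the poster's guesses, not measured data:

```python
# Per-core throughput as IPC x clock, using the assumed (not measured)
# figures from the post above.

def units_of_work(ipc, clock_ghz):
    return ipc * clock_ghz

big = units_of_work(1.5, 5.0)       # 7.5 units
small_lo = units_of_work(1.0, 3.5)  # 3.5 units
small_hi = units_of_work(1.0, 4.0)  # 4.0 units

# The small core lands at roughly 0.47x to 0.53x of the big core,
# which is where the "at worst half the speed" claim comes from.
print(small_lo / big, small_hi / big)
```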
Making asymmetrical loads scale well might be hard but if every smartphone can do this (even on 3 different clusters) I don't see Microsoft and Intel not pulling it off.
A good idea would be to run the OS and other background apps (think antivirus, mail, etc.) on the small cores constantly; every other program would then see only big cores with SMT threads.
Ideally, when the load is light enough, the big cores could be turned off and the workload moved to the small-core clusters only, just as cores today throttle down to idle clock speeds when they aren't doing much.
Good points here. The units of work for Goldmont and Gracemont... did you pull them from the video of that guy on the previous page? I'm not critiquing, just wondering. His analysis (guesses) did seem reasonable.
Also, to get into even finer-grained "thread proportioning" of work, I wonder if the scheduler mechanism could begin to take thermals into account. For example, you have 8 physical cores available. Within thermal parameters you can run 4 at full speed and 4 at quarter speed, or 2 at full, 2 at three-quarter, and 4 at half, etc. A really smart scheduler could control the frequency of various threads to maximize overall application execution speed. Adding big/little cores just adds more variables to this "thread optimization" formula.
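As a toy illustration of that idea: given a fixed thermal budget, the scheduler's problem is to pick per-core speed levels that maximize total throughput. The speed/power levels below are invented (real power grows superlinearly with clock, which is exactly why running more cores slower can beat running fewer cores flat-out), and brute force stands in for whatever a real scheduler would do:

```python
# Toy "thread proportioning" under a thermal budget: brute-force the
# per-core speed levels that maximize total throughput. Speed fractions
# and power costs are invented; power deliberately rises faster than speed.

from itertools import product

# (speed fraction, power cost) -- power in arbitrary integer units
LEVELS = [(0.0, 0), (0.25, 15), (0.5, 35), (0.75, 65), (1.0, 100)]

def best_allocation(n_cores, budget):
    """Return (total_speed, per-core levels) maximizing speed within budget."""
    best_speed, best_combo = 0.0, None
    for combo in product(LEVELS, repeat=n_cores):
        power = sum(p for _, p in combo)
        speed = sum(s for s, _ in combo)
        if power <= budget and speed > best_speed:
            best_speed, best_combo = speed, combo
    return best_speed, best_combo

speed, combo = best_allocation(4, 200)
print(speed, sorted(s for s, _ in combo))  # → 2.5 [0.5, 0.5, 0.75, 0.75]
```

With this (made-up) power curve and a budget of 200 units, the winner is two cores at three-quarter speed plus two at half speed (2.5 units of throughput), beating both "all four at half" (2.0) and any combination that includes a full-speed core, which is the intuition behind letting a scheduler proportion frequency across threads.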
If you have an app coded such that it needs one or two really screaming-fast threads, the scheduler could crank those up at the expense of the others.
This would be an on-the-fly learning process for the OS; ideally it would involve AI so it could "remember" previous runs, or better yet, have access to an online database of similar data.