Question Raptor Lake - Official Thread


Hulk

Diamond Member
Oct 9, 1999
4,472
2,435
136
Since we already have the first Raptor Lake leak, I'm thinking it should have its own thread.
What do we know so far?
From AnandTech's Intel Process Roadmap articles from July:

Built on Intel 7 with upgraded FinFET
10-15% PPW (performance-per-watt) improvement
Last non-tiled consumer CPU as Meteor Lake will be tiled

I'm guessing this will be a minor update to ADL with just a few microarchitecture changes to the cores. The larger change will be the new process refinement allowing 8+16 at the top of the stack.

Will it work with current Z690 motherboards? If yes, then that could be a major selling point for people to move to ADL rather than wait.
 

Doug S

Platinum Member
Feb 8, 2020
2,751
4,685
136
I have been trying to tell everyone this for quite a while, not just about Intel “hybrid”, but about these big/medium/little designs, period. Not a single company, including Apple, uses these types of designs for power savings. It all comes down to cost and nothing else. Of course everyone here continues to be in denial…


It sure takes a lot of 'denial' about how Apple operates to believe that their little cores have anything to do with cost savings. Apple has never been too worried about using up die area to improve power efficiency, otherwise they would not have devoted so much area to caches.

Now maybe Intel is tilting more towards multithreaded performance, but if so they are in denial about how the typical person uses their PC. A pretty small percentage will ever max out all their cores at 100%, so other than benchmark bragging such designs do little to improve things for the typical PC customer. They can make use of better battery life if they have small cores that are 2-3x more power efficient like Apple's though (at least until battery life exceeds a day, then most people simply don't care about further improvement).

If Apple were worried about conserving die area, they would have placed more than two of their little cores on the M1 Pro / M1 Max. Given they went from 2+4 on the iPhone to 8+2 on the M1P/M1M, they clearly know where the strengths of each lie. No one worried about "cost" would design a 400+ mm^2 die for a laptop with 512-bit-wide LPDDR5. But that sure does wonders for saving power compared to using DIMMs and GDDR6!
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
Let's go down the rabbit hole a bit to try and figure out what the hell Intel is up to with Gracemont. They told us it was for the best multithreading performance in the least area. With that being a starting point let's not consider ST performance, that is what Golden Cove is for.

After much testing I have found that in Cinebench R23, one Golden Cove core w/HT earns 506 CB R23 points for every GHz of frequency. One Gracemont core earns 242 CB R23 points per GHz.

Let me try to do this better. CB R23 is the absolute best case scenario for testing MT performance.

At iso-frequency, 3.9 GHz:

8 Core Gracemont cluster gets 10366

8 Core Golden Cove cluster gets 12029 without HT; adding 30% on top of that for HT/SMT gives roughly 15630.

Screenshot_20220206-123743_Chrome.jpg


8 Core Golden Cove die area is 84.23 mm2
8 Core Gracemont die area is 26.34 mm2

Gracemont cores get 393.5 points per mm2
Golden Cove cores get 185.5 points per mm2

Even at 5 GHz, the 8 Core Golden Cove cluster with HT enabled gets 238.7 points per mm2. Gracemont cores can't be beat in performance per area; even the mighty M1 falls short.
1644175724207.png
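For anyone who wants to re-run the perf/area arithmetic, here is a minimal sketch. It uses only scores and cluster areas quoted in this thread: the Gracemont score is taken as 10366 (the value the 393.5 figure works out from), and the 5 GHz Golden Cove score of 20105 is the one given further down the thread. The dict layout and the `points_per_mm2` name are my own, not from any tool.

```python
# Cinebench R23 MT scores and cluster areas (mm2) as quoted in this thread.
# The dict layout and function name are illustrative assumptions.
CLUSTERS = {
    "8x Gracemont @ 3.9 GHz": (10366, 26.34),
    "8x Golden Cove @ 3.9 GHz, HT off": (12029, 84.23),
    "8x Golden Cove @ 3.9 GHz, HT estimated (+30%)": (15630, 84.23),
    "8x Golden Cove @ 5.0 GHz, HT on": (20105, 84.23),
}


def points_per_mm2(score, area_mm2):
    """Benchmark points divided by silicon area in mm2."""
    return score / area_mm2


for name, (score, area) in CLUSTERS.items():
    print(f"{name}: {points_per_mm2(score, area):.1f} points/mm2")
```

This says nothing about whether the input scores are trustworthy; it only reproduces the division.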
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,508
3,011
136
Let me try to do this better, CBR23 is the absolute best case scenario for testing MT performance

At ISO speed 3.9 GHZ

8 Core Gracemont cluster gets 10366

8 Core Golden Cove cluster gets 12029 without HT, add 30% on top of that with HT/SMT = 15630.

View attachment 57048


8 Core Golden Cove die area is 84.23 mm2
8 Core Gracemont die area is 26.34 mm2

Gracemont cores get 393.5 points per mm2
Golden Cove cores get 185.5 points per mm2
Why do you limit Golden Cove to 3.9GHz when you are calculating performance/mm2?
This doesn't make much sense to me.

edit: Much better now that you included Golden Coves at 5GHz.

BTW I have to wonder how accurate 10366 points for 8 Gracemont cores in CB R23 really is. Enabling Gracemont cores increases performance by only 7675 points.
 

Hulk

Diamond Member
Oct 9, 1999
4,472
2,435
136
Just edited the post with the 5 GHz speed. It does not change the fact that Gracemont can't be beat in performance per area (mm2).

Yes, I agree, as I posted earlier with supporting numbers.

Per unit area, Gracemont is up to 53% faster in MT than Golden Cove. This is right in line with Intel's slide which reads ">50%."

At equal clocks, Gracemont is up to 16% faster in MT per unit area than Golden Cove.

If all software were perfectly MT-optimized, CPUs would be all small cores. But it is not, and so they are not.
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
Even the Apple Firestorm (performance core) falls short of the Intel Gracemont core in performance per area.


Apple Firestorm core with L2$ has a surface area of 3.76 mm2 and gets 1536 points in Cinebench R23, so it gets 408.5 points per mm2

Intel Gracemont core with L2$ has a surface area of 2.2 mm2 and gets 1292 points in Cinebench R23, so it gets 587 points per mm2
 

eek2121

Diamond Member
Aug 2, 2005
3,100
4,398
136
Really? You think a year's worth of changes is worth spending your hard-earned money on? Go ahead, it's your money, but if you want to argue it has to do with things like "self control" or that it's a "need", you've totally lost me, because it's neither of those at all, except for the few that explicitly need the performance. And when I say need I mean work, not gaming. And you can game with a LOT less hardware too.

Also, that guy saying Intel had to laser off L2 cache on the GRT cluster for Alder Lake is being ridiculous. So you are saying they have 4MB cache dies, but they are saving them for next year! Hahah!
I have a 5950X. I will upgrade to Zen 4 when it comes out. Why? My PC makes me money. The faster it is, the more productive I am.
It sure takes a lot of 'denial' about how Apple operates to believe that their little cores have anything to do with cost savings. Apple has never been too worried about using up die area to improve power efficiency, otherwise they would not have devoted so much area to caches.

Now maybe Intel is tilting more towards multithreaded performance, but if so they are in denial about how the typical person uses their PC. A pretty small percentage will ever max out all their cores at 100%, so other than benchmark bragging such designs do little to improve things for the typical PC customer. They can make use of better battery life if they have small cores that are 2-3x more power efficient like Apple's though (at least until battery life exceeds a day, then most people simply don't care about further improvement).

If Apple were worried about conserving die area, they would have placed more than two of their little cores on the M1 Pro / M1 Max. Given they went from 2+4 on the iPhone to 8+2 on the M1P/M1M, they clearly know where the strengths of each lie. No one worried about "cost" would design a 400+ mm^2 die for a laptop with 512-bit-wide LPDDR5. But that sure does wonders for saving power compared to using DIMMs and GDDR6!
You have it backwards. Apple only added the two small cores because the macOS scheduler has optimizations that utilize those cores to improve user experience.

Everything Apple did for the M1, they did to show the world “we can build a chip that is faster than our competitors!” You also completely gloss over that Apple sells the entire machine instead of just the SoC.
 

Hulk

Diamond Member
Oct 9, 1999
4,472
2,435
136
While my computers are first and foremost tools, I have to admit that I'm starting to also look at this as a hobby. I didn't have to buy a 12700K or an SN850 drive; I could have gotten my work done on less expensive hardware, but I'm an enthusiast and it's fun. I don't have a big need for 8+16 Raptor Lake but there is a chance I'll buy one. It would be fun to have.
 

Abwx

Lifer
Apr 2, 2011
11,543
4,327
136
Let me try to do this better, CBR23 is the absolute best case scenario for testing MT performance

At ISO speed 3.9 GHZ

8 Core Gracemont cluster gets 10366

8 Core Golden Cove cluster gets 12029 without HT, add 30% on top of that with HT/SMT = 15630.

Those numbers are off. At the same frequency in CB R23 MT, 1 P core = 1.88 E cores.


 

Hulk

Diamond Member
Oct 9, 1999
4,472
2,435
136
Those numbers are off. At the same frequency in CB R23 MT, 1 P core = 1.88 E cores.



I don't think 1.88 is correct. I calculate 2.09. The P cores are slippery and hard to shut down. I think the best way to isolate them is to run CB with P cores only, with all E's shut off in the BIOS. Then run with P's and E's, and finally subtract the P-only score from the P+E score to isolate the E's. Process Lasso does not give the same result, so I'm pretty sure the P's are "helping out" during the benchmark.

Either way, the value is somewhere between 1.88 and 2.09.
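A rough sketch of both calculations, for reference. The 2.09 figure falls out of the per-GHz numbers quoted earlier in the thread (506 points/GHz for one Golden Cove core w/HT, 242 for one Gracemont core). The function names are made up for illustration, and the scores fed to `isolate_e_score` are placeholders, not measurements.

```python
def p_to_e_ratio(p_points_per_ghz, e_points_per_ghz):
    """How many E cores one P core is worth at the same clock."""
    return p_points_per_ghz / e_points_per_ghz


def isolate_e_score(p_only_score, p_plus_e_score):
    """E-core contribution: P+E run minus the P-only run (E cores off in BIOS)."""
    return p_plus_e_score - p_only_score


# Per-GHz figures quoted earlier in the thread.
print(f"P:E ratio = {p_to_e_ratio(506, 242):.2f}")

# Placeholder scores purely to show the subtraction; NOT measured results.
example_p_only = 20000
example_p_plus_e = 27000
print(f"E-core share = {isolate_e_score(example_p_only, example_p_plus_e)}")
```

The subtraction only makes sense for workloads that scale across every core, which is why it is limited to something like CB R23.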
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
Those numbers are off, at same frequency in CB R23 MT 1 P core = 1.88

You do the math for me.

8C/8T Golden Cove cores at 3.9 GHz do 12029 in CB R23 MT with an area of 84.23 mm2

8C/8T Gracemont cores at 3.9 GHz do 10366 in CB R23 MT with an area of 26.34 mm2

Screenshot_20220206-123743_Chrome.jpg

Now, to help out Golden Cove you can add 30% for SMT and run it at 5 GHz, and it would still lose in performance per area (mm2).

Screenshot_20220206-190149_Chrome.jpg

8C/16T Golden Cove cores at 5 GHz get 20105 in Cinebench R23 MT with an area of 84.23 mm2

Performance/mm2 ranking:

1st place: 8C/8T Gracemont cores at 3.9 GHz, 10366/26.34
2nd place: 8C/16T Golden Cove cores at 5 GHz, 20105/84.23
3rd place: 8C/8T Golden Cove cores at 3.9 GHz, 12029/84.23
 

coercitiv

Diamond Member
Jan 24, 2014
6,630
14,061
136
Just edited the post with the 5 GHz speed. It does not change the fact that Gracemont can't be beat in performance per area (mm2).
You have several issues with this weird perf/area calculation.

First issue, the CB23 scores for the E-cores that you're using are highly suspect. People have repeatedly tried to tell you that by either pointing out the known ratio between P and E performance in CB23, or by explaining how failing to isolate E-cores can taint benchmark results. Here's another very simple observation, let's take a look at the CB23 scores from the review you used, and compare ST and MT data:
1644219520145.png
  • Look at the P core data first: an ST score of 1535 lines up almost perfectly with an MT score of 12029 when HT is disabled (1535 x 8 = 12280).
  • Now look at the E core data: based on the ST score of 1089, one would deduce the maximum MT score for 8 E cores should be around 8700. Somehow the E cores have managed to gain ~20% performance in the MT test.
TechPowerUp had several problems with their E core testing and also their Alder Lake configurable TDP testing. We briefly discussed these on the forum but mostly cut TPU some slack due to immature platforms that most likely made testing difficult.
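The sanity check above is easy to script. A minimal sketch, assuming the ST/MT numbers quoted in this thread (the E core MT score is the 10366 figure under dispute); `mt_scaling` is just an illustrative helper name:

```python
def mt_scaling(st_score, mt_score, cores):
    """Return (ideal MT score from linear ST scaling, measured/ideal ratio)."""
    ideal = st_score * cores
    return ideal, mt_score / ideal


# (label, ST score, MT score) as quoted in the thread, 8 cores each.
RESULTS = [
    ("P cores, HT off", 1535, 12029),
    ("E cores", 1089, 10366),
]

for label, st, mt in RESULTS:
    ideal, ratio = mt_scaling(st, mt, cores=8)
    flag = "suspect (beats linear scaling)" if ratio > 1.0 else "plausible"
    print(f"{label}: ideal {ideal}, measured {mt}, ratio {ratio:.2f} -> {flag}")
```

A ratio above 1.0 means the MT run scored more than eight perfectly scaled single-thread runs, which is the red flag being described.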

Second issue, you're calculating perf/area based on cluster sizes for each core type, yet the cores have access to the entire L3 cache structure in each case. This may not matter for Cinebench, but it creates this false illusion that we can use the same method for other benchmarks as well.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
I wish Chips And Cheese did a CB23 workload deep dive for ADL and Zen3 just like they did for CB15 for SKL vs Zen2.

Such analysis would answer a lot of questions about cores small and big :)
 

Hulk

Diamond Member
Oct 9, 1999
4,472
2,435
136
You have several issues with this weird perf/area calculation.

First issue, the CB23 scores for the E-cores that you're using are highly suspect. People have repeatedly tried to tell you that by either pointing out the known ratio between P and E performance in CB23, or by explaining how failing to isolate E-cores can taint benchmark results. Here's another very simple observation, let's take a look at the CB23 scores from the review you used, and compare ST and MT data:
  • Look at the P core data first, a ST scores of 1535 almost perfectly fits in a MT score of 12029 when HT is disabled. (1535x8= 12280)
  • Now look at the E core data, based on the ST score of 1089 one would deduce the maximum MT score for E cores should be under ~8700. Somehow the E cores have managed to gain ~20% performance in the MT test.
Techpowerup had several problems with their E core testing and also their Alder Lake configurable TDP testing, we briefly discussed these on the forum but mostly cut TP some slack due to immature platforms that most likely made testing difficult.

Second issue, you're calculating perf/area based on cluster sizes for each core type, yet the cores have access to the entire L3 cache structure in each case. This may not matter for Cinebench, but it creates this false illusion that we can use the same method for other benchmarks as well.

I agree 100%. As I previously described, I have more faith in E core performance calculated by subtracting the P-only score from the E+P score in CB R23. While that may bring in other unknowns, at least the P-only score IS known, since the E's can be disabled in the BIOS. Any additional CB score can be assigned to the E's. Of course, this method is only valid for fully MT-capable software like CB R23.
 
Jul 27, 2020
19,849
13,608
146
If Intel makes E-core only boot possible with Raptor Lake, that will solve such benchmarking issues for good. Maybe they can do it this time because it won't be a rushed launch like Alder Lake's. I mean, they didn't even have time to check that AVX-512 was disabled until after the launch. If AVX-512 was never intended to be available, it would have been disabled from the start.
 
Jul 27, 2020
19,849
13,608
146
Also, what are the chances that Raptor Lake die will be devoid of AVX-512 units? That will allow them to use that real estate for more cache and other useful stuff.
 

dullard

Elite Member
May 21, 2001
25,485
3,979
126
Intel said that a single Gracemont core was exactly 1/4 the size of a single Golden Cove core, and that made people think that a Gracemont cluster (4 cores) would use the same die area as a single P core. As you can see in the annotations, they don't. The 16 E cores use more area than four P cores (52.68 vs 42.11 mm2).
They did imply it through their many diagrams, so many people believe it to be 1:4, but it is still quite a feat to be honest.
Let's look at their drawings:

1644246141501.png

1644246610031.png

While Intel's drawings are not to scale (they do not show the 4 E-core cluster at 25% larger than the P core), they do show that the 4 E-core cluster is larger (13.2% in the top image above and 18.5% in the bottom image).
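The 25% figure can be cross-checked against the cluster areas quoted earlier in the thread (84.23 mm2 for 8 Golden Cove cores, 26.34 mm2 for 8 Gracemont cores). A small sketch, assuming those annotations are accurate; the variable names are mine:

```python
# Cluster areas in mm2 as quoted earlier in the thread (die-shot annotations).
GOLDEN_COVE_8C_MM2 = 84.23
GRACEMONT_8C_MM2 = 26.34

p_core_mm2 = GOLDEN_COVE_8C_MM2 / 8     # one P core
e_cluster_mm2 = GRACEMONT_8C_MM2 / 2    # one 4-core E cluster

ratio = e_cluster_mm2 / p_core_mm2
print(f"P core: {p_core_mm2:.2f} mm2, 4x E cluster: {e_cluster_mm2:.2f} mm2")
print(f"E cluster is {(ratio - 1) * 100:.1f}% larger than one P core")
```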

I'm not sure why you would take drawings as being 100% accurate to die sizes anyways. It isn't like the memory, display, GNA 3.0, or PCIe drawings are the same size (or even in the same place as the real die).
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,786
136
They can make use of better battery life if they have small cores that are 2-3x more power efficient like Apple's though (at least until battery life exceeds a day, then most people simply don't care about further improvement).

E cores have nothing to do with battery life. Zero. They are there to enable better perf/watt. In terms of battery life though, having better management, and in the case of Intel having the PCH on-die, is the way to better performance. The latter won't happen until Meteor Lake.

And the reason PCH on-die is needed for Intel is that since Ice Lake they can idle pretty damn low, but the transition time is dog slow, so for bursty applications like web browsing the low idle power is completely nullified. I was initially confused to see 25 hour idle battery life but 7-8 hours web browsing. You may not be aware of it, but the CPU is going from sleep states to active hundreds of times a second.

You could make the case the E cores are there for increasing battery life under load, but for laptops it doesn't matter since the base number is piss poor and it all has to do with TDP.
 
Jul 27, 2020
19,849
13,608
146
I was initially confused to see 25 hour idle battery life but 7-8 hours web browsing. You may not be aware of it but the CPU is going from sleep states to active hundreds of times a second.
Web browsers have tons of stuff going on in the background, like worker processes. I think Edge Chromium claims increased battery life by putting most of them to sleep or letting them wake up the CPU less often.
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
Also, what are the chances that Raptor Lake die will be devoid of AVX-512 units? That will allow them to use that real estate for more cache and other useful stuff.
Near zero. This design has been set in stone for a few years. Golden Cove/Raptor Cove were designed to have AVX-512 support for Xeons (the cores that make the most money for the company). When they segment their cores, they either disable it in the BIOS or fuse it off by laser.
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
Okay, so perhaps the TPU MT core numbers were off (most likely they did not know they had an active P core).

So let's use ST to test performance per area at stock speed. I would also like to compare it with the Apple M1 Firestorm performance core and AMD Zen 3 for reference.

1644333357672.png


Intel Golden Cove core with L2$ as measured by Locuza is 7.04 mm2 and it gets 1,937 points in GB5
Apple Firestorm core with L2$ measured by Semianalysis is 3.83 mm2 and gets 1,745 points in GB5
AMD Zen 3 core with L2$ as measured by Locuza is 4.27 mm2 and it gets 1,506 points in GB5
Intel Gracemont core with L2$ as measured by Locuza is 2.19 mm2 and gets 1,168 in GB5


Performance/mm2

1st place is Intel Gracemont core with 532 Geekbench5 points per mm2
2nd place is Apple Firestorm core with 455.6 Geekbench5 points per mm2
3rd place is AMD Zen3 core with 352.7 Geekbench5 points per mm2
4th place is Intel Golden Cove with 275.1 Geekbench5 points per mm2

I left out L3$ because Apple Firestorm lacks L3$, and the L3$ along with the ring bus are huge in Intel's recent CPU uArchs, skewing the numbers.
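Here is a short sketch that recomputes the ranking from the GB5 scores and core+L2$ areas listed above. The data structure and names are illustrative only. Note that straight division gives about 533.3 for Gracemont, a touch above the 532 quoted, which does not change the order.

```python
# Geekbench 5 ST scores and core+L2$ areas (mm2) as listed in the post.
# The dict layout is an illustrative assumption.
CORES = {
    "Intel Golden Cove": (1937, 7.04),
    "Apple Firestorm": (1745, 3.83),
    "AMD Zen 3": (1506, 4.27),
    "Intel Gracemont": (1168, 2.19),
}

# Sort by points per mm2, highest first.
ranking = sorted(
    ((score / area, name) for name, (score, area) in CORES.items()),
    reverse=True,
)

for place, (ppa, name) in enumerate(ranking, start=1):
    print(f"{place}. {name}: {ppa:.1f} GB5 points/mm2")
```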
 

Zucker2k

Golden Member
Feb 15, 2006
1,810
1,159
136
Okay, so perhaps the TPU MT core numbers were off(most likely they did not know they had an active P core )

So lets use ST to test Performance Per Area at stock speed, also would like to compare it with Apple M1 Firestorm Performance Core and AMD Zen 3 for reference

View attachment 57150


Intel Golden Cove core with L2$ as measured by Locuza is 7.04 mm2 and it gets 1,937 points in GB5
Apple Firestorm core with L2$ measured by Semianalysis is 3.83 mm2 and gets 1,745 points in GB5
AMD Zen 3 core with L2$ as measured by Locuza is 4.27 mm2 and it gets 1,506 points in GB5
Intel Gracemont core with L2$ as measured by Locuza is 2.19 mm2 and gets 1,168 in GB5


Performance/mm2

1st place is Intel Gracemont core with 532 Geekbench5 points per mm2
2nd place is Apple Firestorm core with 455.6 Geekbench5 points per mm2
3rd place is AMD Zen3 core with 352.7 Geekbench5 points per mm2
4th place is Intel Golden Cove with 275.1 Geekbench5 points per mm2

I left out L3$ because Apple Firestorm lack L3$ and the L3$ along with the Ring Bus are Huge in Intel recent CPU uArchs skewing the numbers.
And (532 + 275) /2 = 403.5
 

LightningZ71

Golden Member
Mar 10, 2017
1,792
2,151
136
Okay, so perhaps the TPU MT core numbers were off(most likely they did not know they had an active P core )

So lets use ST to test Performance Per Area at stock speed, also would like to compare it with Apple M1 Firestorm Performance Core and AMD Zen 3 for reference

View attachment 57150


Intel Golden Cove core with L2$ as measured by Locuza is 7.04 mm2 and it gets 1,937 points in GB5
Apple Firestorm core with L2$ measured by Semianalysis is 3.83 mm2 and gets 1,745 points in GB5
AMD Zen 3 core with L2$ as measured by Locuza is 4.27 mm2 and it gets 1,506 points in GB5
Intel Gracemont core with L2$ as measured by Locuza is 2.19 mm2 and gets 1,168 in GB5


Performance/mm2

1st place is Intel Gracemont core with 532 Geekbench5 points per mm2
2nd place is Apple Firestorm core with 455.6 Geekbench5 points per mm2
3rd place is AMD Zen3 core with 352.7 Geekbench5 points per mm2
4th place is Intel Golden Cove with 275.1 Geekbench5 points per mm2

I left out L3$ because Apple Firestorm lack L3$ and the L3$ along with the Ring Bus are Huge in Intel recent CPU uArchs skewing the numbers.
That certainly paints an "optimum case scenario" for Gracemont, largely because it completely eliminates the design trade-offs of using a "quad" of E-cores with a shared, and theoretically bandwidth-limited, L2 cache segment and a single shared ring stop. As we have seen in other benches, attempting to fully load the E cores exposes those trade-offs and reduces the throughput of those cores, reducing their effective performance per area.

In other words, using ST numbers for the E cores distorts the real picture in heavy MT scenarios.
 

repoman27

Senior member
Dec 17, 2018
381
536
136
E cores have nothing to do with battery life. Zero. They are there to enable better perf/watt. In terms of battery life though, having better management, and in the case of Intel having the PCH on-die is the way to better performance. For the latter it won't happen until Meteorlake.

And the reason PCH on-die is needed for Intel is because since Icelake, they can idle pretty damn low, but the transition time is dog slow, so even bursty applications like web browsing the low idle power is made completely null. I was initially confused to see 25 hour idle battery life but 7-8 hours web browsing. You may not be aware of it but the CPU is going from sleep states to active hundreds of times a second.

You could make the case the E cores are there for increasing battery life under load, but for laptops it doesn't matter since the base number is piss poor and all has to do with TDP.
The PCH has been on-package for mobile since Haswell U/Y. It's an LP PCH built on a process specifically tailored for I/O and connected to the CPU via OPI. How do you see integrating the PCH into the CPU die helping performance or battery life in any significant way? Intel is disaggregating the CPU die into CPU/GPU/SoC tiles for Meteor Lake, not integrating the PCH. Intel is doing the exact opposite of what you're suggesting.