Discussion: P vs E cores in Raptor Lake CPUs - determining the optimal number of cores

Page 2 - AnandTech community forums

Kocicak

Senior member
Jan 17, 2019
982
973
136
Following the discussion of how many P and E cores should be in the CPU, I measured the performance of these cores in an Intel 13700K CPU at 3400 MHz, using Cinebench R23 maximum load as the workload; the results are below. I also estimated from a die shot that an E core has 1/4 of the area of a P core.

13700K PE core calcul.png

You can see that an E core has roughly half the performance of a P core at a quarter of the area. The conclusion from this is that to increase performance, you keep adding E cores.

Unfortunately the E core is less efficient than the P core, so adding E cores increases power draw and decreases efficiency.

You can see in the table above that if we substituted all the E cores in an 8+16 CPU with 4 P cores, we would lose 30% performance.

You can see the performance, power draw and efficiency of all P and E core combinations below; the available area is that of 16 P cores:

core combinations.png

So you have a decision problem now. For
  • Available area and physically possible core combinations
  • Particular workload
  • Combination of P and E core frequencies
you need to find the optimal combination of cores. Possible decision-making scenarios:
  1. Reach maximal performance within possible power draw and not care about efficiency at all.
  2. Reach maximal performance within possible power draw and still maintain some minimal efficiency requirement.
  3. Reach maximal efficiency while still maintaining required performance.
In this example, we get three core count combinations; some of them may not be physically possible on the silicon:

Core count optimum2.png
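The search itself is simple to sketch. Below is a minimal, hypothetical Python enumeration of the decision problem above. Only the area and performance ratios are taken from the measurements (E core = 1/4 the area and ~1/2 the performance of a P core); the power ratio, power cap, and performance floor are illustrative placeholders, not measured values.

```python
# Enumerate P/E core combinations fitting an area budget of 16 P-core
# equivalents, then pick the best combo under two of the scenarios above.
P_AREA, E_AREA = 1.0, 0.25        # die area (P core = 1 unit, E core = 1/4)
P_PERF, E_PERF = 1.0, 0.5         # relative per-core performance (measured ratio)
P_POWER, E_POWER = 1.0, 0.35      # relative per-core power (assumed placeholder)
AREA_BUDGET = 16.0                # the "area of 16 P cores" from the table

combos = []
for p in range(17):
    for e in range(65):
        if p == 0 and e == 0:
            continue  # skip the empty chip
        area = p * P_AREA + e * E_AREA
        if area > AREA_BUDGET:
            continue
        perf = p * P_PERF + e * E_PERF
        power = p * P_POWER + e * E_POWER
        combos.append((p, e, perf, power, perf / power))

# Scenario 1: max performance under a power cap, efficiency ignored.
POWER_CAP = 20.0
best_perf = max((c for c in combos if c[3] <= POWER_CAP), key=lambda c: c[2])

# Scenario 3: max efficiency while meeting a performance floor.
PERF_FLOOR = 12.0
best_eff = max((c for c in combos if c[2] >= PERF_FLOOR), key=lambda c: c[4])

print("max perf combo (P, E):", best_perf[:2])
print("max efficiency combo (P, E):", best_eff[:2])
```

Each scenario is just a different objective and constraint over the same feasible set, which is why the optimum shifts with the workload and the chosen frequencies.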


And all this is valid for ONE WORKLOAD and ONE COMBINATION of P and E core frequencies.

I do not think that it is possible to easily say that they should have swapped this many P for E cores or vice versa. There are too many variables at play.

But it should be obvious that E cores are very useful.
 
Last edited:

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
That's what the counter-argument is all about: consistency in workloads, and arguably even consistency in Intel's designs. Hybrids push theoretical performance way up, but when it comes to running today's software... it gets complicated.

This can also be deduced from Intel's approach to power/cost scaling. For mobile Alder Lake they start with 2+8 for low power, shift gears to 4+8 for the classic laptop chip, and then move to 6+8 for high-performance desktop replacements. The P cores scale the performance across segments. On desktop Raptor Lake they prioritize P cores even more with the value chips; only the flagship model favors E cores again.

So what can we conclude from Intel's own actions: favoring E cores is worthwhile at very low power & area and at very high power & area. P cores are priority cores in the consumer space; you always try to fit enough of them to make sure consumer workloads run as fast as possible for the form factor. Today that magical number seems to be 8, according to Intel. Tomorrow... who knows?

Today we have a value champ from Intel in the shape of the 13600K with its 6+8 config. I'd be very curious to see how people would react if they were offered a 13600KP with an 8+0 config for the same price (roughly the same area). Would average consumers still choose CPU rendering performance over consistent and more future-proof gaming performance?
So I think the problem with this argument is that the top chips aren't for typical consumer workloads. If gaming was the only focus, it wouldn't make much sense for Ryzen to support even 2 CPU dies, nor for all the E-cores Intel's been adding. Clearly both companies intend for their chips to see much more varied workloads, up to and including content creation and productivity. They happen to be the best for gaming and such as well, but that's not really sufficient to justify their existence.

But even for consumer workloads, hybrid is mainstream, not the exception. Mobile's figured this all out ages ago, and you don't see people begging for big-core-only chips there. You can even make similar power vs area efficiency arguments when you compare e.g. the Cortex-A7xx series to the A5xx series.

And tbh, I think a large amount of the E-core criticism isn't about how they actually perform, but rather that it's something Intel is currently doing differently than AMD. When AMD inevitably goes in a similar direction, I expect most of those complaints to mysteriously stop. And even if they don't necessarily go that way with Zen 5, they've made it very clear through interviews and documentation that they will in the not-so-distant future.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,555
14,511
136
So I think the problem with this argument is that the top chips aren't for typical consumer workloads. If gaming was the only focus, it wouldn't make much sense for Ryzen to support even 2 CPU dies, nor for all the E-cores Intel's been adding. Clearly both companies intend for their chips to see much more varied workloads, up to and including content creation and productivity. They happen to be the best for gaming and such as well, but that's not really sufficient to justify their existence.

But even for consumer workloads, hybrid is mainstream, not the exception. Mobile's figured this all out ages ago, and you don't see people begging for big-core-only chips there. You can even make similar power vs area efficiency arguments when you compare e.g. the Cortex-A7xx series to the A5xx series.

And tbh, I think a large amount of the E-core criticism isn't about how they actually perform, but rather that it's something Intel is currently doing differently than AMD. When AMD inevitably goes in a similar direction, I expect most of those complaints to mysteriously stop. And even if they don't necessarily go that way with Zen 5, they've made it very clear through interviews and documentation that they will in the not-so-distant future.
For what I do (and many others, but yes, a minority) they all have to be equally strong cores. I now have a dual 7763 Milan, and tomorrow will have a dual 7V12 Rome server. Just those 2 boxes make 512 strong threads (well, 256 cores with SMT). I hope they keep going that route in server, as I may just have to convert to all server boxes. But one 7950X does the same work as 32 Rome EPYC cores. big.LITTLE just complicates that situation. Here are 1055 tasks running:

1666632797939.png
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
For what I do (and many others, but yes, a minority) they all have to be equally strong cores.
Do they have to be strong, or just equal? Given typical server clock speeds and full utilization with SMT, you're almost certainly looking at worse perf/thread than Gracemont in Raptor Lake. ISA considerations aside, the best chip for your particular use case would look like Bergamo or Sierra Forest.

I think this is actually something of a problem with SMT in a hybrid world. It makes sense if you need the highest perf/thread available most of the time and want to recoup some MT perf when highly loaded, but if the system spends all of its time with SMT loaded, then you end up with similar or even worse perf/thread vs "small" cores and worse power/area efficiency.

You can see this in how Thread Director prioritizes SMT threads dead last, and in the real world, latency-critical applications (often user-facing web apps and microservices) actually disable SMT to maximize per thread performance.
 

coercitiv

Diamond Member
Jan 24, 2014
6,201
11,901
136
So I think the problem with this argument is that the top chips aren't for typical consumer workloads. [...] Clearly both companies intend for their chips to see much more varied workloads, up to and including content creation and productivity.
Yes! They're pushing these chips for prosumers, but as you've already seen from other threads we have plenty of prosumers who would rather have the real deal instead - a competent Intel HEDT or Threadripper system. Many want even MORE cores, more I/O.

I touched on this before: hybrids with herds of E cores would make PERFECT sense for workstations; alas, they're forbidden to enter that territory until Intel gets their tile strategy in place.

But even for consumer workloads, hybrid is mainstream, not the exception. Mobile's figured this all out ages ago, and you don't see people begging for big-core only chips there.
I never argued against hybrids in Intel's mobile lineup. In fact, even as a skeptic from day one, I argued that I'm ready to accept their weird desktop strategy for the sole purpose of allowing the Mont family a place to grow. The only time I criticized an Intel mobile hybrid was with Lakefield, because a single P core isn't enough in a premium chip. That being said, the 2+8 and 4+8 chip configs are great for the average consumer, or at the very least a very interesting alternative.

And tbh, I think a large amount of the E-core criticism isn't about how they actually perform, but rather that it's something Intel is currently doing differently than AMD.
There's a lot of room for discussion between "hybrid bad" and "hybrid perfect", and I think a lot of the ideas circulating the forum about the future of E cores have little to do with Intel's philosophy. Personally I think the recipes for mobile, consumer desktop and HEDT should be different. That's the hybrid approach I'm looking forward to.
 

Kocicak

Senior member
Jan 17, 2019
982
973
136
Just a sidenote: I checked whether the two measurements at different frequencies are consistent, and they are, so I extrapolated the performance of the two cores at other frequencies. Can you find the performance of some different cores in CNB R23 at some of these frequencies somewhere?

IPC raptor CNB R23.png

I realised that these numbers, which are the 100% multithread load result divided by the number of cores, may not be easy to find elsewhere, because you will most often find 1-thread numbers. But at least they express the true performance of the cores better...
 
Last edited:
  • Like
Reactions: moinmoin

moinmoin

Diamond Member
Jun 1, 2017
4,952
7,661
136
And tbh, I think a large amount of the E-core criticism isn't about how they actually perform, but rather that it's something Intel is currently doing differently than AMD.
I'd say the big criticism is not that Intel is doing something different from AMD, but from the whole industry. Hybrid up to that point had been about efficiency as in power efficiency, not as in E-core-spam area efficiency.

Going the latter way makes it about performance again, since Intel's approach to hybrid works best with embarrassingly parallel workloads. But as we know, due to Amdahl's law such workloads on CPUs are both niche and most often still better served by GPUs. That leaves workloads that don't scale perfectly but hit a thread-count wall somewhere. With all those workloads you'll always prefer cores capable of peak ST performance simultaneously on all threads instead of more cores effectively offering just half the ST performance. As soon as E cores are involved in covering some threads of the workload, the overall result will be slower than with full cores, and since the number of threads is limited, unlike with embarrassingly parallel workloads, that deficiency can't be made up with even more E cores.
 
  • Like
Reactions: Tlh97

Khato

Golden Member
Jul 15, 2001
1,206
250
136
As others have indicated, optimal number of P vs E cores depends entirely on the workload. Intel obviously believes that there's not much need for more than 8 P cores currently. Not too surprising given that the consoles are based on 8 cores. What other consumer workloads don't fall into the 'extremely parallel' category and actually benefit from being run on higher performance cores?

Once an adequate number of P cores is present for real-time user workloads, there's no reason to spend the remaining area on anything other than E cores. For a given die area, E cores will either be lower power at iso-performance or higher performance at iso-power compared to P cores. The original post's comparison running all cores at 3.4 GHz had the E cores at 2x performance per area for slightly worse performance per watt, while the later 5.5 GHz/4.2 GHz comparison is around 1.6x performance per area and 1.35x performance per watt.
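As a rough sanity check on those ratios, here is a sketch that assumes perfectly linear frequency scaling from the 3.4 GHz measurement (E core = 1/2 the performance of a P core at equal clocks, 1/4 the area). Real scaling is sublinear at high clocks, which would push the second ratio up from this estimate toward the quoted 1.6x.

```python
# Perf-per-area ratio of E cores vs P cores, assuming linear frequency
# scaling from the iso-frequency (3.4 GHz) measurement. Illustrative only.
E_REL_PERF = 0.5    # E core perf vs P core at equal clocks (measured above)
E_REL_AREA = 0.25   # E core area vs P core (die-shot estimate)

# Iso-frequency: both core types at 3.4 GHz.
iso_ratio = (E_REL_PERF / E_REL_AREA) / (1.0 / 1.0)
print(f"iso-frequency perf/area ratio: {iso_ratio:.2f}x")    # 2.00x

# Stock-like clocks: P at 5.5 GHz, E at 4.2 GHz, perf scaled linearly.
p_perf = 1.0 * (5.5 / 3.4)
e_perf = E_REL_PERF * (4.2 / 3.4)
clocked_ratio = (e_perf / E_REL_AREA) / p_perf
print(f"5.5/4.2 GHz perf/area ratio: {clocked_ratio:.2f}x")  # ~1.53x
```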

From what I can tell Intel E cores aren't currently being made use of to improve idle power efficiency, but I'd be surprised if that doesn't change with future implementations.
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
Yes! They're pushing these chips for prosumers, but as you've already seen from other threads we have plenty of prosumers who would rather have the real deal instead - a competent Intel HEDT or Threadripper system. Many want even MORE cores, more I/O.
Sure, but I don't see those as mutually exclusive. It's perfectly reasonable to want a proper HEDT chip while simultaneously acknowledging the advantages E-cores are bringing to the higher end of mainstream desktop.
I never argued against hybrids in Intel's mobile lineup. In fact, even as a skeptic from day one, I argued that <snip>
For clarity's sake, I wasn't specifically referring to you here. More a comment about the nature of the criticism I see here. A lot more about how people feel about them than a real discussion of performance.
There's a lot of room for discussion between "hybrid bad" and "hybrid perfect", and I think a lot of the ideas circulating the forum about the future of E cores have little to do with Intel's philosophy. Personally I thin the recipes for mobile, consumer desktop and HEDT should be different. That's the hybrid approach I'm looking forwards to.
Yes, ultimately it would be best if there were complete flexibility to choose the amount and type of each core. If Intel were to simply go to two compute dies, with P-cores and E-cores respectively, that might go a long way toward accommodating different market segments. But clearly we're at least several generations away from that.
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
I'd say the big criticism is not that Intel is doing something different from AMD, but from the whole industry. Hybrid up to that point had been about efficiency as in power efficiency, not as in E-core-spam area efficiency.

Going the latter way makes it about performance again, since Intel's approach to hybrid works best with embarrassingly parallel workloads. But as we know, due to Amdahl's law such workloads on CPUs are both niche and most often still better served by GPUs. That leaves workloads that don't scale perfectly but hit a thread-count wall somewhere. With all those workloads you'll always prefer cores capable of peak ST performance simultaneously on all threads instead of more cores effectively offering just half the ST performance. As soon as E cores are involved in covering some threads of the workload, the overall result will be slower than with full cores, and since the number of threads is limited, unlike with embarrassingly parallel workloads, that deficiency can't be made up with even more E cores.
This same argument (Amdahl's law, GPUs, blah blah blah) was similarly used to dismiss AMD offering 16 cores in a mainstream socket. Yes, not all software can or will benefit, but those users can happily stick with lower end chips, while the people who do benefit from more cores/threads have a product that caters to that need. And the benchmarks show this. There don't appear to be many/any real-world applications that exhibit that particular kind of scaling. If anything, many seem to have a few high priority threads (e.g. user-facing) and a number of low priority background tasks (rendering, AI, etc.) that meshes well with a hybrid architecture.

Just to tie in to my musings above, couldn't you also make this same argument against SMT? You invest a lot of architectural effort into a feature that slightly harms 1T performance simply to improve aggregate throughput for >8/16 threads. Is that not fundamentally similar to the tradeoffs hybrid makes?

And all that aside, clearly we have a lot of people criticising E-cores while simultaneously asking for HEDT chips and such, so I really don't think it's a matter of simply finding more cores useless. Certainly I think we're going to see even higher thread counts in the future.
 

moinmoin

Diamond Member
Jun 1, 2017
4,952
7,661
136
This same argument (Amdahl's law, GPUs, blah blah blah) was similarly used to dismiss AMD offering 16 cores in a mainstream socket.
Indeed. More cores are harder to make good use of than fewer cores. More cores, part of which have lower peak ST performance, are even harder to make good use of.

The big difference between the two is that with the former you can throw any software at it and ideally see no significant performance variation, regardless of the scaling capability of the software used and regardless of the amount of software running concurrently. That's no longer true with the latter, where the exact core types used affect the performance (with an application being in the foreground or background used as a pseudo-solution to this issue).

Just to tie in to my musings above, couldn't you also make this same argument against SMT?
Indeed. Though efficient, predictable use of SMT is commonly considered a solved issue. Nowadays it's not the benchmarks showing good use of SMT that are news, it's benchmarks of software that is actually hurt by SMT compared to it being turned off. My impression is that the latter lately mostly reappeared with games using forms of DRM.
 
  • Like
Reactions: Tlh97

TheELF

Diamond Member
Dec 22, 2012
3,973
730
126
Indeed. More cores are harder to make good use of than fewer cores. More cores, part of which have lower peak ST performance, are even harder to make good use of.

The big difference between the two is that with the former you can throw any software at it and ideally see no significant performance variation, regardless of the scaling capability of the software used and regardless of the amount of software running concurrently. That's no longer true with the latter, where the exact core types used affect the performance (with an application being in the foreground or background used as a pseudo-solution to this issue).


Indeed. Though efficient, predictable use of SMT is commonly considered a solved issue. Nowadays it's not the benchmarks showing good use of SMT that are news, it's benchmarks of software that is actually hurt by SMT compared to it being turned off. My impression is that the latter lately mostly reappeared with games using forms of DRM.
Yeah but no, there is no perfect system. Ryzen has two CCXs, and performance changes depending on whether SMT is on or off, whether the second CCX is on or off, and on how many cores are doing work. It was much worse with the 5xxx series, and the 7xxx had to go to 240W to keep the clocks at least close to full when all cores are working.
Depending on how many cores are loaded you get different performance per core, and if you run more than one workload they will fight for compute, so the performance of each will change compared to running alone, even if you use affinity.

With Intel you also have two groups of cores with different general performance, but all cores of each group can run at full clocks at any time, so you can figure out how fast a certain workload will run on them at all times.
You can also always separate the workloads so they never interfere and never change performance.

Again, I'm not saying one is better than the other; both are different and might be better suited for different (mixes of) workloads.
I am not sure, but this guy got a 25% gaming boost on a single CCD

View attachment 69802
 

Kocicak

Senior member
Jan 17, 2019
982
973
136
SMT in a big/little CPU with a high core count makes little sense, because it was originally developed as a way to boost the performance of CPUs when they had just a few cores.

I am not at my home computer now and cannot measure the percentage of performance boost SMT brings on the P cores, but say it is 20%.

In the second scenario, 16T at 8 P cores brought performance of 2872 per core. That is 1436 per thread.

8T at 8 P cores would bring 2393 per thread and core.

1T at E core was 1144 per core.

On 1 unit of area (the area of one E core; a P core takes 4 units), you can get:
  • 718 from a P core with SMT (2872/4), and that only in the 100% load scenario, when the second threads are finally utilised,
  • 598 from a P core without SMT (2393/4), or a P core with SMT at lower than, say, 90% load,
  • 1144 from an E core.
Most of the time, when the CPU is not 100% utilised, you are getting approximately half the performance per area from P cores compared to E cores. And that is with P cores running at very high frequency with low energy efficiency.
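The per-area arithmetic above can be checked directly from the quoted R23 per-core numbers (P core = 4 area units, E core = 1):

```python
# Cinebench R23 points per core, taken from the measurements quoted above.
P_AREA, E_AREA = 4, 1          # relative die area (die-shot estimate)

p_smt_per_core = 2872          # 8 P cores, 16 threads (SMT on)
p_1t_per_core = 2393           # 8 P cores, 8 threads (1 thread per core)
e_per_core = 1144              # 1 thread on an E core

p_smt_per_area = p_smt_per_core / P_AREA   # 718.0
p_1t_per_area = p_1t_per_core / P_AREA     # 598.25
e_per_area = e_per_core / E_AREA           # 1144.0

# Implied SMT uplift on the P cores: ~20%.
smt_gain = p_smt_per_core / p_1t_per_core - 1

print(p_smt_per_area, p_1t_per_area, e_per_area, round(smt_gain, 3))
```

Even against a fully SMT-loaded P core, the E core delivers about 1.6x the throughput per area at these clocks (1144 / 718).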

I wonder how much area could be spared, if P cores did not have SMT stuff on them and were just 1T per core.

I also wonder if disabling SMT, when you have a bunch of E cores for the "other" threads, would not make the CPU run better, because it would have fewer decisions to make.

Too bad I already returned the 13900K; I could have compared the performance with SMT on and off. It has so many E cores that, other than a few percent better performance at 100% load, SMT is just useless on that chip IMO.
 
Last edited:

moinmoin

Diamond Member
Jun 1, 2017
4,952
7,661
136
Yeah but no, there is no perfect system. Ryzen has two CCXs, and performance changes depending on whether SMT is on or off, whether the second CCX is on or off, and on how many cores are doing work, (...)
There are of course many variables affecting performance. There are also golden cores which perform the best, implying that there are significantly worse-performing cores. There are different latencies depending on how far cores are from each other and/or the LLC and/or the IMC, etc. This can be extended further and further.

Nevertheless, the peak ST performance of P cores (without SMT sharing) is so significantly above that of E cores that this hybrid design requires scheduling decisions which make predicting the performance of multiple concurrently running applications much harder. The performance difference is so much bigger than any other parameter so far, and that adds complexity if one wants to make the most of the available theoretical performance. The decision to handle that issue by moving foreground processes to P cores and background processes to E cores first is one solution to add predictability back, but it obviously won't help in situations where the user wants a workload to finish as fast as possible without it staying in the foreground. And so on.
 
  • Like
Reactions: Tlh97

TheELF

Diamond Member
Dec 22, 2012
3,973
730
126
There are of course many variables affecting performance. There are also golden cores which perform the best, implying that there are significantly worse-performing cores.
Those are the cores that reach the max clocks with the least vcore, using the least power; the max clock they reach is locked unless you overclock.
All cores can reach the same clocks; they just need more vcore/power.
Nevertheless, the peak ST performance of P cores (without SMT sharing) is so significantly above that of E cores that this hybrid design requires scheduling decisions which make predicting the performance of multiple concurrently running applications much harder. The performance difference is so much bigger than any other parameter so far, and that adds complexity if one wants to make the most of the available theoretical performance. The decision to handle that issue by moving foreground processes to P cores and background processes to E cores first is one solution to add predictability back, but it obviously won't help in situations where the user wants a workload to finish as fast as possible without it staying in the foreground. And so on.
On a multithreading platform you never know how much free compute a core has. Unless you have a system that only runs a single thread per core (i.e. you are only running DC-type workloads), you never have any idea how much compute any core has left over, so you never know how fast a thread will run on it.
You have to rely on the task scheduler making the best decisions.
 

DrMrLordX

Lifer
Apr 27, 2000
21,631
10,843
136
I wonder how much area could be spared, if P cores did not have SMT stuff on them and were just 1T per core.

Not much. SMT piggy-backs onto what is already a very wide core. Granted I don't know exactly how many transistors are required on Raptor Cove specifically to implement SMT, but in past designs, I've been told that the die area required for SMT has been relatively small.
 
Last edited:
  • Like
Reactions: Tlh97

gt3911

Junior Member
Jul 7, 2008
19
0
66
Do the e-cores help with power usage to a significant degree when the machine is experiencing light loads, from idle to web use?

The latest AnandTech 13900K & 13600K reviews talked about peak power; what about everything in between? With energy prices as they are, these peak power figures don't appeal to me. But on a general do-everything PC... is there any case that these chips are super efficient at low loads, to maybe balance out the peaks, or is the electric bill certain to take the hit? Paying around 42p (~$0.49) per kWh, it starts to add up.
 

Cstops

Junior Member
Oct 14, 2022
5
17
41
Do the e-cores help with power usage to a significant degree when the machine is experiencing light loads, from idle to web use?

I don’t think so. The e-cores are area-efficient, not power-efficient per se, and are there primarily to assist in MT loads. I’d want to say light loads (e.g. ST) wouldn’t impact it too much and would (initially) reside on the p-cores. Not sure if anyone has measured the power draw of the e-cores alone for Raptor Lake, but from AnandTech’s review of the 12900K, maximum load puts the 8 e-cores on the 12900K at around 48W. I would guess the Raptor Lake e-cores would be somewhere in this ballpark.

 
  • Like
Reactions: Tlh97 and gt3911

coercitiv

Diamond Member
Jan 24, 2014
6,201
11,901
136
Do the e-cores help with power usage to a significant degree when the machine is experiencing light loads, from idle to web use?
They can help, but it's situational. Under very light loads they allow the scheduler to keep threads away from the P cores, keeping them parked at all times. Keeping an E core awake is less expensive than keeping a P core awake. For example, if you have multiple messaging apps active in the background, the E cores may help shave package power draw from 13W to something like 8W. Sounds good in theory, but once you compare against the total "idle" system power of a typical desktop, that 5W is merely a fraction. Also, closing down the apps with continuous light load, such as messaging apps, will bring package power down to around 2-3W whether or not you have E cores active.

If you really want to save on power, bring down your max clocks by 5-10%, make sure you have sleep states enabled in BIOS, use Balanced mode in Windows, and be mindful of active apps in the background. Having E cores on the desktop is just the cherry on top when it comes to energy efficiency.
 
Last edited:
  • Like
Reactions: Tlh97 and gt3911