Question Alder Lake - Official Thread


Hulk

Diamond Member
Oct 9, 1999
4,230
2,016
136
HT on the small cores would be dead last in thread priority though (P > E > P-HT > E-HT). My knowledge of architecture is very limited, but just from that I would imagine it would probably have a very limited impact except for heavily multithreaded scenarios which are pretty rare in day to day usage for now. Maybe it would nudge a few % more out of encoding scenarios, but for gaming or similar I can't imagine it would make a difference since Intel provides more than enough threads for now.

P>E>P-HT is only how it works for foreground apps. The E's work on background apps, which are often video encoding, audio encoding, Photoshop batch processing, etc., and in those cases HT is a big performance booster.
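
To make the quoted ordering concrete, here is a minimal Python sketch that fills logical CPUs in P > E > P-HT > E-HT order. The function names and the 8P+4E default (the 12700K layout discussed in this thread) are illustrative assumptions, and HT on the E-cores is purely hypothetical since Alder Lake's E-cores have no Hyper-Threading. It models the ordering only, not the real Windows scheduler.

```python
def fill_order(p_cores=8, e_cores=4, e_has_ht=False):
    """Return logical CPU slots in the order P > E > P-HT > E-HT."""
    order = [f"P{i}" for i in range(p_cores)]        # physical P-cores first
    order += [f"E{i}" for i in range(e_cores)]       # then the E-cores
    order += [f"P{i}-HT" for i in range(p_cores)]    # then the P-core HT siblings
    if e_has_ht:                                     # hypothetical: E-core HT comes last
        order += [f"E{i}-HT" for i in range(e_cores)]
    return order


def place(n_threads, **kwargs):
    """Slots occupied by n_threads busy threads under that ordering."""
    return fill_order(**kwargs)[:n_threads]


print(place(10))                        # 8 P-cores plus 2 E-cores
print(len(fill_order(e_has_ht=True)))   # total slots with hypothetical E-core HT
```

With 13 busy threads the last slot used is the first P-core HT sibling, which is why HT on the E-cores would only matter once every other slot is already occupied.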

While gaming is a non-issue for me I realize it is hyper important in benchmarks. 400fps is playable while 395fps is not.
 

TheELF

Diamond Member
Dec 22, 2012
3,973
731
126
Still hoping someone can do some game streaming on their Alder Lake to let us know if OBS gets moved solely to the E cores!
From what Hulk posted about his workload using all cores when priority is at normal or higher, it should run on more than just the E cores. OBS starts the encoding task with a priority of normal by default, which should make Thread Director utilize all the cores for it whenever possible.

Also it's extremely hard to figure out on which and how many cores a software runs if you have more than one thing running at a time. How is anybody going to be able to tell if core number X load is due to obs or the game, or how much % of this load is due to either?
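
One way to answer this, at least in principle, is to sample which process is running on which logical CPU and then split each core's busy time by process. The sketch below only does the bookkeeping on made-up samples. The sample format, process names and numbers are all assumptions, and collecting such samples on a real system would need a tracing tool that this sketch does not cover.

```python
from collections import defaultdict


def share_by_core(samples):
    """samples: iterable of (core, process, busy_ms). Returns {core: {process: fraction}}."""
    per_core = defaultdict(lambda: defaultdict(float))
    for core, process, busy_ms in samples:
        per_core[core][process] += busy_ms           # accumulate busy time per process
    shares = {}
    for core, by_proc in per_core.items():
        total = sum(by_proc.values())
        shares[core] = {p: ms / total for p, ms in by_proc.items()} if total else {}
    return shares


# Hypothetical samples: core name, process name, busy milliseconds
samples = [
    ("P0", "game", 900), ("P0", "obs", 100),
    ("E0", "obs", 750), ("E0", "game", 250),
]
shares = share_by_core(samples)
print(shares["P0"]["obs"])   # OBS share of P0's busy time
print(shares["E0"]["obs"])   # OBS share of E0's busy time
```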
 

DrMrLordX

Lifer
Apr 27, 2000
21,643
10,859
136
Also it's extremely hard to figure out on which and how many cores a software runs if you have more than one thing running at a time. How is anybody going to be able to tell if core number X load is due to obs or the game, or how much % of this load is due to either?

Streaming puts a pretty heavy load on your system unless you offload streaming duties to a dedicated box. You will find out in a hurry if your system can't keep up with the bitrate and quality level you've selected.
 

TheELF

Diamond Member
Dec 22, 2012
3,973
731
126
Streaming puts a pretty heavy load on your system unless you offload streaming duties to a dedicated box. You will find out in a hurry if your system can't keep up with the bitrate and quality level you've selected.
Yes, but how will that help you determine whether streaming stays on the E cores alone if all P and E cores are at or close to 100%?
 

DrMrLordX

Lifer
Apr 27, 2000
21,643
10,859
136
Yes but how will that help you determine if streaming stays on the e cores alone, if all p and e cores are at 100% or close to it.

I will repeat

You will find out in a hurry

If you are streaming something like Team Fortress 2 (which, by the way, won't peg more than 3 cores at 100% on a Matisse, much less Alder Lake p-cores or Vermeer) then you can maintain pretty much any bitrate and quality level you want. Most of the CPU's resources will be available to OBS rather than the game. And plenty of people stream/record that game.

But if you were streaming 1440p (which I have done) or 1080p with any kind of quality and you only had the e-cores working on the encode, then you need the p-cores to be engaged or the e-cores will likely not keep up. Especially on a 12700k. Stuttering will ensue. If you've ever done any game streaming before, you will immediately know when your encode settings are too much for your current rig. Dropped frames and other "fun".
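
The "can the encoder keep up" check comes down to simple arithmetic: if encoding one frame takes longer than the frame interval, frames get dropped. A rough sketch, where the per-frame encode times are invented numbers rather than measurements of any real CPU:

```python
def dropped_fraction(stream_fps, encode_ms_per_frame):
    """Fraction of frames dropped when the encoder is slower than the stream."""
    encoder_fps = 1000.0 / encode_ms_per_frame   # frames the encoder can finish per second
    if encoder_fps >= stream_fps:
        return 0.0                               # encoder keeps up
    return 1.0 - encoder_fps / stream_fps


# Hypothetical encode times for a 60 fps stream
print(dropped_fraction(60, 12.0))   # encoder manages about 83 fps, nothing dropped
print(dropped_fraction(60, 25.0))   # encoder manages 40 fps, a third of the frames dropped
```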
 

Hulk

Diamond Member
Oct 9, 1999
4,230
2,016
136
Example of foreground/background behavior: rendering the timeline of a video project in Magix Vegas Pro 19 using RenderPlus from the Happy Otter Scripts plug-in. RenderPlus frameserves the assembled timeline through Debugmode Frameserver to the transcoder in avidub, so there are two compute-intensive operations going on here: Vegas Pro 19 assembles the timeline using the CPU and iGPU and frameserves it to the avidub x265 encoder.

As you can see in the screenshot below, when VP19 is in the foreground all 16 P logical cores are slammed and the 4 E's are heavily loaded.
When I move to Chrome the P's reduce to about 25% load while the E's go to 100%. I think it would be "smarter" to keep the P's on the video work while the E's handle my web browsing!
If this is going to be the behavior moving forward then more E's please!
BTW, I have my all-core P's set to 4.5GHz and E's at 3.7GHz. I find a very small performance difference going to 4.8GHz and quite a lot more power/heat/fan noise.

12700K Background rendering Vegas Pro 19.jpg
 

coercitiv

Diamond Member
Jan 24, 2014
6,213
11,956
136
I think it would be "smarter" to keep the P's on the video work while the E's handle my web browsing!
Until you decide the foreground app is Photoshop, at which point running PS on E-cores will be quite the i5 6600 experience. This needs to get fixed properly, not by guessing which workload should go where.

Foreground apps get priority to P cores, not exclusive access. The scheduler must be able to allocate worker threads to all cores based on thread priority and current core utilization. If E-cores are underutilized then by all means, keep collecting threads in there, but once a certain threshold is reached they must be gradually moved to P-cores. (or risk seriously under-performing in multitasking)

Here's a very basic representation of what I'm talking about. Obviously the low priority threads would get sprinkled around the P cores and not necessarily clumped together, but the diagram should hold true as long as we're talking more about CPU time than the physical CPU core being used.

alloc.png
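
A minimal sketch of the allocation idea described above, next to the coarse behavior observed in this thread. Everything here (the function names, the 90% threshold, the one-thread-per-core load model, the 8P+4E layout) is an assumption for illustration and not how Thread Director or the Windows scheduler is actually implemented.

```python
def coarse(fg_threads, bg_threads, p=8, e=4):
    """Observed behavior: background work is confined to the E-cores."""
    fg_on_p = min(fg_threads, p)
    bg_on_e = min(bg_threads, e)
    return {"P_busy": fg_on_p, "E_busy": bg_on_e, "bg_waiting": bg_threads - bg_on_e}


def spill(fg_threads, bg_threads, p=8, e=4, e_threshold=0.9):
    """Proposed behavior: fill the E-cores first, then move the overflow to idle P-cores."""
    fg_on_p = min(fg_threads, p)
    bg_on_e = min(bg_threads, e)
    overflow = bg_threads - bg_on_e
    bg_on_p = 0
    if bg_on_e / e >= e_threshold:               # E-cores saturated: use spare P-cores
        bg_on_p = min(overflow, p - fg_on_p)
    return {"P_busy": fg_on_p + bg_on_p, "E_busy": bg_on_e,
            "bg_waiting": overflow - bg_on_p}


# A browser in the foreground (2 threads) and a 12-thread encode in the background
print(coarse(2, 12))   # 6 P-cores idle while 8 background threads wait
print(spill(2, 12))    # all cores busy, 2 background threads still waiting
```

In the sketch the foreground app keeps first claim on the P-cores in both policies; the only difference is whether idle P-cores are offered to background threads once the E-cores are full.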
 

Hulk

Diamond Member
Oct 9, 1999
4,230
2,016
136
I think the bottom line is that when there is work to be done, all available compute should be utilized, with the foreground app getting priority. Right now they have implemented a very coarse-grained approach where the P's go to the foreground and the E's go to the background.

But there are exception cases. For example, right now I'm rendering a Vegas Pro project using RenderPlus and since the little progress bar is in the lower right portion of the screen Windows thinks that is a foreground process even as I type this. I can hear my CPU fans so I know all the cores are floored.

The other "solution" as others have noted here is to simply have more E cores available for background work. 12 or 16 would do nicely to keep the background projects moving along.
 

dullard

Elite Member
May 21, 2001
25,069
3,420
126
This needs to get fixed properly, not by guessing which workload should go where.
Over time it will be fixed properly. The programmers simply need to specify which thread gets which core. Visual Studio already lets programmers specify priority levels when threads are created. But now, it also lets you specify which core to put it on. Programmers know best whether or not a thread needs P cores or E cores. Old software of course won't have this, but new software and updated software should be getting it over time.
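
As an illustration of what "specifying the core" can look like at the OS level, here is a small Python sketch that builds an affinity set for one core type and applies it with `os.sched_setaffinity`, which is available on Linux only. The logical CPU numbering (0-15 for the P-core threads, 16-19 for the E-cores on a 12700K) is an assumption, and this is not the Visual Studio mechanism mentioned above. On Windows the rough equivalent at the Win32 level would be a call such as `SetThreadAffinityMask`.

```python
import os

# Assumed logical CPU layout for a 12700K (8 P-cores with HT, 4 E-cores)
P_CPUS = set(range(0, 16))
E_CPUS = set(range(16, 20))


def affinity_mask(cpus):
    """Bitmask with one bit set per logical CPU."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


def pin_current_process(cpus):
    """Restrict this process to the given CPUs where the OS supports it. Returns True if applied."""
    if not hasattr(os, "sched_setaffinity"):          # e.g. Windows or macOS
        return False
    usable = set(cpus) & os.sched_getaffinity(0)      # skip CPUs this process cannot use
    if not usable:
        return False
    os.sched_setaffinity(0, usable)
    return True


print(hex(affinity_mask(E_CPUS)))   # 0xf0000
```

Hard pinning like this overrides the scheduler entirely, so it is a blunt tool compared with a hint that merely states a preference.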
 

coercitiv

Diamond Member
Jan 24, 2014
6,213
11,956
136
Over time it will be fixed properly. The programmers simply need to specify which thread gets which core. Visual Studio already lets programmers specify priority levels when threads are created. But now, it also lets you specify which core to put it on. Programmers know best whether or not a thread needs P cores or E cores. Old software of course won't have this, but new software and updated software should be getting it over time.
We're not talking about optimization (as in getting best possible results with hints from the app dev), we're talking about Thread Director currently intervening over the "default" thread allocation and forcing threads over to the E-core complex when the app is no longer on foreground.

I already gave a pretty clear example of what happens in a prior post: all screenshots are the same Handbrake video conversion with different worker thread priority selected from inside the software. You can see the scheduler is already perfectly capable of filling the E-cores with low priority threads and keeping these threads away from the P-cores if it wants to. The problem is it wants to do it way too easily / often; it's almost as if someone tuned it for a 2+8 low power CPU...

1640017622138.png
 

dullard

Elite Member
May 21, 2001
25,069
3,420
126
We're not talking about optimization (as in getting best possible results with hints from the app dev), we're talking about Thread Director currently intervening over the "default" thread allocation and forcing threads over to the E-core complex when the app is no longer on foreground.
I fully realize what you were talking about. Yes, the default Thread Director setting is to put background programs onto the E cores even if the user wants them on the P cores. The long-term solution is for programmers to state that a specific thread is for the P core only. Then that overrides the Thread Director and makes your point irrelevant. It just will take quite a lot of time for software to add these flags.
 

scannall

Golden Member
Jan 1, 2012
1,946
1,638
136
I fully realize what you were talking about. Yes, the default Thread Director setting is to put background programs onto the E cores even if the user wants them on the P cores. The long-term solution is for programmers to state that a specific thread is for the P core only. Then that overrides the Thread Director and makes your point irrelevant. It just will take quite a lot of time for software to add these flags.
Apple has been doing this for some time now. Things like system services are fenced in to the E cores only, for example. And compiler flags for devs to pin threads to either P or E cores, or to let them float, are in the toolchain now.
 

Hulk

Diamond Member
Oct 9, 1999
4,230
2,016
136
We're not talking about optimization (as in getting best possible results with hints from the app dev), we're talking about Thread Director currently intervening over the "default" thread allocation and forcing threads over to the E-core complex when the app is no longer on foreground.

I already gave a pretty clear example of what happens in a prior post: all screenshots are the same Handbrake video conversion with different worker thread priority selected from inside the software. You can see the scheduler is already perfectly capable of filling the E-cores with low priority threads and keeping these threads away from the P-cores if it wants to. The problem is it wants to do it way too easily / often; it's almost as if someone tuned it for a 2+8 low power CPU...

Yes, well stated. This is exactly the behavior we are talking about. Foreground apps work properly. It's the distribution of loading among foreground and background apps that is the problem. I don't think it is worked out on a per application basis. Windows needs to utilize all compute resources more effectively.

So I think we are talking about CPU inter-application performance as opposed to CPU intra-application performance, which was the one we were more concerned with before ADL came out. Or put differently, the core distribution within a program is fine; it's the distribution among applications that is the problem.

This kind of reminds me of space exploration. Before the mission everyone has a question on their minds. When the probe gets there it's actually something else that no one thought would be puzzling that turns out to be the real interesting thing. Like when New Horizons found geologic activity on Pluto.
 
Jul 27, 2020
16,340
10,352
106

dullard

Elite Member
May 21, 2001
25,069
3,420
126
The 8P+HT graph makes the E-cores look useless from a performance perspective, but paradoxically 8P+HT+8E delivers a bit more performance for a lot less power consumption, possibly due to the P-cores not needing to turbo boost as high. 7P+12E should have been an option.
I'm not sure that "paradoxically" is the word that I would use. That is the exact purpose of the efficiency cores: more performance at a lower power. In this case, 4.96% more performance at 79.4% of the average power.
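
Put as a quick calculation, those two figures work out to roughly 32% better performance per watt. A tiny sketch of the arithmetic, where the function name is just for illustration:

```python
def perf_per_watt_gain(perf_ratio, power_ratio):
    """Relative efficiency of one configuration versus another."""
    return perf_ratio / power_ratio


# Figures quoted above: 4.96% more performance at 79.4% of the average power
gain = perf_per_watt_gain(1.0496, 0.794)
print(f"{(gain - 1) * 100:.1f}% better performance per watt")
```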

If I were to make the chip myself, I would have preferred 6P + 16 E in about the same die space.
 

Hulk

Diamond Member
Oct 9, 1999
4,230
2,016
136
I'm not sure that "paradoxically" is the word that I would use. That is the exact purpose of the efficiency cores: more performance at a lower power. In this case, 4.96% more performance at 79.4% of the average power.

If I were to make the chip myself, I would have preferred 6P + 16 E in about the same die space.

Yes exactly. There is some juggling of the power envelope to achieve the best performance within certain power and thermal limits. This will of course be even more of a critical issue with the mobile parts.

Yes, I would also have preferred 6+16 over 8+8. I have demonstrated that for MT workloads 4E>1P. Once you have 6 super strong ST P's, I think in most instances 8 additional E's would be more beneficial than 2 additional P's, from both a performance and power perspective. Hopefully, besides the rumored 8+16 13900K, we'll also get an 8+12 13700K, though it'll probably be 8+8 for the 13700K.
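
A back-of-the-envelope sketch of the 6+16 versus 8+8 comparison. It assumes that four E-cores take about the area of one P-core (the "about the same die space" premise above) and uses a made-up MT throughput of 0.3 P-core per E-core, chosen only so that 4E > 1P as stated. Neither number is a measurement.

```python
E_PER_P_AREA = 4      # assumption: 4 E-cores occupy roughly one P-core's area
E_MT_VS_P = 0.3       # assumption: MT throughput of one E-core relative to one P-core


def area(p, e):
    """Die area in P-core equivalents."""
    return p + e / E_PER_P_AREA


def mt_score(p, e):
    """MT throughput in P-core equivalents."""
    return p + e * E_MT_VS_P


for p, e in [(8, 8), (6, 16)]:
    print(f"{p}+{e}: area {area(p, e):.1f}, MT {mt_score(p, e):.1f}")
```

Under these assumptions both layouts cost the same area and 6+16 comes out ahead in MT throughput. The gap depends entirely on the assumed E-core ratio.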
 
Jul 27, 2020
16,340
10,352
106
I'm not sure that "paradoxically" is the word that I would use. That is the exact purpose of the efficiency cores: more performance at a lower power. In this case, 4.96% more performance at 79.4% of the average power.

If I were to make the chip myself, I would have preferred 6P + 16 E in about the same die space.
It seems paradoxical if you just view the first graph in isolation. Adding 8E to the mix of 8P+HT should increase performance a whole lot but that would totally blow the power budget and probably melt the CPU. I do agree that 6P+16E seems very attractive. It just seems curious. Like they had more confidence in getting defect free P-cores than E-cores.
 

Hulk

Diamond Member
Oct 9, 1999
4,230
2,016
136
It seems paradoxical if you just view the first graph in isolation. Adding 8E to the mix of 8P+HT should increase performance a whole lot but that would totally blow the power budget and probably melt the CPU. I do agree that 6P+16E seems very attractive. It just seems curious. Like they had more confidence in getting defect free P-cores than E-cores.

I have a feeling it was more to the point that 8+8 would be most competitive against AMD in most benchmarks. Benchmarking hasn't advanced to the stage where we are looking at running apps simultaneously and looking for a total finish time for all apps.
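
A benchmark of that kind could be sketched as follows: start several workloads at the same moment and report the time until the last one finishes, alongside the individual times. The function name and the sleep-based stand-in workloads are assumptions for illustration only.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def total_finish_time(workloads):
    """Start all workloads at once; return (seconds until the last one ends, per-workload seconds)."""
    def timed(fn):
        start = time.perf_counter()
        fn()
        return time.perf_counter() - start

    begin = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(workloads)) as pool:
        each = list(pool.map(timed, workloads))   # results come back in input order
    return time.perf_counter() - begin, each


# Stand-in workloads; a real test would launch the actual applications instead
fake_encode = lambda: time.sleep(0.05)
fake_export = lambda: time.sleep(0.02)
total, each = total_finish_time([fake_encode, fake_export])
print(f"total {total:.3f}s, individual {[round(t, 3) for t in each]}")
```

Python threads share one interpreter lock, so genuinely CPU-bound workloads would have to be launched as separate processes for the numbers to mean anything. The sketch only shows the shape of the measurement.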
 

Asterox

Golden Member
May 15, 2012
1,026
1,775
136
Why does Intel tolerate such moves? It's as if they don't care at all.


"The editors were not only able to find Core i5-12400F on sale, they actually bought one for review. A full test of the retail CPU should be expected in a matter of days. This situation very much reminds us of the German store selling Core i7-11700K nearly a month prior to launch."

But OK, an i5-12400F QS sample had already been tested by a French website.

 

LightningZ71

Golden Member
Mar 10, 2017
1,628
1,898
136
I believe that Intel's choice to go with 8+8 over 6+16 has more to do with their lack of confidence in scheduler behavior in Windows than anything else. If they were actively expecting that end users would commonly need to disable the "e" cores to get games or productivity applications to behave properly, then in those comparisons a 6+16 part would seem uncompetitive.
 

TheELF

Diamond Member
Dec 22, 2012
3,973
731
126
Yes, well stated. This is exactly the behavior we are talking about. Foreground apps work properly. It's the distribution of loading among foreground and background apps that is the problem. I don't think it is worked out on a per application basis. Windows needs to utilize all compute resources more effectively.

So I think we are talking about CPU inter-application performance as opposed to CPU intra-application performance, which was the one we were more concerned with before ADL came out. Or put differently, the core distribution within a program is fine; it's the distribution among applications that is the problem.

This kind of reminds me of space exploration. Before the mission everyone has a question on their minds. When the probe gets there it's actually something else that no one thought would be puzzling that turns out to be the real interesting thing. Like when New Horizons found geologic activity on Pluto.
Here's the thing though: these are desktop CPUs, not server or render boxes. Windows and Thread Director are tuned to give users who don't want to mess with anything the best possible user experience, not the best possible performance. That means that if a program states that it's background, it gets shifted to the E cores so that the full power of the CPU (remember that the E cores are added on to a normal last-gen CPU core count) is available to the user. It's a lot less efficient, but it's going to guarantee the most responsive system to the user: "hey look, I'm rendering a video and still get 0 frame drops" and the like. How people see a product is much more important than how that product actually is.

If said user is a power user that does want to mess with things, they know how to make the software use all cores.
 

Hulk

Diamond Member
Oct 9, 1999
4,230
2,016
136
Here's the thing though, these are desktop CPUs and not server or render boxes, windows and thread director are tuned to give users that don't want to mess with anything the best possible user experience, not the best possible performance, and that means that if a program states that it's background it gets shifted to the e cores so that the full power of the CPU (remember that the e cores are added on to a normal last gen CPU core count) is available to the user, it's a lot less efficient but it's going to guarantee the most responsive system to the user. "hey look, I'm rendering a video and still get 0 frame drop" and the like, how people see a product is much more important than how that product actually is.

If said user is a power user that does want to mess with things, they know how to make the software use all cores.

Good point. Tough to argue with it. The background work IS getting done, just slower than it could be. And yes, while this is going on the system remains 100% responsive, able to handle any foreground task presented to it. Comes right back around to more E's, assuming you don't already have a 12900K. If I had understood this behavior before I bought the 12700K I might have scratched up the dough for the 12900K.