Question Alder Lake - Official Thread


pcslookout

Lifer
Mar 18, 2007
11,936
147
106
Every "let's do a clean install every week" guy should switch to Win 11; they make the maturing process a lot faster. Anyone who either wants to install Windows ONLY when switching computers or major parts (me), or anyone who actually needs to do work on their machine (me at work), should avoid newer Windows versions like boiling lava 😂
That is the experience of my last 20 years as a PC enthusiast. Heck, I even held out on Win 7 till last Spring, when I upgraded to 9900K. Same with XP back in the day... I even endured random crashes on 98 till SP1 came out and I could switch off that stupid mouse accel under XP.

Do you know if the latest ISO on the Microsoft website will have this fix, or does it take a few weeks to a month?
 

Hulk

Diamond Member
Oct 9, 1999
4,225
2,015
136
I have done some extensive investigation of my 12700K with Cinebench R23. I really wanted to get to the bottom of the E core performance. Isolating P's and E's and then testing does not provide the same results as when they are run together, probably due to overhead sharing and other unknowns. So in order to separate P and E performance I decided to run a bunch of tests at various core configurations and clock speeds and fit the data.

So, the main numbers here are how many CB R23 points a Golden Cove core earns per 1GHz, and the same for Gracemont. HT is on with Golden Cove. This is how they each perform when working together, not in isolation.
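
For anyone who wants to try the same kind of fit, here is a minimal sketch. The post doesn't say which model was fitted, so this assumes the simplest one: score = a * (P cores * P clock) + b * (E cores * E clock), solved by ordinary least squares. The function name and the run list are hypothetical, and the runs are synthetic (generated from 506 and 242), not actual measurements.

```python
def fit_points_per_ghz(runs):
    """Least-squares fit of score = a*(p_cores*p_ghz) + b*(e_cores*e_ghz).

    runs: iterable of (p_cores, p_ghz, e_cores, e_ghz, score).
    Returns (a, b): points per GHz per P core and per E core.
    ASSUMPTION: a purely linear model with no intercept.
    """
    sxx = sxy = syy = sxs = sys_ = 0.0
    for p_cores, p_ghz, e_cores, e_ghz, score in runs:
        x = p_cores * p_ghz   # total P-core GHz in this run
        y = e_cores * e_ghz   # total E-core GHz in this run
        sxx += x * x
        sxy += x * y
        syy += y * y
        sxs += x * score
        sys_ += y * score
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("runs do not vary enough to separate P and E cores")
    # Solve the 2x2 normal equations with Cramer's rule.
    a = (sxs * syy - sys_ * sxy) / det
    b = (sys_ * sxx - sxs * sxy) / det
    return a, b


def synthetic_runs():
    """SYNTHETIC data built from 506 and 242, only to exercise the fit."""
    configs = [(8, 4.7, 4, 3.6), (6, 4.7, 4, 3.6), (8, 4.0, 0, 0.0),
               (4, 4.7, 2, 3.0), (2, 3.5, 4, 3.6)]
    return [(p, pf, e, ef, 506 * p * pf + 242 * e * ef)
            for p, pf, e, ef in configs]


if __name__ == "__main__":
    a, b = fit_points_per_ghz(synthetic_runs())
    print(f"P core: {a:.1f} pts/GHz, E core: {b:.1f} pts/GHz")
```

With real scores the fit would not be exact, and the leftover error gives a rough idea of how well a linear model describes the chip.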

I think Gracemont and Golden Cove complement each other very well. Golden Cove has a very high IPC while Gracemont is quite productive for the die space it occupies.

There is still work to be done in Windows 11 with the Thread Director though. For one, when you use Process Lasso to constrain an application to some P and E cores, it only behaves correctly when the app is in the foreground. I have tested this extensively with Handbrake. The 6 P's and 4 E's setting is honored when Handbrake is on top, but it always reverts to only E's in the background. Basically there is something going on in Windows 11 where E cores are ALWAYS used for background apps. Let me clarify. If I assign 6 P's to Handbrake and no E's, then that is what it will use in all priority cases. But if I assign 6 P's and 4 E's, it will use 6+4 in the foreground but only the E's in the background.
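
To make the two affinity settings concrete, here is a small sketch of the bitmasks they correspond to. It only computes the masks (the kind of value the Win32 SetProcessAffinityMask call takes); it does not reproduce or work around the background behavior. The logical CPU numbering is an assumption: with HT on, a 12700K is taken to expose the 8 P cores as logical CPUs 0-15 (two threads each) and the 4 E cores as 16-19. The function name is hypothetical.

```python
# ASSUMPTION: 12700K with HT on, P-core threads enumerated first (0-15),
# E cores after them (16-19). Check your own system before relying on this.
P_CORES = 8
THREADS_PER_P_CORE = 2
E_CORES = 4


def affinity_mask(p_cores, e_cores):
    """Return a bitmask selecting the first p_cores P cores and e_cores E cores."""
    if not (0 <= p_cores <= P_CORES and 0 <= e_cores <= E_CORES):
        raise ValueError("core count out of range for this CPU layout")
    mask = 0
    # Both hyperthreads of each selected P core.
    for cpu in range(p_cores * THREADS_PER_P_CORE):
        mask |= 1 << cpu
    # E cores start right after the last P-core thread.
    first_e = P_CORES * THREADS_PER_P_CORE
    for cpu in range(first_e, first_e + e_cores):
        mask |= 1 << cpu
    return mask


if __name__ == "__main__":
    print(hex(affinity_mask(6, 0)))  # 6 P's, no E's
    print(hex(affinity_mask(6, 4)))  # 6 P's + 4 E's
```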

Remember that beautiful video/demo of the Thread Director Intel showed a while back? Switching cores for various instructions and loads? Yeah, all well and good for the foreground application but not for the background app, which always gets switched to the E's. I don't know what good the Thread Director is doing, because basically the behavior is: use the Thread Director and all of its smarts to run the foreground app as optimally as possible. As for background apps, just throw them all on the E's.

Yes, it does seem like Intel and Microsoft spent a lot of time and effort on foreground tasks, which is also how CPUs are tested by reviewers.

All that is required to fix the situation is for the foreground app's unused compute to be dynamically allocated to background apps. As I've written before, why keep 8 P's on Photoshop when I'm using them at like 5% capacity (bursts) when I'm editing? Some of them should switch to background apps between bursts and then back to PS when needed. Just keep 2 on the foreground app for responsiveness.

Yes, I'm making a big deal out of a relatively minor issue, I know that. For 99% of people doing day-to-day work it won't be noticed. But that's not how we are in this forum. We find something that bugs us and dive into it. I am really enjoying the 12700K; it's fast and furious and churns through anything I throw at it really quickly. It's just annoying that when I'm encoding video in the background while typing in Word, ONLY the E's are encoding video! That's silly!


506 - Cinebench R23 points per 1GHz for a Golden Cove core with HT
242 - Cinebench R23 points per 1GHz for a Gracemont core
109.1% - Golden Cove IPC increase in Cinebench R23 over Gracemont
173.0% - Golden Cove at 4.7GHz increased throughput over Gracemont at 3.6GHz (12700K stock)
46.5% - Gracemont cluster at 3.6GHz increased throughput over 1 Golden Cove core at 4.7GHz
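
The three percentages follow directly from the two per-GHz figures and the stock clocks. A quick sketch to recheck them (the constant and function names are just for illustration):

```python
# Per-core Cinebench R23 points per GHz, from the figures above.
GOLDEN_COVE_PTS_PER_GHZ = 506   # one P core, HT on
GRACEMONT_PTS_PER_GHZ = 242     # one E core

# 12700K stock clocks quoted above.
P_CLOCK_GHZ = 4.7
E_CLOCK_GHZ = 3.6
E_CORES_PER_CLUSTER = 4


def pct_increase(a, b):
    """How much larger a is than b, in percent."""
    return (a / b - 1.0) * 100.0


def derived_figures():
    p_core = GOLDEN_COVE_PTS_PER_GHZ * P_CLOCK_GHZ   # one P core at 4.7GHz
    e_core = GRACEMONT_PTS_PER_GHZ * E_CLOCK_GHZ     # one E core at 3.6GHz
    e_cluster = e_core * E_CORES_PER_CLUSTER         # four E cores at 3.6GHz
    return {
        "ipc_pct": pct_increase(GOLDEN_COVE_PTS_PER_GHZ, GRACEMONT_PTS_PER_GHZ),
        "p_core_vs_e_core_pct": pct_increase(p_core, e_core),
        "e_cluster_vs_p_core_pct": pct_increase(e_cluster, p_core),
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value:.1f}%")
```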
 

dullard

Elite Member
May 21, 2001
25,066
3,416
126
You are correct, but I'd like to add 2 points.

1) I think the vast majority of people actually want their foreground task to have the highest performance. Yes, there are uses like yours where you want an intensive process to go on in the background while you are doing something else in the foreground. But that isn't a highly common use case. When I'm editing videos, my video editor is in the foreground; when I'm editing photos, GIMP is in the foreground; when I'm doing heavy software simulations, my code is running in the foreground. Your situation is a highly important use case, but it just doesn't apply to most uses. I can see converting videos in the background while I'm browsing the internet in the foreground, and it would be annoying to lose performance there. It just isn't that common that I'll need the highest performance in that case.

2) Ultimately, we do want the E cores to be doing the heavy lifting. When there are 16+ E cores, the E cores will be best suited to the situation described in #1. The problem is that Alder Lake just has too few E cores right now. Thus, to get better multi-threaded performance, P cores are used. Then you run into various programs competing for the P cores. With each new generation, the problem gets smaller and smaller.

So Microsoft and Intel are left with an edge case that will mostly go away starting with Raptor Lake. I can see why they didn't spend that much time perfecting it.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,356
2,848
106
I know you have a 12700K, so you can't simulate ADL-P 2+8, but could you try CB R23 with 2C+4c and PL limited to 15W? I am interested in how high it can clock and what the performance is.
 

Hulk

Diamond Member
Oct 9, 1999
4,225
2,015
136
You are absolutely correct in stating that as we move forward and have more E's for the background apps this issue will in effect "go away." I think if I had a 12900K with 8 E's that would probably be enough background compute to keep me "fed" with data in my foreground app, like when DxO PureRaw is processing RAW files for me in PS.

Raptor Lake with 8+16, or even 8+12 would be perfect.

Another simple solution that could be a Windows Control Panel setting would be to be able to "assign" P cores that could "group" with the E's. So for example, E's are currently background and P's foreground (for the most part). Allow two P's to be grouped with the 4 E's. That would still leave me with 6 P's for foreground apps.

Or if the 12900K price drops a bit more at Microcenter I'll simply upgrade and be done with it. It's just hard to wrap my head around the idea of spending $200 for 1 more Gracemont Cluster!
 

Hulk

Diamond Member
Oct 9, 1999
4,225
2,015
136
I know, I'm dying to know how Alder Lake is going to do in the ultra mobile space as well. What will the clocks be? The thing is, desktop boards are just not tuned for ultra-low-power operation. CPU package power is 15W at idle. We're just not going to know this until Intel spills the beans.
 

LightningZ71

Golden Member
Mar 10, 2017
1,627
1,898
136
The $200 jump for one more Gracemont cluster is why I believe Intel should have included a SKU with 6 P cores and 8 E cores on the desktop, like a 12650K or something. I think it would have fit your use case perfectly.
 

DrMrLordX

Lifer
Apr 27, 2000
21,634
10,849
136
It's just annoying while I'm encoding video in the background while typing in Word ONLY the E's are encoding video! That's silly!

Can you do streaming tests with OBS? Maybe play a lightweight game and then stream it to YouTube Gaming or something? It would be interesting to know if only the E cores are active while the game has the foreground and OBS is running in the background.
 

dullard

Elite Member
May 21, 2001
25,066
3,416
126
If you can wait another year, why not two. 14900K or bust. ;)

BTW, I'm the king of waiting, my only computer is still using a C2Q...
We are still not sure if Meteor Lake is mobile-only or not. So, there is a chance that he'd wait for nothing to materialize. Plus, if you believe Momomo, Raptor Lake might be as early as August, so the wait might not be too long. https://www.techradar.com/news/intel-raptor-lake-cpu-launch-could-come-sooner-than-you-think

I was using a C2Q as my main home computer (i.e. not my work computation computer) until it fried this spring in a power outage. Those chips are good enough for light tasks even today. Since I had to replace it in the middle of a video card shortage, I went with the one chip that I really didn't want: Rocket Lake.
 

Hulk

Diamond Member
Oct 9, 1999
4,225
2,015
136
I think many people who have not used Alder Lake are going to be disappointed by Raptor Lake reviews. This is obviously going to be a minor update. Minor update to the process, or what we used to call a "stepping," more E cores, and other relatively minor improvements. I'm thinking perhaps 5% better IPC mainly due to better memory subsystem, more E cores, and lower temps/thermals at equivalent Alder Lake frequencies.

If you have no experience with Alder Lake then the immediate response is "Bah, humbug!" But if you have one and know how powerful ADL is then you will also recognize how useful those extra E's will be.
 

jpiniero

Lifer
Oct 1, 2010
14,600
5,221
136
We are still not sure if Meteor Lake is mobile-only or not.

I think it is going to be. But there wouldn't be anything stopping Intel from doing "desktop" BGA products like the 11900KB if yields outperform internal expectations and there's a market for it and there's enough leaky chips to make it work.

Mind you, I think best case it's going to be like Comet/Ice and Raptor will be the vast majority of the 22/23 mobile volume.
 

Hulk

Diamond Member
Oct 9, 1999
4,225
2,015
136
Revisiting "Gracemont will be as fast as Skylake" claims by Intel.
In regards to that statement with Cinebench R23 here is what I found.
Gracemont - 242 points/1GHz for each Gracemont core
Skylake - 259/328 (without/with HT)
Haswell - 243/295 (without/with HT)
Ivy Bridge - 197/243 (without/with HT)

These numbers come from tests on my systems so I'm quite sure of their validity.

So speaking only about Cinebench R23.
Gracemont is almost as fast as Skylake (HT off), but Skylake with HT is about 36% faster (put the other way, Gracemont is about 26% slower).
Gracemont is equal to Haswell without HT, but Haswell with HT is about 22% faster (Gracemont is about 18% slower).
Gracemont is 23% faster than Ivy Bridge (without HT) and equal to Ivy Bridge (with HT).
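
These comparisons can be rechecked from the per-GHz scores listed above. Keep in mind that "X% faster" and "Y% slower" are different numbers for the same pair, since each uses a different baseline. A minimal sketch (the dictionary and function names are just for illustration):

```python
# Cinebench R23 points per 1GHz per core, from the list above.
PTS_PER_GHZ = {
    "Gracemont": 242,
    "Skylake": 259, "Skylake HT": 328,
    "Haswell": 243, "Haswell HT": 295,
    "Ivy Bridge": 197, "Ivy Bridge HT": 243,
}


def faster_pct(a, b):
    """How much faster core a is than core b, in percent (baseline: b)."""
    return (PTS_PER_GHZ[a] / PTS_PER_GHZ[b] - 1.0) * 100.0


def slower_pct(a, b):
    """How much slower core a is than core b, in percent (baseline: b)."""
    return (1.0 - PTS_PER_GHZ[a] / PTS_PER_GHZ[b]) * 100.0


if __name__ == "__main__":
    for rival in ("Skylake HT", "Haswell HT"):
        print(f"{rival} is {faster_pct(rival, 'Gracemont'):.0f}% faster; "
              f"Gracemont is {slower_pct('Gracemont', rival):.0f}% slower")
    print(f"Gracemont is {faster_pct('Gracemont', 'Ivy Bridge'):.0f}% "
          "faster than Ivy Bridge without HT")
```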

I still think that adding HT to Gracemont would provide a huge boost in performance for the small amount of additional internal structures required. It seems like there is some low hanging fruit there for Intel should they ultimately choose to pick it.
 

tomatosummit

Member
Mar 21, 2019
184
177
116
Do you have any results for the whole E-core cluster? From my observation of current reviews/benchmarks, that seems to be where it suffers. The core alone is good, but when all four of them are fully loaded the MT performance drops well below its potential.
Also it might be worth comparing R20 or even R15 scores against older cores. R23 is a bit of an outlier for Alder Lake.
 
Jul 27, 2020
16,328
10,339
106
I wish Ian would get some Intel engineer on the record as to the exact reason why HT was not considered for Gracemont cores. My guess is that they have prototype silicon with HT but increased power usage and greater than usual performance regression in case of HT-averse use cases are holding them back and their engineers are trying to refine it.
 

Hulk

Diamond Member
Oct 9, 1999
4,225
2,015
136
The results I posted are from testing the entire cluster and then dividing by 4 to isolate a single core. As for testing R15/R20... I think I'm "Cinebenched" out for a while;)
 

Hulk

Diamond Member
Oct 9, 1999
4,225
2,015
136
IntelUser2000 gave me a reason in either this or the big "Lakes" thread a while back. I don't remember exactly what he wrote, but I think he was referring to things that were above my pay grade/understanding. I concede his understanding of these matters is a good order of magnitude higher than mine, so I didn't push.

Could you imagine 8+16 with the +16 being HT enabled Gracemonts? That would be like tacking a 3950 onto the 8 Golden Coves. Now that is what I call having enough compute for background tasks!

But yes I would like to know more. Using my (very) limited knowledge it seems as though HT is especially effective when you have a wide architecture, like Gracemont. All of those unused physical resources go into creating two logical processors out of one actual one. If I had to guess I'd say Intel knew what they had with Golden Cove, meaning it was going to be a competitor. They probably came to the following conclusions.

1. We have enough on our plate with the hybrid design, two highly revised cores, and a new node.
2. We cannot make the die any larger. Period. Hard stop.
3. HT on the Gracemont cores is an ace we'll keep up our sleeve for now in a similar way Conroe didn't have HT but Nehalem did.

Of course all of that is pure conjecture on my part!
 

LightningZ71

Golden Member
Mar 10, 2017
1,627
1,898
136
I can speculate that one of the biggest roadblocks to HT being provisioned for the 'Mont cores is the "smallish" shared L2 between all four cores. If we were to evenly divide that between 8 threads, we would be back to 256k of L2 per thread, but realize that all four cores would be putting even more read/write pressure on it than they do now, and the MT performance hit on it would likely make it worse than it is now with just four threads.
 

CakeMonster

Golden Member
Nov 22, 2012
1,392
498
136
HT on the small cores would be dead last in thread priority though (P > E > P-HT > E-HT). My knowledge of architecture is very limited, but just from that I would imagine it would probably have a very limited impact except for heavily multithreaded scenarios which are pretty rare in day to day usage for now. Maybe it would nudge a few % more out of encoding scenarios, but for gaming or similar I can't imagine it would make a difference since Intel provides more than enough threads for now.