Question Alder Lake - Official Thread

Hulk · Oct 28, 2021

With the release of Alder Lake less than a week away and the "Lakes" thread having turned into a nightmare to navigate I thought it might be a good time to start a discussion thread solely for Alder Lake.

Hulk · Jan 27, 2022

TheELF said:
Intel sells the ecores in the way shown in these pictures.
Compared to a CPU without ecores it allows your foreground app to keep running at full speed while giving the background app a specified amount of compute that's always available to it, depending on the amount of e-cores.

The way you would keep your foreground app running at full speed on a CPU without e-cores would be to run the foreground app at real-time priority and the background app at idle priority meaning that the background app might never get any CPU cycles.

This is NOT mitigating potential responsiveness problems caused by the presence of E-cores, this is your foreground app getting the use of the full CPU (compared to a CPU without e-cores) meaning there is no slowdowns.

These slides are interesting and as usual it's a rabbit hole when you start to analyze them because so much of the testing procedure is left out, specifically the length of the test scripts.

Let's look at the parallel processing example in the 2nd slide. Let's also assume that the video encoding task was selected so that 8 E cores will finish it more or less when the 8 P cores finish the two serial tasks. This of course is a best case scenario for the TD and requires no "smarts" at all. Foreground on the P's, background on the E's. Perfect for the current behavior of Alder Lake on Windows 11. The graph shows 47% faster than Rocket Lake, which is expected as ADL has stronger big cores than Rocket, plus 8 additional E cores. No surprise there.

It's also pretty easy to understand why when the tasks are completed in parallel Rocket is a little closer to ADL. These apps aren't utilizing the E's fully so most of the gains are probably coming from the P's and stronger memory subsystem, i.e. the apps are not multithreaded enough to effectively use 8+ cores.

Now imagine that the video encoding portion of the parallel processing task was twice as long. Under current ADL behavior, after finishing the foreground task, if the user didn't switch to the background task the P's would remain idle until the E's finished. There is a video encoding test length where Rocket would actually win because this eventually comes down to 8 E's vs 8 Cypress Cove cores.
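To put rough numbers on that crossover, here is a toy model in Python. Everything in it is an assumption for illustration: the throughput figures are made up (not measured), the ADL case assumes strict partitioning (foreground only on P's, background only on E's, P's idle once the foreground is done), and the Rocket Lake case assumes all work shares all cores perfectly. The function names are hypothetical.

```python
def adl_static_time(fg_work, bg_work, p_rate, e_rate):
    """Foreground pinned to the P cores, background pinned to the E cores.

    The whole job is done when the slower of the two sides finishes;
    the P cores simply idle if they finish first.
    """
    return max(fg_work / p_rate, bg_work / e_rate)


def rkl_shared_time(fg_work, bg_work, rkl_rate):
    """All work shares all cores: total work divided by total throughput."""
    return (fg_work + bg_work) / rkl_rate


def crossover_bg_work(fg_work, e_rate, rkl_rate):
    """Background size at which both chips finish at the same time.

    Derived from bg / e_rate == (fg + bg) / rkl_rate, which only applies
    when the E-core side is the bottleneck on ADL. Returns None when the
    shared chip is no faster than the E cores alone (no crossover).
    """
    if rkl_rate <= e_rate:
        return None
    return fg_work * e_rate / (rkl_rate - e_rate)


# Hypothetical throughputs in work units per second (NOT benchmark data).
P_RATE, E_RATE, RKL_RATE = 10.0, 5.0, 8.0

for bg in (100.0, 300.0):
    adl = adl_static_time(100.0, bg, P_RATE, E_RATE)
    rkl = rkl_shared_time(100.0, bg, RKL_RATE)
    print(f"bg={bg:.0f}: ADL static {adl:.1f}s, RKL shared {rkl:.1f}s")

print("crossover bg work:", crossover_bg_work(100.0, E_RATE, RKL_RATE))
```

With these made-up rates the partitioned chip wins the short encode and loses the long one, which is the shape of the argument above; where the crossover actually sits on real hardware depends on numbers the slides don't give.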

Another bad scenario for ADL is if in the first serial processing test after starting the tasks the user puts a web browser in the foreground. All of these tasks now go to the E's and Rocket Lake wins.

This would also happen in the parallel processing example if a web browser was the foreground app after starting the workload. Rocket would win against the 8 E's as the P's sat idle, or nearly so as I have seen on my computer when doing these same tasks.

Another bad situation for ADL/Intel is having three tasks running simultaneously. Now the P's could be "computing" Chrome while the E's are compressing video and processing RAW photo files. A very inefficient use of compute that I run into all the time.

Is ADL superior to RKL? Yes, of course. But there is some user intervention necessary to extract that superiority. Intel kind of manipulated this test, it looks that way to me anyway, in order to put ADL in a better light than RKL.

The fix seems so simple I must be wrong in my thinking. Simply keep the P's assigned to the foreground task but ALSO allow them to work on background tasks while keeping the foreground task as the priority. So if you are editing a photo and need the P's for 5 seconds to process a filter on a photo they would immediately move to the foreground app, process the filter, and then move back to the background apps until the foreground app needed them again.
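A minimal sketch of that policy, as a tick-by-tick simulation in Python. This is only a model of the idea, not how Windows or the Thread Director actually schedules; the capacities, the demand pattern, and the function name are all hypothetical.

```python
def ticks_to_finish_background(fg_demand, bg_work, p_cap, e_cap, p_helps_background):
    """Count ticks until the background job is done.

    fg_demand is a list of per-tick foreground demand in work units;
    once the list runs out the foreground app is treated as idle.
    The foreground is always served first on the P cores. If
    p_helps_background is True, whatever P capacity is left over in a
    tick is spent on the background job; otherwise it is wasted.
    """
    if e_cap <= 0:
        raise ValueError("e_cap must be positive or the loop may never end")
    remaining = bg_work
    tick = 0
    while remaining > 0:
        demand = fg_demand[tick] if tick < len(fg_demand) else 0
        fg_used = min(demand, p_cap)  # foreground keeps priority
        spare_p = (p_cap - fg_used) if p_helps_background else 0
        remaining -= e_cap + spare_p
        tick += 1
    return tick


# Hypothetical bursty foreground: a filter runs on ticks 0 and 4, idle otherwise.
bursts = [10, 0, 0, 0, 10, 0, 0, 0]

static = ticks_to_finish_background(bursts, 60, p_cap=10, e_cap=5, p_helps_background=False)
shared = ticks_to_finish_background(bursts, 60, p_cap=10, e_cap=5, p_helps_background=True)
print(static, shared)  # prints "12 6"
```

In this toy run the foreground bursts get the full P capacity either way, so responsiveness is unchanged; the only difference is whether the idle P ticks are put to use.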

As I've written before I think this issue, even without the adjustment I mentioned above, is mitigated greatly with Raptor Lake and 16 E cores. That is basically a 3950X working on background tasks. I could live (happily) with that. But with the 12700K and only one Gracemont cluster this behavior is silly.

DrMrLordX · Jan 27, 2022

JoeRambo said:
The wide variety of tests ensures that everyone can look up what matters to them

Yes, but how many of the people simply pasting the geometric mean are actually doing that, or encouraging anyone else to do that?

uzzi38 said:
Phoronix have their test suite and that's it, whatever takeaways you make from that test suite should factor in the tests in that test suite. That's it.

Exactly, though the geometric mean posted at the end of the test suite is pretty useless unless your workload mimics the test suite almost exactly.

Heartbreaker · Jan 27, 2022

DrMrLordX said:
Exactly, though the geometric mean posted at the end of the test suite is pretty useless unless your workload mimics the test suite almost exactly.

Pretty much the same with all CPU review benchmark suites. The average is kind of useless to anyone, because no one matches their use case to a suite of benchmarks.

Which is why I only look at the things I actually do that load the CPU significantly. Which is Gaming and Video Encoding (very distant second, used to be higher when I recorded/encoded a lot of OTA TV).

The other home computing stuff (Web, Office Suite, Media consumption), doesn't even really tax a 10 year old 4 thread CPU.
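To illustrate the point about suite averages, here is a small Python sketch comparing a plain geometric mean with one weighted by how much a given user actually runs each workload. The speedup figures and weights are invented for the example, not taken from any review.

```python
import math


def geometric_mean(ratios):
    """Unweighted geometric mean of positive speedup ratios."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


def weighted_geometric_mean(ratios, weights):
    """Geometric mean where each ratio counts in proportion to its weight."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return math.exp(sum(w * math.log(r) for r, w in zip(ratios, weights)) / total)


# Hypothetical CPU A vs CPU B speedups per test (made-up numbers).
tests = ["gaming", "video_encode", "compile", "render"]
speedups = [1.05, 1.40, 1.30, 1.50]

# Hypothetical user who mostly games and occasionally encodes video.
my_weights = [0.9, 0.1, 0.0, 0.0]

print("suite geomean:   ", round(geometric_mean(speedups), 3))
print("my-usage geomean:", round(weighted_geometric_mean(speedups, my_weights), 3))
```

The two figures only agree when the weights match the suite's implicit equal weighting, which is the case almost nobody is in.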

TheELF · Jan 27, 2022

Hulk said:
These slides are interesting and as usual it's a rabbit hole when you start to analyze them because so much of the testing procedure is left out, specifically the length of the test scripts.

You have to realize that this is not being done for best performance or for best efficiency.
This is purely an Apple thing of providing the best user experience, it's about the noob streamer not losing FPS or dropping frames on the recording without having to do anything special, and it's about the noob content creator not having to wait any longer importing and exporting pics while converting their video.
Not feeling your system ever being bogged down is far more important than actual performance and if actual performance is still pretty high it's a win win.

Hulk said:
The fix seems so simple I must be wrong in my thinking. Simply keep the P's assigned to the foreground task but ALSO allow them to work on background tasks while keeping the foreground task as the priority. So if you are editing a photo and need the P's for 5 seconds to process a filter on a photo they would immediately move to the foreground app, process the filter, and then move back to the background apps until the foreground app needed them again.

I'm sure it doesn't work well all the time yet, but the theory is that TD will actively allow all cores to work on a task or boot a low importance thread from a P core to run something new that is more important.
So basically what you are saying and more, I just have the suspicion that all apps and threads have to be running on the same priority for this to work correctly but I might be wrong.

Here is the difference between the previous pics which were targeted to noobs and what TD actually is supposed to do.

Hulk · Jan 27, 2022

TheELF said:
You have to realize that this is not being done for best performance or for best efficiency.
This is purely an Apple thing of providing the best user experience, it's about the noob streamer not losing FPS or dropping frames on the recording without having to do anything special, and it's about the noob content creator not having to wait any longer importing and exporting pics while converting their video.
Not feeling your system ever being bogged down is far more important than actual performance and if actual performance is still pretty high it's a win win.

I'm sure it doesn't work well all the time yet, but the theory is that TD will actively allow all cores to work on a task or boot a low importance thread from a P core to run something new that is more important.
So basically what you are saying and more, I just have the suspicion that all apps and threads have to be running on the same priority for this to work correctly but I might be wrong.

Here is the difference between the previous pics which were targeted to noobs and what TD actually is supposed to do.

I remember watching that video when it came out and being excited to see it in action. Unfortunately in practice it's not working the way she explains it. P cores will "spin" and do nothing while you browse the web even if the E cores are compressing video in Handbrake in the background.

As far as I can tell NONE of that stuff they are talking about with the TD is happening. I'm going to have a little fun with this now. It's more like, "We at Intel have decided to put the foreground application on the P cores because we need the highest possible benchmark scores. We then cleverly shove all of the background tasks on the E cores. Kind of like sweeping dirt under the rug but in a modern way. See, we're Intel and we make old new again!"

The most ironic part of this is that the time and money spent on that video could probably have been used to actually fix the thread director instead of advertising it.

dullard · Jan 27, 2022

Hulk said:
I remember watching that video when it came out and being excited to see it in action. Unfortunately in practice it's not working the way she explains it. P cores will "spin" and do nothing while you browse the web even if the E cores are compressing video in Handbrake in the background.

As far as I can tell NONE of that stuff they are talking about with the TD is happening. I'm going to have a little fun with this now. It's more like, "We at Intel have decided to put the foreground application on the P cores because we need the highest possible benchmark scores. We then cleverly shove all of the background tasks on the E cores. Kind of like sweeping dirt under the rug but in a modern way. See, we're Intel and we make old new again!"

The most ironic part of this is that the time and money spent on that video could probably have been used to actually fix the thread director instead of advertising it.

No matter how much work Intel puts into it, it won't be enough. That is because Windows can just override the Thread Director at its whim. The right solution is a very long-term solution: recompile the software to properly place code on the right cores. By the time that happens, Intel would have gotten away from possibly the worst possible combination that you have (8 P and 4 E cores while trying to do something in both the background and the foreground).

Hulk · Jan 27, 2022

dullard said:
No matter how much work Intel puts into it, it won't be enough. That is because Windows can just override the Thread Director at its whim. The right solution is a very long-term solution: recompile the software to properly place code on the right cores. By the time that happens, Intel would have gotten away from possibly the worst possible combination that you have (8 P and 4 E cores while trying to do something in both the background and the foreground).

Your prediction is grim and unfortunately most likely correct.

TheELF · Jan 27, 2022

Hulk said:
As far as I can tell NONE of that stuff they are talking about with the TD is happening. I'm going to have a little fun with this now. It's more like, "We at Intel have decided to put the foreground application on the P cores because we need the highest possible benchmark scores. We then cleverly shove all of the background tasks on the E cores. Kind of like sweeping dirt under the rug but in a modern way. See, we're Intel and we make old new again!"

But that would mean that the TD is working, at least in this workflow of running a heavy workload together with background stuff.

Hulk · Jan 28, 2022

TheELF said:
But that would mean that the TD is working, at least in this workflow of running a heavy workload together with background stuff.

I'm not following? Can you clarify?

I think the TD is working within an application but not among multiple workloads. Meaning if one application is running the TD seems to get it right in terms of assigning P's and E's to get the most performance. They had to get this right or the launch would have been disastrous from a benchmarking point of view.

The current logic starts to fail when multiple applications are running. The current logic is P's on foreground, E's on background, when it should be that plus the condition that when the foreground is idle the P's move to the background to utilize all compute.

Saylick · Jan 28, 2022

https://chipsandcheese.com/2022/01/28/alder-lakes-power-efficiency-a-complicated-picture/

A bunch of cool plots. Really great analysis overall.

In summary:

Out of the box, the 12700K prioritizes absolute performance over power efficiency. “Race to sleep” is complete bullshit, at least until you get down to very low power levels.

Golden Cove is very efficient below 4 GHz, especially with a vectorized workload

Even though it’s paired with E-Cores, Golden Cove still scales well to very low power levels.

Gracemont is very efficient with integer workloads in the low 3 GHz range.

256-bit instructions give Gracemont a hard time. With libx264, it needs to go below 3 GHz before it really shines in terms of energy efficiency

When run at sane clocks, both Alder Lake architectures show significant efficiency gains compared to Skylake

In terms of energy efficiency at similar clocks, Zen 2 cores are excellent. Golden Cove has to drop below 2 GHz to finish the encode job with the same energy budget as desktop Zen 2. Gracemont can do better, but also has to clock below 2 GHz. Again, we see desktop Zen 2 cores failing to gain efficiency at lower clock speeds. Their energy efficiency peaks when boost is turned off, and going lower actually makes the cores pull more total power. Renoir is much better at scaling down to low power. At least in the near future, AMD can probably get by without maintaining separate E-Core and P-Core architectures. They’re already covering both bases by changing L3 size and optimizing the same architecture for different power and performance targets.
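For anyone who wants to sanity-check the "race to sleep" point, the arithmetic is just energy = average power × time, plus whatever the chip burns idling for the rest of the window. A small Python sketch, with operating points that are purely hypothetical (they are not figures from the article):

```python
def window_energy_joules(active_power_w, active_s, idle_power_w, window_s):
    """Energy used over a fixed time window.

    The job runs for active_s seconds at active_power_w, then the chip
    sits at idle_power_w for the remainder of the window.
    """
    if active_s > window_s:
        raise ValueError("job does not finish inside the window")
    return active_power_w * active_s + idle_power_w * (window_s - active_s)


# Hypothetical operating points for the same encode job (made-up numbers).
WINDOW_S = 180.0
IDLE_W = 10.0

fast = window_energy_joules(150.0, 100.0, IDLE_W, WINDOW_S)  # high clocks, then sleep
slow = window_energy_joules(60.0, 180.0, IDLE_W, WINDOW_S)   # low clocks, no sleep

print(f"race to sleep: {fast:.0f} J, slow and steady: {slow:.0f} J")
```

With numbers like these, finishing sooner doesn't pay for the extra power, because power rises faster than the runtime drops. Whether that holds at a given clock is exactly what the plots in the article are measuring.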

dullard · Jan 28, 2022

Saylick said:
https://chipsandcheese.com/2022/01/28/alder-lakes-power-efficiency-a-complicated-picture/

A bunch of cool plots. Really great analysis overall.

Chips And Cheese always goes into a lot of detail with tons of graphs. If you haven't browsed that website, the analyses are really well done. But the website itself is a nightmare to navigate or find anything. Thanks for this link.

DrMrLordX · Jan 30, 2022

guidryp said:
Pretty much the same with all CPU review benchmark suites.

Lots of them don't post an average.

sierpp · Jan 31, 2022

Saylick said:
https://chipsandcheese.com/2022/01/28/alder-lakes-power-efficiency-a-complicated-picture/

A bunch of cool plots. Really great analysis overall.

Something is fishy in this article. Looks like a sponsored one.
They compare a new and shiny Intel architecture to the almost 3-year-old AMD Zen 2. Time isn't standing still and AMD won't wait for Intel to catch up.
I bet that a comparison with Zen 3 doesn't look so good from an efficiency perspective. At least they didn't include the FX 9590 in the graphs ;-)

uzzi38 · Feb 1, 2022

sierpp said:
Something is fishy in this article. Looks like a sponsored one.
They compare a new and shiny Intel architecture to the almost 3-year-old AMD Zen 2. Time isn't standing still and AMD won't wait for Intel to catch up.
I bet that a comparison with Zen 3 doesn't look so good from an efficiency perspective. At least they didn't include the FX 9590 in the graphs ;-)

Huh? The comparisons to Zen 2 are already extremely favourable, idk what on earth you're talking about. Especially when Renoir gets added into the mix there, the power efficiency advantage is clear.

The only reason why Zen 3 wasn't tested is that it was easier for Clam to test on Zen 2. That's it.

igor_kavinski · Feb 1, 2022

uzzi38 said:
The only reason why Zen 3 wasn't tested is that it was easier for Clam to test on Zen 2. That's it.

Sorry for my ignorance but easier how? He didn't have Zen 3 on hand or is there some other reason?

uzzi38 · Feb 1, 2022

igor_kavinski said:
Sorry for my ignorance but easier how? He didn't have Zen 3 on hand or is there some other reason?

Yeah I don't think Clam does have one on hand. I know Cheese has a 5950X, but I don't think Clam does - I'm pretty sure the 3950X used for the review is Clam's desktop.

igor_kavinski · Feb 1, 2022

Redfire on Twitter: "High resolution Alder Lake-S Die Shot: I believe this was sent to reviewers and it was on embargo till today. I'm not 100% sure. Resolution: 6950x3514 Link: https://t.co/7SKmi7f94F https://t.co/FTp3qvVjgI" / Twitter

Hope someone will label the different areas in the high resolution die shot.

nicalandia · Feb 1, 2022

igor_kavinski said:
Redfire on Twitter: "High resolution Alder Lake-S Die Shot: I believe this was sent to reviewers and it was on embargo till today. I'm not 100% sure. Resolution: 6950x3514 Link: https://t.co/7SKmi7f94F https://t.co/FTp3qvVjgI" / Twitter

Hope someone will label the different areas in the high resolution die shot.

You must post commentary to all links and images.

esquared
Anandtech Forum Director

igor_kavinski · Feb 1, 2022

MSI partially reenables Alder Lake-S AVX-512 support for MEG Z690 Unify-X motherboard - VideoCardz.com

MSI reintroduced AVX-512 microcode support for its Z690 Unify-X motherboard MSI Z690 motherboard’s latest A22 BIOS comes with a microcode selector offering either its latest version or an older one with AVX-512 instruction support. Intel’s last-minute change to disable AVX-512 instruction has...

Rise. R.I.S.E! RISE! It's ALIVEEEE!!!! AVX-512 LIVES!

nicalandia · Feb 2, 2022

Alder Lake P Die Annotation (i7 H Laptop)

Alder Lake S Die Annotation (Desktop i7)

semiman · Feb 3, 2022

nicalandia said:
Alder Lake P Die Annotation (i7 H Laptop)

Alder Lake S Die Annotation (Desktop i9)

I'm quite curious how Raptor Lake's Gracemont will be placed. There's a large void area between the Ring Agent and GPU due to the Gracemont cluster's width. I think that area inefficiency should be fixed somehow to make 16 GM cores (4 clusters) efficient. Intel managed to put something there at least for ADL-P though.

Heartbreaker · Feb 3, 2022

HWUB i3-12100 review:

Geegeeoh · Feb 3, 2022

Nice but I can only find ~150€ B660 MoBos...

dullard · Feb 3, 2022

Geegeeoh said:
Nice but I can only find ~150€ B660 MoBos...

You didn't mention your country.
€92.87 https://www.mindfactory.de/product_...S2H-DDR4--H510-S1700-mATX-Intel-_1442165.html
€94.87 https://www.mindfactory.de/product_info.php/Biostar-H610MH--H610-S1700-mATX-Intel-_1442856.html
€101.29 https://www.mindfactory.de/product_...00-mATX-Intel-H610-2xDDR4-retail_1441766.html
Or Amazon has them too: €108.42 https://www.amazon.de/-/en/Gigabyte...byte+h610+m&qid=1643902204&s=computers&sr=1-6

If you require B660: €108.04 https://www.mindfactory.de/product_...HDV-DDR4-Intel-S1700-MATX-retail_1440339.html

Geegeeoh · Feb 3, 2022

But I did say B660, not H610M or B660M.

Italy btw.
