Discussion Intel current and future Lakes & Rapids thread


uzzi38

Platinum Member
Oct 16, 2019
2,628
5,934
146

Rocket Lake launches in March 2021. Major bummer.

On the bright side, memory overclocking should be a thing on B560. Finally.
 

mikk

Diamond Member
May 15, 2012
4,140
2,154
136

8+8 Alder Lake-S ES spotted. The Result ID says it has 32T, but SiSoft does seem to read it correctly as 24T in the thread-count field.

It seems that a small-core cluster gets 1.25 MB of L2 and the whole chip has 30 MB of L3.


Once again it's an ADL-S based leak, this time a fully enabled 8+8 part. It confirms the 24-thread and 30 MB L3 story from Notebookcheck, and Golden Cove seems to inherit its cache sizes from Tiger Lake: 3 MB of L3 per core and 1.25 MB of L2 per core. I have scrolled through the Processor Arithmetic scores, and I think this is a very good result if it's really running at only 1.4 GHz.

edit: if there is no L3 for Gracemont, that would work out to 3.75 MB of L3 per Golden Cove core (30 MB / 8). We don't know whether Gracemont gets a separate L3.
 
Last edited:

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
Alder Lake is obviously a mobile-first design.

Yeah, designed to beat AMD's 8C mobile chips in Cinebench runs that are, wait for it, irrelevant for mobile designs; 8 fat cores are plenty there, if not 6 or 4.

I think what is going on is that Intel will take the 8+8C design and, in say a 25W budget, run threaded loads at amazing speed on the big cores for some time, until the TDP is "exhausted" as configured by the vendor with tau timers etc.
Once the TDP is constrained and the load is still high, the chip will somehow start indicating to the OS that it is "constrained", forcing it to offload work to the small cores, basically trying to get the most performance from the given TDP budget.
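Purely to illustrate the mechanism being described (a toy model, not Intel's actual algorithm, and all numbers made up): a C++ sketch of a PL1/PL2/tau-style budget, where firmware tracks a moving average of package power and allows bursts above the sustained limit until the average catches up.

Code:
#include <cstdio>

int main() {
    const double PL1 = 25.0;  // sustained power limit in W (the "TDP")
    const double PL2 = 50.0;  // short-term turbo limit in W
    const double tau = 28.0;  // averaging time constant in s (vendor tunable)
    const double dt  = 1.0;   // simulation step in s

    double avg = 0.0;         // moving average of package power
    for (int t = 0; t < 60; ++t) {
        // While the average sits under PL1 the chip may burst to PL2;
        // once the budget is "exhausted" it clamps back to PL1.
        const double power = (avg < PL1) ? PL2 : PL1;
        avg += (dt / tau) * (power - avg);
        printf("t=%2ds power=%.0fW avg=%.1fW\n", t, power, avg);
    }
    return 0;
}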

It could work, but it would create an amazingly inconsistent experience even before we add GPU power budgets. All that to beat AMD in benchmarks?

edit: if there is no L3 for Gracemont, that would work out to 3.75 MB of L3 per Golden Cove core (30 MB / 8). We don't know whether Gracemont gets a separate L3.

I think it is completely irrelevant to performance in the post-inclusive-L3 world. It does not matter whether there are 8 L3 slices alongside the big cores at 3.75 MB each, or 10 slices of 3 MB each attached to the 8 big cores and 2 small clusters.
Same total cache, same number of stops on the ring in either variant. It's not like the small cluster will have a "private" L3 if Intel knows what it is doing.
 
Last edited:

dullard

Elite Member
May 21, 2001
25,063
3,411
126
Anyone else feel that Alder Lake is completely the wrong product, one that bets performance on the Windows process scheduler with barely anything to show for it in return? It is like the AMD Zen CCX hell, except the scheduler needs a time machine to know the characteristics of the load in advance. And if a thread gets scheduled on the small cores when it in fact needs performance, you pay an insane cache-miss price for migrating it to the perf cores.
The scheduler does not really need a time machine. Every thread that is created can be given a priority by the programmer. If it is something that needs to be on a fast core, you give it a higher priority. If it can run just fine on a slow core, you give it a low priority. This will likely be extended with explicit options for the programmer to state which type of core to run on as well.


FIELDS
AboveNormal (3): The thread can be scheduled after threads with Highest priority and before those with Normal priority.
BelowNormal (1): The thread can be scheduled after threads with Normal priority and before those with Lowest priority.
Highest (4): The thread can be scheduled before threads with any other priority.
Lowest (0): The thread can be scheduled after threads with any other priority.
Normal (2): The thread can be scheduled after threads with AboveNormal priority and before those with BelowNormal priority. Threads have Normal priority by default.
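The table above is the .NET ThreadPriority enum; the same hint exists at the OS level. Below is a minimal Win32 sketch of a programmer tagging a throughput thread and a latency-sensitive thread differently (the two worker functions are hypothetical; SetThreadPriority and the constants are the real Win32 API):

Code:
#include <windows.h>
#include <thread>

// Hypothetical long-running, non-interactive worker.
void BackgroundWork() {
    // Hint to the scheduler: this thread tolerates slower service.
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_LOWEST);
    // ... crunch numbers, scan files, etc. ...
}

// Hypothetical latency-sensitive worker (e.g. feeding the GUI).
void UiWork() {
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_ABOVE_NORMAL);
    // ... handle user-facing events ...
}

int main() {
    std::thread bg(BackgroundWork);
    std::thread ui(UiWork);
    bg.join();
    ui.join();
}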
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
The scheduler does not really need a time machine. Every thread that is created can be given a priority by the programmer. If it is something that needs to be on a fast core, you give it a higher priority. If it can run just fine on a slow core, you give it a low priority. This will likely be extended with explicit options for the programmer to state which type of core to run on as well.

What about existing programs, most of which create threads without a specific priority (normal)? They will behave badly in a world where priority needs to be specified.

And on "not needing time machine" ? Scheduling even without heterogenous core setup is recognized as being very hard problem. Just giving minimal thought to workload being characterized at any moment wrongly for core type it is being run at, requires better characterization and that characterization requires input from the future, when behavior of workload is already known.
 
  • Like
Reactions: spursindonesia

jur

Junior Member
Nov 23, 2016
17
4
81
Once the TDP is constrained and the load is still high, the chip will somehow start indicating to the OS that it is "constrained", forcing it to offload work to the small cores, basically trying to get the most performance from the given TDP budget.

Isn't this the goal of such a chip? To get the most performance from the given TDP? Laptops are not meant to run at 100% load for long periods anyway, unless you like running your laptop at 90 degrees. The purpose of the small cores is to increase MT performance while keeping a reasonable TDP.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
Isn't this the goal of such a chip? To get the most performance from the given TDP? Laptops are not meant to run at 100% load for long periods anyway, unless you like running your laptop at 90 degrees. The purpose of the small cores is to increase MT performance while keeping a reasonable TDP.

No doubt it is; the initial post was about the ADL-S chip for desktop. On the desktop such chips make far less sense.
 

dullard

Elite Member
May 21, 2001
25,063
3,411
126
What about existing programs, most of which create threads without a specific priority (normal)? They will behave badly in a world where priority needs to be specified.

And on "not needing a time machine": scheduling, even without a heterogeneous core setup, is recognized as a very hard problem. Just give minimal thought to a workload being wrongly characterized, at any given moment, for the core type it is running on: doing better requires characterization, and that characterization requires input from the future, when the behavior of the workload is already known.
Priority of threads has been around for nearly two decades. I'm not particularly worried about the performance of software older than that (say, a program from the 1990s) running on new processors (purchased in the 2020s). There is also the setting for background vs. foreground. Even if a thread was given normal priority, it might still be assigned to be a background thread. A good programmer usually puts the GUI threads in the foreground for high responsiveness and most of the other threads in the background. The exception is for tasks that are known to need many resources (such as a program intended to do heavy calculations).
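As a concrete illustration of that background hint (which is separate from priority), Win32 exposes a background processing mode that lowers a thread's CPU, I/O, and memory priority; the scanning function here is hypothetical:

Code:
#include <windows.h>

// Hypothetical background task, e.g. a virus-scanning pass.
void ScanPass() {
    // Enter background mode: the system lowers this thread's CPU,
    // I/O and memory priority so user-facing work stays responsive.
    // (Allowed only on a handle to the current thread.)
    SetThreadPriority(GetCurrentThread(), THREAD_MODE_BACKGROUND_BEGIN);
    // ... scan files ...
    SetThreadPriority(GetCurrentThread(), THREAD_MODE_BACKGROUND_END);
}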

You are way overblowing the issue. Many users want low power and high responsiveness, and big/little is perfect for that. High-priority and foreground threads, which are usually user-facing, get the big cores; the rest get the little cores. Near-instant responsiveness with as little power as needed. The only real problem is when the user is running two or more intensive programs at the same time, for example number crunching while mining Bitcoin. Yes, it would be difficult for the operating system to know which is more important, but that type of use case just isn't a key concern for a large swath of users.

My personal concern is more that I doubt 2+2, 4+4, or 8+8 is the right combination of big/little cores. Once this gets established, I think we'll see demand for unequal counts. I could certainly see great use cases for 2 big and, say, 16 little cores, or for 16 big cores and 2 little cores. 8+8 just seems like the easy-to-make route, not the ideal-for-users route.
 

ondma

Platinum Member
Mar 18, 2018
2,721
1,281
136
Priority of threads has been around for nearly two decades. I'm not particularly worried about the performance of software older than that (say, a program from the 1990s) running on new processors (purchased in the 2020s). There is also the setting for background vs. foreground. Even if a thread was given normal priority, it might still be assigned to be a background thread. A good programmer usually puts the GUI threads in the foreground for high responsiveness and most of the other threads in the background. The exception is for tasks that are known to need many resources (such as a program intended to do heavy calculations).

You are way overblowing the issue. Many users want low power and high responsiveness, and big/little is perfect for that. High-priority and foreground threads, which are usually user-facing, get the big cores; the rest get the little cores. Near-instant responsiveness with as little power as needed. The only real problem is when the user is running two or more intensive programs at the same time, for example number crunching while mining Bitcoin. Yes, it would be difficult for the operating system to know which is more important, but that type of use case just isn't a key concern for a large swath of users.

My personal concern is more that I doubt 2+2, 4+4, or 8+8 is the right combination of big/little cores. Once this gets established, I think we'll see demand for unequal counts. I could certainly see great use cases for 2 big and, say, 16 little cores, or for 16 big cores and 2 little cores. 8+8 just seems like the easy-to-make route, not the ideal-for-users route.

If done properly, it could be great for laptops and other power-constrained scenarios. On the desktop, I just don't see the point. Why not just make 16 (or 12, or whatever) big cores and let the CPU limit the TDP, instead of 8+8? Also, there is usually some room for overclocking on the desktop (TDP be damned, basically), and I would think more big cores would be better for that.
 

mikk

Diamond Member
May 15, 2012
4,140
2,154
136
Comparing the Alder Lake ES to an i9-9900K (half the thread count but nearly 3x the frequency) looks weird, sometimes showing as much as a ~40-50% increase but a ~20% regression elsewhere; these might be immature results due to it being an early ES.

The i9-9900K isn't half the thread count; it has 16 threads versus 24 threads for the ADL-S 8+8. However, ADL-S has 16 real cores versus 8 real cores. The Processor Arithmetic scores look promising if it really ran at only 1.4 GHz. There is no AVX-512 involved either, because it obviously ran in hybrid mode.

Compared to a 12C/24T 3900X at 4.3 GHz: https://ranker.sisoftware.co.uk/sho...d4ecd8e1d0e7d7f183be8ea8cda895a583f0cdf5&l=en

1.4 GHz 24T vs 4.3 GHz 24T
Dhrystone Int: 276.18 GIPS vs 575.94 GIPS
Whetstone Single-float: 211.62 GFLOPS vs 342.39 GFLOPS
Whetstone Double-float: 158.04 GFLOPS vs 294.49 GFLOPS
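For a rough per-clock reading of those numbers, here is the back-of-the-envelope division as a small C++ snippet (assuming, perhaps naively, that both runs kept all 24 threads busy and that the scores scale linearly with clock):

Code:
#include <cstdio>

int main() {
    // Dhrystone Int results quoted above.
    const double adl_gips  = 276.18, adl_ghz  = 1.4;  // ADL-S ES, 24T
    const double zen2_gips = 575.94, zen2_ghz = 4.3;  // Ryzen 9 3900X, 24T

    printf("ADL-S ES: %.1f GIPS per GHz\n", adl_gips / adl_ghz);   // ~197.3
    printf("3900X   : %.1f GIPS per GHz\n", zen2_gips / zen2_ghz); // ~133.9
    // Caveat raised later in the thread: half of ADL's threads run on
    // small cores that won't reach big-core clocks, so scaling these
    // numbers linearly to 4+ GHz would overstate the final product.
    return 0;
}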
 

jpiniero

Lifer
Oct 1, 2010
14,591
5,214
136
If done properly, it could be great for laptops and other power-constrained scenarios. On the desktop, I just don't see the point. Why not just make 16 (or 12, or whatever) big cores and let the CPU limit the TDP, instead of 8+8? Also, there is usually some room for overclocking on the desktop (TDP be damned, basically), and I would think more big cores would be better for that.

If it's still using the ring, it can't be more than 10 cores+clusters. The mesh is inevitable but may not be happening on Alder Lake.

If it's using chiplets, it probably can't have more than one of each, and 8+8 is probably about as big as they want to go.
 

ondma

Platinum Member
Mar 18, 2018
2,721
1,281
136
The i9-9900K isn't half the thread count; it has 16 threads versus 24 threads for the ADL-S 8+8. However, ADL-S has 16 real cores versus 8 real cores. The Processor Arithmetic scores look promising if it really ran at only 1.4 GHz. There is no AVX-512 involved either, because it obviously ran in hybrid mode.

Compared to a 12C/24T 3900X at 4.3 GHz: https://ranker.sisoftware.co.uk/sho...d4ecd8e1d0e7d7f183be8ea8cda895a583f0cdf5&l=en

1.4 GHz 24T vs 4.3 GHz 24T
Dhrystone Int: 276.18 GIPS vs 575.94 GIPS
Whetstone Single-float: 211.62 GFLOPS vs 342.39 GFLOPS
Whetstone Double-float: 158.04 GFLOPS vs 294.49 GFLOPS
To compare real "core" or "thread" performance, though, one should run at the same clock speed. It is somewhat problematic to extrapolate from 1.4 GHz to 4.3 GHz, and even more so in this case, since half the "cores" in ADL will clearly not be able to reach 4.3 GHz.
 

mikk

Diamond Member
May 15, 2012
4,140
2,154
136
Acer Swift 3: https://www.ultrabookreview.com/41607-acer-swift-3-sf313-53-review/

Cooling is better than on the ZenBook S UX393EA; the limiting factor relative to the Zenbook is this:

The 17W TDP allocation is the limiting factor here, and better performance could be obtained if Acer would allow this i7 to run at higher power on the retail models, which this implementation could definitely handle.


The Swift 5 runs at a higher TDP. Given the low TDP, the gaming results are quite good in comparison:

We’re looking at 2x or higher improvement over the i7-1065G7 Swift 3 configuration tested earlier this year, even on this 17W implementation which does not allow the Iris Xe graphics to run at its full potential. In fact, it’s fairly far from it, with the GPU averaging frequencies of only 0.9 to 1 GHz in our gaming tests, down from the 1.35 GHz peak performance that the platform is theoretically capable of.
 
  • Like
Reactions: Tlh97 and clemsyn

mikk

Diamond Member
May 15, 2012
4,140
2,154
136
since half the "cores" in ADL will clearly not be able to reach 4.3 GHz.

And why? Tremont runs on the redacted Ice Lake 10+ process and can go up to 3.0 GHz in a 10W low-power product, and that product isn't made for the highest possible performance. Gracemont is the first big desktop implementation and should be manufactured on 10ESF.


Profanity is not allowed in the tech forums.

AT Mod Usandthem
 
Last edited by a moderator:

dullard

Elite Member
May 21, 2001
25,063
3,411
126
If done properly, it could be great for laptops and other power constrained scenarios. On the desktop, I just dont see the point. Why not just make 16 (or 12 or whatever) big cores, and let the cpu limit the TDP instead of 8+8. Also, there is usually some room for overclocking on the desktop, (TDP be damned, basically), and I would think more big cores would be better for this.
These chips are currently aimed at laptops, so your point is already the route being pursued. But this is just the very first step.

For desktops, if I understand correctly, the goal is eventually to have a mix-and-match concept. Different user subsets might need lower cost, more graphics, more raw calculation capability, more cores, more responsiveness, AI or no AI, physics or no physics, etc. With a given amount of resources, this would logically need different numbers of little chips, big chips, and GPU execution units.
  • You might want as many big cores as possible, no GPU execution units, and only a couple little cores to handle background tasks like virus scanning, email, voice command listening, etc. This might get you a great number cruncher.
  • I might want a low power chip with few big cores, many little cores, and more GPU execution units. This might give me a great HTPC.
  • Someone else might want a CPU that is as low power as possible while running legacy x86 code (pretty much all of manufacturing in the world). That would be a lot of little cores.
  • Someone else might want a great gaming CPU which would be a roughly equal number of big/little cores and as many GPU execution units as possible.
Basically, the long-term desktop concept is to plop in as many big, little, GPU, AI, etc. cores as your budget can support. Whether that is a power budget, a dollar budget, a space budget, or something in between would be up to your particular needs. The end goal is no more "one-size-fits-all" CPUs, where the only real choice you have is how much power vs. how many cores.
 
  • Like
Reactions: Zucker2k

ondma

Platinum Member
Mar 18, 2018
2,721
1,281
136
These chips are currently aimed at laptops, so your point is already the route being pursued. But this is just the very first step.

For desktops, if I understand correctly, the goal is eventually to have a mix-and-match concept. Different user subsets might need lower cost, more graphics, more raw calculation capability, more cores, more responsiveness, AI or no AI, physics or no physics, etc. With a given amount of resources, this would logically need different numbers of little chips, big chips, and GPU execution units.
  • You might want as many big cores as possible, no GPU execution units, and only a couple little cores to handle background tasks like virus scanning, email, voice command listening, etc. This might get you a great number cruncher.
  • I might want a low power chip with few big cores, many little cores, and more GPU execution units. This might give me a great HTPC.
  • Someone else might want a CPU that is as low power as possible while running legacy x86 code (pretty much all of manufacturing in the world). That would be a lot of little cores.
  • Someone else might want a great gaming CPU which would be a roughly equal number of big/little cores and as many GPU execution units as possible.
Basically, the long-term desktop concept is to plop in as many big, little, GPU, AI, etc. cores as your budget can support. Whether that is a power budget, a dollar budget, a space budget, or something in between would be up to your particular needs. The end goal is no more "one-size-fits-all" CPUs, where the only real choice you have is how much power vs. how many cores.
That sounds like an interesting, if confusing, lineup, which could offer unique performance in certain scenarios. The problem is, they do not have a hybrid chip with more than 8 big cores, and on the desktop they have to compete with 12 or even 16 big cores from AMD. And by the time ADL comes out, they will be competing with an even stronger Zen 3 or Zen 4 core.
 

dullard

Elite Member
May 21, 2001
25,063
3,411
126
That sounds like an interesting, if confusing, lineup, which could offer unique performance in certain scenarios. The problem is, they do not have a hybrid chip with more than 8 big cores, and on the desktop they have to compete with 12 or even 16 big cores from AMD. And by the time ADL comes out, they will be competing with an even stronger Zen 3 or Zen 4 core.
It sounds like your timeline is on the order of a year. Intel's timeline is on the order of a decade.
 
  • Love
Reactions: spursindonesia

coercitiv

Diamond Member
Jan 24, 2014
6,199
11,895
136
ADL-S doesn't have a core-count problem; just watch as the media falls in love with the new 8-core Ryzen this month.

Intel's problem is their server/HEDT line, which is more or less MIA. Fix that instead, and the consumer mainstream platform can stay at 8 cores until Intel finds something better than the ring interconnect.
 

uzzi38

Platinum Member
Oct 16, 2019
2,628
5,934
146
Hey guys, maybe an odd question, but when did Intel start all the Tiger Lake teasing BS, several months ago now? I forget the actual date they first started doing that stuff.
 

jpiniero

Lifer
Oct 1, 2010
14,591
5,214
136
Intel's problem is their server/HEDT line, which is more or less MIA. Fix that instead, and the consumer mainstream platform can stay at 8 cores until Intel finds something better than the ring interconnect.

Because of bad 10 nm yields, Tiger Lake is probably more profitable. Chiplets are going to be a necessity to salvage 10 nm, and until Intel delivers a server design that uses them, it's going to be more delays and paper launches.