Discussion Intel current and future Lakes & Rapids thread


uzzi38

Platinum Member
Oct 16, 2019
2,628
5,934
146

Rocket Lake launches in March 2021. Major bummer.

On the bright side, memory overclocking should be a thing on B560. Finally.
 

mikk

Diamond Member
May 15, 2012
4,140
2,154
136

8+8 Alder Lake-S ES spotted. The Result ID says it has 32T, but SiSoft does seem to read it correctly as 24T in the thread-count field.

It seems that a small-core cluster gets 1.25 MB of L2 and the whole chip has 30 MB of L3.


Once again it's an ADL-S based leak, this time a fully enabled 8+8 part. It confirms the 24-thread and 30 MB L3 story from Notebookcheck, and Golden Cove seems to inherit its cache sizes from Tiger Lake: 3 MB of L3 per core and 1.25 MB of L2 per core. I have scrolled through the Processor Arithmetic scores, and I think this is a very good result if it's really running at only 1.4 GHz.

edit: if there is no L3 for Gracemont, that would work out to 3.75 MB of L3 per Golden Cove core (30 MB / 8). We don't know whether Gracemont gets a separate L3.
 
Last edited:

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
Alder Lake is obviously a mobile-first design.

Yeah, designed to beat AMD's 8C mobile chips in Cinebench runs that are, wait for it, irrelevant for mobile designs; 8 fat cores are plenty there, if not 6 or 4.

I think what is going on is that Intel will take the 8+8C design and, in say a 25W budget, run threaded loads at amazing speed on the big cores for some time, until the TDP is "exhausted" as configured by the vendor with tau timers etc.
Once the TDP is constrained and the load is still high, the chip will somehow start indicating to the OS that it is "constrained", forcing it to offload work to the small cores, basically trying to get the most performance from the given TDP budget.
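Purely to illustrate the mechanism being described (a toy model, not Intel's actual algorithm, and all numbers made up): a C++ sketch of a PL1/PL2/tau-style budget, where firmware tracks a moving average of package power and allows bursts above the sustained limit until the average catches up.

Code:
#include <cstdio>

int main() {
    const double PL1 = 25.0;  // sustained power limit in W (the "TDP")
    const double PL2 = 50.0;  // short-term turbo limit in W
    const double tau = 28.0;  // averaging time constant in s (vendor tunable)
    const double dt  = 1.0;   // simulation step in s

    double avg = 0.0;         // moving average of package power
    for (int t = 0; t < 60; ++t) {
        // While the average sits under PL1 the chip may burst to PL2;
        // once the budget is "exhausted" it clamps back to PL1.
        const double power = (avg < PL1) ? PL2 : PL1;
        avg += (dt / tau) * (power - avg);
        printf("t=%2ds power=%.0fW avg=%.1fW\n", t, power, avg);
    }
    return 0;
}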

It could work, but it would create an amazingly inconsistent experience even before we add GPU power budgets. All that to beat AMD in benchmarks?

edit: if there is no L3 for Gracemont, that would work out to 3.75 MB of L3 per Golden Cove core (30 MB / 8). We don't know whether Gracemont gets a separate L3.

I think it is completely irrelevant to performance in the post-inclusive-L3 world. It does not matter whether there are 8 L3 slices alongside the big cores at 3.75 MB each, or 10 slices of 3 MB each attached to the 8 big cores and 2 small clusters.
Same total cache, same number of stops on the ring in either variant. It's not like the small cluster will have a "private" L3 if Intel knows what it is doing.
 
Last edited:

dullard

Elite Member
May 21, 2001
25,063
3,411
126
Anyone else feel that Alder Lake is completely the wrong product, one that bets performance on the Windows process scheduler with barely anything to show for it in return? It is like the AMD Zen CCX hell, except the scheduler needs a time machine to know the characteristics of the load in advance. And if a thread gets scheduled on the small cores when it in fact needs performance, you pay an insane cache-miss price for migrating it to the perf cores.
The scheduler does not really need a time machine. Every thread that is created can be given a priority by the programmer. If it is something that needs to be on a fast core, you give it a higher priority. If it can run just fine on a slow core, you give it a low priority. This will likely be extended with explicit options for the programmer to state which type of core to run on as well.


FIELDS
AboveNormal (3): The thread can be scheduled after threads with Highest priority and before those with Normal priority.
BelowNormal (1): The thread can be scheduled after threads with Normal priority and before those with Lowest priority.
Highest (4): The thread can be scheduled before threads with any other priority.
Lowest (0): The thread can be scheduled after threads with any other priority.
Normal (2): The thread can be scheduled after threads with AboveNormal priority and before those with BelowNormal priority. Threads have Normal priority by default.
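The table above is the .NET ThreadPriority enum; the same hint exists at the OS level. Below is a minimal Win32 sketch of a programmer tagging a throughput thread and a latency-sensitive thread differently (the two worker functions are hypothetical; SetThreadPriority and the constants are the real Win32 API):

Code:
#include <windows.h>
#include <thread>

// Hypothetical long-running, non-interactive worker.
void BackgroundWork() {
    // Hint to the scheduler: this thread tolerates slower service.
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_LOWEST);
    // ... crunch numbers, scan files, etc. ...
}

// Hypothetical latency-sensitive worker (e.g. feeding the GUI).
void UiWork() {
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_ABOVE_NORMAL);
    // ... handle user-facing events ...
}

int main() {
    std::thread bg(BackgroundWork);
    std::thread ui(UiWork);
    bg.join();
    ui.join();
}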
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
The scheduler does not really need a time machine. Every thread that is created can be given a priority by the programmer. If it is something that needs to be on a fast core, you give it a higher priority. If it can run just fine on a slow core, you give it a low priority. This will likely be extended with explicit options for the programmer to state which type of core to run on as well.

What about existing programs, most of which create threads without a specific priority (normal)? They will behave badly in a world where priority needs to be specified.

And on "not needing time machine" ? Scheduling even without heterogenous core setup is recognized as being very hard problem. Just giving minimal thought to workload being characterized at any moment wrongly for core type it is being run at, requires better characterization and that characterization requires input from the future, when behavior of workload is already known.
 
  • Like
Reactions: spursindonesia

jur

Junior Member
Nov 23, 2016
17
4
81
Once the TDP is constrained and the load is still high, the chip will somehow start indicating to the OS that it is "constrained", forcing it to offload work to the small cores, basically trying to get the most performance from the given TDP budget.

Isn't this the goal of such a chip? To get the most performance from the given TDP? Laptops are not meant to run at 100% load for long periods anyway, unless you like running your laptop at 90 degrees. The purpose of the small cores is to increase MT performance while keeping a reasonable TDP.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
Isn't this the goal of such a chip? To get the most performance from the given TDP? Laptops are not meant to run at 100% load for long periods anyway, unless you like running your laptop at 90 degrees. The purpose of the small cores is to increase MT performance while keeping a reasonable TDP.

No doubt it is; the initial post was about the ADL-S chip for desktop. On the desktop such chips make far less sense.
 

dullard

Elite Member
May 21, 2001
25,063
3,411
126
What about existing programs, most of which create threads without a specific priority (normal)? They will behave badly in a world where priority needs to be specified.

And on "not needing a time machine": scheduling, even without a heterogeneous core setup, is recognized as a very hard problem. Just give minimal thought to a workload being wrongly characterized, at any given moment, for the core type it is running on: doing better requires characterization, and that characterization requires input from the future, when the behavior of the workload is already known.
Priority of threads has been around for nearly two decades. I'm not particularly worried about the performance of software older than that (say, a program from the 1990s) running on new processors (purchased in the 2020s). There is also the setting for background vs. foreground. Even if a thread was given normal priority, it might still be assigned to be a background thread. A good programmer usually puts the GUI threads in the foreground for high responsiveness and most of the other threads in the background. The exception is for tasks that are known to need many resources (such as a program intended to do heavy calculations).
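As a concrete illustration of that background hint (which is separate from priority), Win32 exposes a background processing mode that lowers a thread's CPU, I/O, and memory priority; the scanning function here is hypothetical:

Code:
#include <windows.h>

// Hypothetical background task, e.g. a virus-scanning pass.
void ScanPass() {
    // Enter background mode: the system lowers this thread's CPU,
    // I/O and memory priority so user-facing work stays responsive.
    // (Allowed only on a handle to the current thread.)
    SetThreadPriority(GetCurrentThread(), THREAD_MODE_BACKGROUND_BEGIN);
    // ... scan files ...
    SetThreadPriority(GetCurrentThread(), THREAD_MODE_BACKGROUND_END);
}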

You are way overblowing the issue. Many users want low power and high responsiveness, and big/little is perfect for that. High-priority and foreground threads, which are usually user-facing, get the big cores; the rest get the little cores. Near-instant responsiveness with as little power as needed. The only real problem is when the user is running two or more intensive programs at the same time, for example number crunching while mining Bitcoin. Yes, it would be difficult for the operating system to know which is more important, but that type of use case just isn't a key concern for a large swath of users.

My personal concern is more that I doubt 2+2, 4+4, or 8+8 is the right combination of big/little cores. Once this gets established, I think we'll see demand for unequal counts. I could certainly see great use cases for 2 big and, say, 16 little cores, or for 16 big cores and 2 little cores. 8+8 just seems like the easy-to-make route, not the ideal-for-users route.
 

ondma

Platinum Member
Mar 18, 2018
2,721
1,281
136
Priority of threads has been around for nearly two decades. I'm not particularly worried about the performance of software older than that (say, a program from the 1990s) running on new processors (purchased in the 2020s). There is also the setting for background vs. foreground. Even if a thread was given normal priority, it might still be assigned to be a background thread. A good programmer usually puts the GUI threads in the foreground for high responsiveness and most of the other threads in the background. The exception is for tasks that are known to need many resources (such as a program intended to do heavy calculations).

You are way overblowing the issue. Many users want low power and high responsiveness, and big/little is perfect for that. High-priority and foreground threads, which are usually user-facing, get the big cores; the rest get the little cores. Near-instant responsiveness with as little power as needed. The only real problem is when the user is running two or more intensive programs at the same time, for example number crunching while mining Bitcoin. Yes, it would be difficult for the operating system to know which is more important, but that type of use case just isn't a key concern for a large swath of users.

My personal concern is more that I doubt 2+2, 4+4, or 8+8 is the right combination of big/little cores. Once this gets established, I think we'll see demand for unequal counts. I could certainly see great use cases for 2 big and, say, 16 little cores, or for 16 big cores and 2 little cores. 8+8 just seems like the easy-to-make route, not the ideal-for-users route.

If done properly, it could be great for laptops and other power-constrained scenarios. On the desktop, I just don't see the point. Why not just make 16 (or 12, or whatever) big cores and let the CPU limit the TDP, instead of 8+8? Also, there is usually some room for overclocking on the desktop (TDP be damned, basically), and I would think more big cores would be better for that.
 

mikk

Diamond Member
May 15, 2012
4,140
2,154
136
Comparing the Alder Lake ES to an i9-9900K (half the thread count but nearly 3x the frequency) looks weird, sometimes showing as much as a ~40-50% increase but a ~20% regression elsewhere; these might be immature results due to it being an early ES.

The i9-9900K isn't half the thread count; it has 16 threads versus 24 threads for the ADL-S 8+8. However, ADL-S has 16 real cores versus 8 real cores. The Processor Arithmetic scores look promising if it really ran at only 1.4 GHz. There is no AVX-512 involved either, because it obviously ran in hybrid mode.

Compared to a 12C/24T 3900X at 4.3 GHz: https://ranker.sisoftware.co.uk/sho...d4ecd8e1d0e7d7f183be8ea8cda895a583f0cdf5&l=en

1.4 GHz 24T vs 4.3 GHz 24T
Dhrystone Int: 276.18 GIPS vs 575.94 GIPS
Whetstone Single-float: 211.62 GFLOPS vs 342.39 GFLOPS
Whetstone Double-float: 158.04 GFLOPS vs 294.49 GFLOPS
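For a rough per-clock reading of those numbers, here is the back-of-the-envelope division as a small C++ snippet (assuming, perhaps naively, that both runs kept all 24 threads busy and that the scores scale linearly with clock):

Code:
#include <cstdio>

int main() {
    // Dhrystone Int results quoted above.
    const double adl_gips  = 276.18, adl_ghz  = 1.4;  // ADL-S ES, 24T
    const double zen2_gips = 575.94, zen2_ghz = 4.3;  // Ryzen 9 3900X, 24T

    printf("ADL-S ES: %.1f GIPS per GHz\n", adl_gips / adl_ghz);   // ~197.3
    printf("3900X   : %.1f GIPS per GHz\n", zen2_gips / zen2_ghz); // ~133.9
    // Caveat raised later in the thread: half of ADL's threads run on
    // small cores that won't reach big-core clocks, so scaling these
    // numbers linearly to 4+ GHz would overstate the final product.
    return 0;
}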
 

jpiniero

Lifer
Oct 1, 2010
14,591
5,214
136
If done properly, it could be great for laptops and other power-constrained scenarios. On the desktop, I just don't see the point. Why not just make 16 (or 12, or whatever) big cores and let the CPU limit the TDP, instead of 8+8? Also, there is usually some room for overclocking on the desktop (TDP be damned, basically), and I would think more big cores would be better for that.

If it's still using the ring, it can't be more than 10 cores+clusters. The mesh is inevitable but may not be happening on Alder Lake.

If it's using chiplets, it probably can't have more than one of each, and 8+8 is probably about as big as they want to go.
 

ondma

Platinum Member
Mar 18, 2018
2,721
1,281
136
The i9-9900K isn't half the thread count; it has 16 threads versus 24 threads for the ADL-S 8+8. However, ADL-S has 16 real cores versus 8 real cores. The Processor Arithmetic scores look promising if it really ran at only 1.4 GHz. There is no AVX-512 involved either, because it obviously ran in hybrid mode.

Compared to a 12C/24T 3900X at 4.3 GHz: https://ranker.sisoftware.co.uk/sho...d4ecd8e1d0e7d7f183be8ea8cda895a583f0cdf5&l=en

1.4 GHz 24T vs 4.3 GHz 24T
Dhrystone Int: 276.18 GIPS vs 575.94 GIPS
Whetstone Single-float: 211.62 GFLOPS vs 342.39 GFLOPS
Whetstone Double-float: 158.04 GFLOPS vs 294.49 GFLOPS
To compare real "core" or "thread" performance, though, one should run at the same clock speed. It is somewhat problematic to extrapolate from 1.4 GHz to 4.3 GHz, and even more so in this case, since half the "cores" in ADL will clearly not be able to reach 4.3 GHz.
 

mikk

Diamond Member
May 15, 2012
4,140
2,154
136
Acer Swift 3: https://www.ultrabookreview.com/41607-acer-swift-3-sf313-53-review/

Cooling is better than on the ZenBook S UX393EA; the limiting factor relative to the Zenbook is this:

The 17W TDP allocation is the limiting factor here, and better performance could be obtained if Acer would allow this i7 to run at higher power on the retail models, which this implementation could definitely handle.


The Swift 5 runs at a higher TDP. Given the low TDP, the gaming results are quite good in comparison:

We’re looking at 2x or higher improvement over the i7-1065G7 Swift 3 configuration tested earlier this year, even on this 17W implementation which does not allow the Iris Xe graphics to run at its full potential. In fact, it’s fairly far from it, with the GPU averaging frequencies of only 0.9 to 1 GHz in our gaming tests, down from the 1.35 GHz peak performance that the platform is theoretically capable of.
 
  • Like
Reactions: Tlh97 and clemsyn

mikk

Diamond Member
May 15, 2012
4,140
2,154
136
since half the "cores" in ADL will clearly not be able to reach 4.3 GHz.

And why? Tremont runs on the redacted Ice Lake 10+ process and can go up to 3.0 GHz in a 10W low-power product, and that product isn't made for the highest possible performance. Gracemont is the first big desktop implementation and should be manufactured on 10ESF.


Profanity is not allowed in the tech forums.

AT Mod Usandthem
 
Last edited by a moderator:

dullard

Elite Member
May 21, 2001
25,063
3,411
126
If done properly, it could be great for laptops and other power constrained scenarios. On the desktop, I just dont see the point. Why not just make 16 (or 12 or whatever) big cores, and let the cpu limit the TDP instead of 8+8. Also, there is usually some room for overclocking on the desktop, (TDP be damned, basically), and I would think more big cores would be better for this.
These chips are currently aimed at laptops, so your point is already the route being pursued. But this is just the very first step.

For desktops, if I understand correctly, the goal is eventually to have a mix-and-match concept. Different user subsets might need lower cost, more graphics, more raw calculation capability, more cores, more responsiveness, AI or no AI, physics or no physics, etc. With a given amount of resources, this would logically need different numbers of little chips, big chips, and GPU execution units.
  • You might want as many big cores as possible, no GPU execution units, and only a couple little cores to handle background tasks like virus scanning, email, voice command listening, etc. This might get you a great number cruncher.
  • I might want a low power chip with few big cores, many little cores, and more GPU execution units. This might give me a great HTPC.
  • Someone else might want a CPU that is as low power as possible while running legacy x86 code (pretty much all of manufacturing in the world). That would be a lot of little cores.
  • Someone else might want a great gaming CPU which would be a roughly equal number of big/little cores and as many GPU execution units as possible.
Basically, the long-term desktop concept is to plop in as many big, little, GPU, AI, etc. cores as your budget can support. Whether that is a power budget, a dollar budget, a space budget, or something in between would be up to your particular needs. The end goal is no more "one-size-fits-all" CPUs, where the only real choice you have is how much power vs. how many cores.
 
  • Like
Reactions: Zucker2k

ondma

Platinum Member
Mar 18, 2018
2,721
1,281
136
These chips are currently aimed at laptops, so your point is already the route being pursued. But this is just the very first step.

For desktops, if I understand correctly, the goal is eventually to have a mix-and-match concept. Different user subsets might need lower cost, more graphics, more raw calculation capability, more cores, more responsiveness, AI or no AI, physics or no physics, etc. With a given amount of resources, this would logically need different numbers of little chips, big chips, and GPU execution units.
  • You might want as many big cores as possible, no GPU execution units, and only a couple little cores to handle background tasks like virus scanning, email, voice command listening, etc. This might get you a great number cruncher.
  • I might want a low power chip with few big cores, many little cores, and more GPU execution units. This might give me a great HTPC.
  • Someone else might want a CPU that is as low power as possible while running legacy x86 code (pretty much all of manufacturing in the world). That would be a lot of little cores.
  • Someone else might want a great gaming CPU which would be a roughly equal number of big/little cores and as many GPU execution units as possible.
Basically, the long-term desktop concept is to plop in as many big, little, GPU, AI, etc. cores as your budget can support. Whether that is a power budget, a dollar budget, a space budget, or something in between would be up to your particular needs. The end goal is no more "one-size-fits-all" CPUs, where the only real choice you have is how much power vs. how many cores.
That sounds like an interesting, if confusing, lineup, which could offer unique performance in certain scenarios. The problem is, they do not have a hybrid chip with more than 8 big cores, and on the desktop they have to compete with 12 or even 16 big cores from AMD. And by the time ADL comes out, they will be competing with an even stronger Zen 3 or Zen 4 core.
 

dullard

Elite Member
May 21, 2001
25,063
3,411
126
That sounds like an interesting, if confusing, lineup, which could offer unique performance in certain scenarios. The problem is, they do not have a hybrid chip with more than 8 big cores, and on the desktop they have to compete with 12 or even 16 big cores from AMD. And by the time ADL comes out, they will be competing with an even stronger Zen 3 or Zen 4 core.
It sounds like your timeline is on the order of a year. Intel's timeline is on the order of a decade.
 
  • Love
Reactions: spursindonesia

coercitiv

Diamond Member
Jan 24, 2014
6,199
11,895
136
ADL-S doesn't have a core-count problem; just watch as the media falls in love with the new 8-core Ryzen this month.

Intel's problem is their server/HEDT line, which is more or less MIA. Fix that instead, and the consumer mainstream platform can stay at 8 cores until Intel finds something better than the ring interconnect.
 

uzzi38

Platinum Member
Oct 16, 2019
2,628
5,934
146
Hey guys, maybe an odd question, but when did Intel start all the Tiger Lake teasing BS, several months ago now? I forget the actual date they first started doing that stuff.
 

jpiniero

Lifer
Oct 1, 2010
14,591
5,214
136
Intel's problem is their server/HEDT line, which is more or less MIA. Fix that instead, and the consumer mainstream platform can stay at 8 cores until Intel finds something better than the ring interconnect.

Because of bad 10 nm yields, Tiger Lake is probably more profitable. Chiplets are going to be a necessity to salvage 10 nm, and until Intel delivers a server design that uses them, it's going to be more delays and paper launches.