Discussion Intel current and future Lakes & Rapids thread

jpiniero · Jun 20, 2020

Carfax83 said:
So help me to understand. What's the point of using the big+little approach on desktops? I can see why mobile devices use it due to greater power efficiency and what not, but not for desktops. I suppose they don't think they can put 16 big cores on a single die perhaps?

Mainstream desktop has been derived from mobile, so whatever mobile gets, mainstream desktop does too.

Intel has abandoned HEDT for the time being and is barely supplying DIY as it is... not sure it would be worth the effort to do a dual CPU chiplet.

DrMrLordX · Jun 20, 2020

@Carfax83

No idea why Intel is using Gracemont cores in Alder Lake-S, but that's the word on the street. It'll be a while before actual product appears on the market - probably Q4 2021.

jpiniero · Jun 20, 2020

DrMrLordX said:
@Carfax83

No idea why Intel is using Gracemont cores in Alder Lake-S, but that's the word on the street. It'll be a while before actual product appears on the market - probably Q4 2021.

Pretty easy to explain by S being basically P except in socketed form.

DrMrLordX · Jun 20, 2020

jpiniero said:
Pretty easy to explain by S being basically P except in socketed form.

There is that. But it means they aren't really designing desktop CPUs anymore.

jpiniero · Jun 20, 2020

DrMrLordX said:
There is that. But it means they aren't really designing desktop CPUs anymore.

That's how it's been... S has been H but in socketed form. Except H is gone with Alder Lake and replaced with P.

Exist50 · Jun 20, 2020

jpiniero said:
That's how it's been... S has been H but in socketed form. Except H is gone with Alder Lake and replaced with P.

Not always true. 10c Comet Lake S is not used for mobile, nor have we heard rumors about Rocket Lake S. It's quite plausible that the 8+8 Alder Lake S die is only used for desktop. If the P rumor is correct (15-45W TDP), then 8+8 on 10nm seems too power hungry.

IntelUser2000 · Jun 20, 2020

Carfax83 said:
So help me to understand. What's the point of using the big+little approach on desktops? I can see why mobile devices use it due to greater power efficiency and what not, but not for desktops. I suppose they don't think they can put 16 big cores on a single die perhaps?

No, that's still true of desktops. If it weren't, we'd have seen engineers make the core as wide as it can be to their heart's content. That's why we moved to multiple cores.

Third image. Allows big core to be even bigger than it could be* for highest ST performance while the many small cores are for not sacrificing MT performance.

It's a sort of task specialization but for the multi-core era.

*Let's say w/o hybrid GC is 20% faster than WC, but with it, rather than having 16 of those cores, you would have GC cores that are 40% faster than WC but only 8 of them, and 8 Gracemont that add to MT performance (each outperforming SKL by 5% or so).

Carfax83 · Jun 20, 2020

IntelUser2000 said:
No, that's still true of desktops. If it weren't, we'd have seen engineers make the core as wide as it can be to their heart's content. That's why we moved to multiple cores.

Third image. Allows big core to be even bigger than it could be* for highest ST performance while the many small cores are for not sacrificing MT performance.

It's a sort of task specialization but for the multi-core era.

*Let's say w/o hybrid GC is 20% faster than WC, but with it, rather than having 16 of those cores, you would have GC cores that are 40% faster than WC but only 8 of them, and 8 Gracemont that add to MT performance (each outperforming SKL by 5% or so).

That's a great point, and makes sense! I never thought of it like that. One of the biggest reasons why the Apple A series CPUs are lauded from a performance standpoint is because of how wide and big they are, but putting a large number of those cores on a single die would probably not be possible without some significant reengineering of the CPUs to make them more scalable. So if Intel is now chasing IPC in a meaningful way, that would doubtless lead to much bigger cores a la Apple A series (though not to that extreme) and would require some sacrifices until they can be used on a smaller node.

jpiniero · Jun 21, 2020

Exist50 said:
Not always true. 10c Comet Lake S is not used for mobile

Comet Lake-H was pretty likely intended to go up to 10, but was cut back to 8, presumably over the power draw. Rocket Lake on mobile may end up being canned for the same reason.

coercitiv · Jun 21, 2020

IntelUser2000 said:
Third image. Allows big core to be even bigger than it could be* for highest ST performance while the many small cores are for not sacrificing MT performance.

It's a sort of task specialization but for the multi-core era.

*Let's say w/o hybrid GC is 20% faster than WC, but with it, rather than having 16 of those cores, you would have GC cores that are 40% faster than WC but only 8 of them, and 8 Gracemont that add to MT performance (each outperforming SKL by 5% or so).

I don't buy this, not for performance in consumer desktops. The ratio between small and big cores doesn't bring that much of an advantage if you take into consideration the higher thread count needed to reach optimal throughput, not to mention the fact that it likely requires tasks with low MT diminishing returns to get there.

Did some napkin math last night while reading this thread, so I'll just copy-paste it below. If we assume GC = 1.5x Skylake IPC, Gracemont = 1x Skylake IPC, and SMT yields 20%, let's compare throughput potential and topology constraints:

Code:

8 big + 8 small (1x area)
8 x 1.5 x 1.2 = 14.4
8 x 1 = 8
Throughput @ 24T = 22.4
Throughput @ 16T = 20
Throughput @ 12T = 16
Will require dual ring bus, some kind of mesh or new type of interconnect.
Latency sensitive tasks will probably run only on the big cluster for best results.

10 big (1x area)
10 x 1.5 x 1.2 = 18
Throughput @ 24T ~ 18
Throughput @ 16T = 16.8
Throughput @ 12T = 15.6
Same old ring bus, all cores readily available for everything.

12 big (1.2X area)
12 x 1.5 x 1.2 = 21.6
Throughput @ 24T = 21.6
Throughput @ 16T = 19.2
Throughput @ 12T = 18
Maybe too much of a stretch for ring bus, maybe still doable.

8 big + 16 small (1.2X area)
Throughput @ 32T = 30.4
Throughput @ 24T = 28
Throughput @ 16T = 20
Throughput @ 12T = 16
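
The figures above can be reproduced with a short script. This is only a sketch of the same napkin model, not anything from Intel: the function name `throughput` and the thread fill order (one thread per big core first, then small cores, then SMT siblings on the big cores) are assumptions chosen because they match the numbers in the table, and the 1.5x / 1x / 20% values are the ones assumed in the post.

```python
def throughput(threads, big, small, big_ipc=1.5, small_ipc=1.0, smt_yield=0.2):
    """Peak throughput in Skylake-core units for a given software thread count.

    Assumed fill order (hypothetical scheduler): big cores first,
    then small cores, then the second SMT thread on each big core.
    Small cores are assumed to have no SMT.
    """
    on_big = min(threads, big)          # first thread on each big core
    remaining = threads - on_big
    on_small = min(remaining, small)    # one thread per small core
    remaining -= on_small
    on_smt = min(remaining, big)        # SMT sibling adds only smt_yield
    return (on_big * big_ipc
            + on_small * small_ipc
            + on_smt * big_ipc * smt_yield)


# Configurations from the post: (label, big cores, small cores)
configs = [
    ("8 big + 8 small", 8, 8),
    ("10 big", 10, 0),
    ("12 big", 12, 0),
    ("8 big + 16 small", 8, 16),
]

for label, big, small in configs:
    print(label)
    for threads in (32, 24, 16, 12):
        value = throughput(threads, big, small)
        print(f"  Throughput @ {threads}T = {value:.1f}")
```

Filling small cores before SMT siblings is the better order under these assumptions, since a free small core adds 1.0 while an SMT sibling adds only 1.5 x 0.2 = 0.3.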

Some observations:

12T workloads would work just as well on 10 big as on 8+8
8+8 will likely use only the big cores in gaming, pure 8 big core chips will be smaller and just as fast
12 big can match 8+8 in throughput, incidentally this may look a lot like Alder Lake vs. Zen 4
8+16 really starts to shine in MT, but is 32T a consumer load anymore?

I couldn't cover the influence of power savings brought by the small cores, but then again we'd have to take other things into consideration as well:

small cores may or may not reach big core frequency, meaning the math is purely about max potential anyway
significant changes in interconnect may actually offset power gains brought by a relatively small cluster of 8 small cores
it's power on enthusiast desktop, we're playing with 150-200W right now and don't seem to mind... so why start caring now?

From my POV this 8+8 Alder Lake, if true, is the same type of experiment as Lakefield: very promising when looking at isolated parts, but quite troublesome to optimize once you put everything together in a cohesive package. Both Lakefield and Alder Lake successors will probably be the real deal where design decisions & prior experience bring hefty performance results, but it seems that lately all we do with Intel is dream about the generation after the next. Luckily both TGL and RKL-S are far more conventional and ready for today's software.

jpiniero · Jun 21, 2020

Mobile would definitely benefit from this setup though, and that's what matters.

IntelUser2000 · Jun 21, 2020

coercitiv said:
From my POV this 8+8 Alder Lake, if true, is the same type of experiment as Lakefield: very promising when looking at isolated parts, but quite troublesome to optimize once you put them together into a cohesive package.

I agree with what you are saying. Right now the 8+8 configuration looks questionable for very parallel workloads.

By the way, the 8 Gracemont cores won't necessitate a dual ring bus. Tremont for example uses a quad-core cluster with L2 caches backing it so each quad core cluster only needs 1 ring stop. Essentially 8+8 is like a 10 core.

I know they are not stupid, and know what they need to be competitive. Something like a dual chiplet with each having 8+8 would do it, but it's just speculation to explain away leaks at this point.

Antey · Jun 21, 2020

Is the Windows scheduler optimized for heterogeneous multi-processing? I would really like a CPU that uses small cores for low-priority tasks and only uses the big cores when needed, and then uses all cores for heavy multithreaded applications. We could see much quieter PCs, lower temps, much lower power consumption. I don't need Zen/Core cores for web browsing, do I? 4-8 small cores like Jaguar or Atom for web browsing, and some help from the big cores if needed, that would be cool...

Zucker2k · Jun 21, 2020

coercitiv said:
8+16 really starts to shine in MT, but is 32T a consumer load anymore?

No. I also think Intel is not going to rush to follow AMD and put 16 strong cores on client desktop. They'll prefer to counter a potential 16 core AMD future chip with a low priced HEDT chip. Only heaven knows why AMD chose to cut off their HEDT lineup at the 24 core mark.

Markfw · Jun 21, 2020

Zucker2k said:
No. I also think Intel is not going to rush to follow AMD and put 16 strong cores on client desktop. They'll prefer to counter a potential 16 core AMD future chip with a low priced HEDT chip. Only heaven knows why AMD chose to cut off their HEDT lineup at the 24 core mark.

Their HEDT lineup goes to 64 cores, threadripper is HEDT.

coercitiv · Jun 21, 2020

IntelUser2000 said:
By the way, the 8 Gracemont cores won't necessitate a dual ring bus. Tremont for example uses a quad-core cluster with L2 caches backing it so each quad core cluster only needs 1 ring stop. Essentially 8+8 is like a 10 core.

Interesting, that explains a lot.

Zucker2k · Jun 21, 2020

Markfw said:
Their HEDT lineup goes to 64 cores, threadripper is HEDT.

Yes, but the first generation TR bottomed out with the 1900x 8 core. Second generation bottomed out with TR 2920x 12 core. Following that logic, the Third generation TR should've at least bottomed out with the 16 core 3950x.

Hitman928 · Jun 21, 2020

Zucker2k said:
Yes, but the first generation TR bottomed out with the 1900x 8 core. Second generation bottomed out with TR 2920x 12 core. Following that logic, the Third generation TR should've at least bottomed out with the 16 core 3950x.

The TR chips that didn't have more cores than the AM4 CPUs were terrible sellers, that's why.

IntelUser2000 · Jun 21, 2020

According to this site, the 1065G7 only runs at 3.75GHz and thus is actually a few % faster perf/clock than AT test indicates:

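1065G7 running SPEC2017 at 25W: IPC (Sunny Cove) compared with SKL/Zen2 - CPU - chiphell - 非常论坛

I have long been puzzled about Sunny Cove's IPC, and without frequency data for a comprehensive test I never did the actual statistics. Thanks to the partial test data from anand's Andrew, I found a machine and measured the clock frequency of the 1065G7 (25W) while running SPEC2017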

machbbs.com

I'm seeing a 20% per clock advantage for 1065G7 versus 9900K in SpecInt. AT's test showed only 14%. It's 13% over 3950X. Comparison of scores indicates the 1065G7 result isn't from AT, but the 9900K/3950X results are.
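
To illustrate why a lower real clock raises the per-clock figure, here is a rough sketch. It assumes, purely for illustration, that the 14% figure was derived with the chip taken to run at 3.9GHz and that the benchmark score itself is unchanged; neither is confirmed by either source, and the helper name `per_clock_advantage` is made up for this example.

```python
def per_clock_advantage(reported_advantage, assumed_ghz, actual_ghz):
    """Rescale a per-clock advantage when the chip's real clock differs.

    Per-clock performance is score / frequency, so for the same score the
    ratio scales by assumed_ghz / actual_ghz.
    """
    return (1 + reported_advantage) * assumed_ghz / actual_ghz - 1


# Hypothetical inputs: 14% advantage computed at 3.9GHz, real clock 3.75GHz.
adjusted = per_clock_advantage(0.14, assumed_ghz=3.9, actual_ghz=3.75)
print(f"Adjusted per-clock advantage: {adjusted:.1%}")
```

Under those assumptions the clock correction alone accounts for most of the gap between 14% and 20%; the rest would have to come from differences in the scores themselves.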

eek2121 · Jun 21, 2020

Where are you folks seeing a 10nm chip hit 4.8GHz? 11th gen is supposedly 14nm...

IntelUser2000 · Jun 21, 2020

eek2121 said:
Where are you folks seeing a 10nm chip hit 4.8GHz? 11th gen is supposedly 14nm...

The 11th gen you are talking about is Rocket Lake, which is for desktops.

Tiger Lake is the 11th gen for laptops. Here's a leak for the 1165G7:

https://twitter.com/x/status/1270322662968045569

I don't think the boost will be that high, but there's an even higher end version called the 1185G7. The 1185G7 is similar to the 7600U/8650U/8665U in that it's a less common, bleeding edge top tier part, while the 1165G7 is a regular high end one. Usually when you go configure your laptop, something like the 1185G7 will cost you $100 extra.

aigomorla · Jun 22, 2020

sigh... we need another reboot in the number schemes....
11xxx is getting a bit ridiculous.

I guess we probably won't get one until the next real arch change.

tamz_msc · Jun 22, 2020

IntelUser2000 said:
According to this site, the 1065G7 only runs at 3.75GHz and thus is actually a few % faster perf/clock than AT test indicates

I'm pretty sure AT had theirs run at 3.9GHz.

IntelUser2000 · Jun 22, 2020

tamz_msc said:
I'm pretty sure AT had theirs run at 3.9GHz.

That may be, but it's also an SDP system.

IntelUser2000 · Jun 22, 2020

tamz_msc said:
I'm pretty sure AT had theirs run at 3.9GHz.

The AT results are pretty much identical to that one.
