Discussion Intel current and future Lakes & Rapids thread


jpiniero

Lifer
Oct 1, 2010
14,584
5,206
136
So help me to understand. What's the point of using the big+little approach on desktops? I can see why mobile devices use it due to greater power efficiency and what not, but not for desktops. I suppose they don't think they can put 16 big cores on a single die perhaps?

Mainstream desktop has been derived from mobile, so whatever mobile gets, mainstream desktop does too.

Intel has abandoned HEDT for the time being and is barely supplying DIY as it is... not sure it would be worth the effort to do a dual CPU chiplet.
 

DrMrLordX

Lifer
Apr 27, 2000
21,617
10,826
136
@Carfax83

No idea why Intel is using Gracemont cores in Alder Lake-S, but that's the word on the street. It'll be a while before actual product appears on the market - probably Q4 2021.
 
  • Like
Reactions: spursindonesia

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
That's how it's been... S has been H but in socketed form. Except H is gone with Alder Lake and replaced with P.

Not always true. 10c Comet Lake S is not used for mobile, nor have we heard rumors of a mobile Rocket Lake S. It's quite plausible that the 8+8 Alder Lake S die is only used for desktop. If the P rumor is correct (15-45W TDP), then 8+8 on 10nm seems too power hungry.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
So help me to understand. What's the point of using the big+little approach on desktops? I can see why mobile devices use it due to greater power efficiency and what not, but not for desktops. I suppose they don't think they can put 16 big cores on a single die perhaps?

No, that's still true of desktops. If it wasn't, we'd have seen engineers make the core as wide as it can be to their hearts' content. That's why we moved to multiple cores.

[Attached image: 1592703903819.png]

Third image. Hybrid allows the big core to be even bigger than it otherwise could be* for the highest ST performance, while the many small cores keep MT performance from being sacrificed.

It's a sort of task specialization but for the multi-core era.

*Let's say that without hybrid, GC is 20% faster than WC and you get 16 of those cores. With hybrid, you would instead have GC cores that are 40% faster than WC, but only 8 of them, plus 8 Gracemont cores that add to MT performance (each outperforming SKL by 5% or so).
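
The footnote's hypothetical can be put into numbers with a small Python sketch. It is only an illustration: the function name `compare` and the parameter `wc_per_skl` (how much faster a Willow Cove core is than a Skylake core) are assumptions, and the default of 1.18 is a placeholder, since the post does not give that ratio. SMT and clock differences are ignored.

```python
def compare(wc_per_skl=1.18):
    """Return ((ST, MT) uniform, (ST, MT) hybrid) in Willow Cove core units.

    wc_per_skl is an ASSUMED Willow Cove / Skylake per-core ratio; the
    post does not state it, so 1.18 is only a placeholder value.
    """
    # Without hybrid: 16 big cores, each 20% faster than Willow Cove.
    uniform = (1.2, 16 * 1.2)
    # With hybrid: 8 big cores at 1.4x Willow Cove, plus 8 Gracemont
    # cores at 1.05x Skylake, converted into Willow Cove units.
    gracemont = 1.05 / wc_per_skl
    hybrid = (1.4, 8 * 1.4 + 8 * gracemont)
    return uniform, hybrid


uniform, hybrid = compare()
print(f"uniform: ST = {uniform[0]:.2f}, MT = {uniform[1]:.2f}")
print(f"hybrid:  ST = {hybrid[0]:.2f}, MT = {hybrid[1]:.2f}")
```

With that placeholder ratio, ST goes from 1.2 to 1.4 while MT lands within about 5% of the 16-big-core case, which is the trade the footnote describes. A different `wc_per_skl` shifts the MT comparison in either direction.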
 
  • Like
Reactions: Carfax83

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
No, that's still true of desktops. If it wasn't, we'd have seen engineers make the core as wide as it can be to their hearts' content. That's why we moved to multiple cores.

Third image. Hybrid allows the big core to be even bigger than it otherwise could be* for the highest ST performance, while the many small cores keep MT performance from being sacrificed.

It's a sort of task specialization but for the multi-core era.

*Let's say that without hybrid, GC is 20% faster than WC and you get 16 of those cores. With hybrid, you would instead have GC cores that are 40% faster than WC, but only 8 of them, plus 8 Gracemont cores that add to MT performance (each outperforming SKL by 5% or so).

That's a great point, and makes sense! I never thought of it like that. One of the biggest reasons why the Apple A series CPUs are lauded from a performance standpoint is how wide and big they are, but putting a large number of those cores on a single die would probably not be possible without some significant reengineering of the CPUs to make them more scalable. So if Intel is now chasing IPC in a meaningful way, that would doubtless lead to much bigger cores a la Apple A series (though not to that extreme) and would require some sacrifices until they can be used on a smaller node.
 

jpiniero

Lifer
Oct 1, 2010
14,584
5,206
136
Not always true. 10c Comet Lake S is not used for mobile

Comet Lake-H was pretty likely intended to go up to 10, but was cut back to 8, presumably over the power draw. Rocket Lake on mobile may end up being canned for the same reason.
 

coercitiv

Diamond Member
Jan 24, 2014
6,187
11,853
136
Third image. Hybrid allows the big core to be even bigger than it otherwise could be* for the highest ST performance, while the many small cores keep MT performance from being sacrificed.

It's a sort of task specialization but for the multi-core era.

*Let's say that without hybrid, GC is 20% faster than WC and you get 16 of those cores. With hybrid, you would instead have GC cores that are 40% faster than WC, but only 8 of them, plus 8 Gracemont cores that add to MT performance (each outperforming SKL by 5% or so).

I don't buy this, not for performance in consumer desktops. The ratio between small and big cores doesn't bring that much of an advantage if you take into consideration the higher thread count needed to reach optimal throughput, not to mention the fact that it likely requires tasks with low MT diminishing returns to get there.

Did some napkin math last night while reading this thread, so I'll just copy-paste it below. If we assume GC = 1.5x Skylake IPC, Gracemont = 1x Skylake IPC, and SMT yields 20%, let's compare throughput potential and topology constraints:

Code:
8 big + 8 small (1x area)
8 x 1.5 x 1.2 = 14.4
8 x 1 = 8
Throughput @ 24T = 22.4
Throughput @ 16T = 20
Throughput @ 12T = 16
Will require dual ring bus, some kind of mesh or new type of interconnect.
Latency sensitive tasks will probably run only on the big cluster for best results.

10 big (1x area)
10 x 1.5 x 1.2 = 18
Throughput @ 24T ~ 18
Throughput @ 16T = 16.8
Throughput @ 12T = 15.6
Same old ring bus, all cores readily available for everything.

12 big (1.2X area)
12 x 1.5 x 1.2 = 21.6
Throughput @ 24T = 21.6
Throughput @ 16T = 19.2
Throughput @ 12T = 18
Maybe too much of a stretch for ring bus, maybe still doable.

8 big + 16 small (1.2X area)
Throughput @ 32T = 30.4
Throughput @ 24T = 28
Throughput @ 16T = 20
Throughput @ 12T = 16

Some observations:
  • 12T workloads would work just as well on 10 big as on 8+8
  • 8+8 will likely use only the big cores in gaming, pure 8 big core chips will be smaller and just as fast
  • 12 big can match 8+8 in throughput, incidentally this may look a lot like Alder Lake vs. Zen 4
  • 8+16 really starts to shine in MT, but is 32T a consumer load anymore?
I couldn't cover the influence of power savings brought by the small cores, but then again we'd have to take other things into consideration as well:
  • small cores may or may not reach big core frequency, meaning the math is purely about max potential anyway
  • significant changes in interconnect may actually offset power gains brought by a relatively small cluster of 8 small cores
  • it's power on enthusiast desktop, we're playing with 150-200W right now and don't seem to mind... so why start caring now?
From my POV this 8+8 Alder Lake, if true, is the same type of experiment as Lakefield: very promising when looking at isolated parts, but quite troublesome to optimize once you put everything together in a cohesive package. Both Lakefield and Alder Lake successors will probably be the real deal where design decisions & prior experience bring hefty performance results, but it seems that lately all we do with Intel is dream about the generation after the next. Luckily both TGL and RKL-S are far more conventional and ready for today's software.
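
The napkin math above can be reproduced with a short Python sketch. It uses the post's stated assumptions (big core = 1.5x Skylake, small core = 1x, SMT adds 20%). The function name `throughput` and the thread fill order (one thread per big core first, then small cores, then SMT siblings) are assumptions not spelled out in the post; that order was picked because it matches every figure in the table.

```python
def throughput(big, small, threads, big_ipc=1.5, small_ipc=1.0, smt_gain=0.2):
    """Peak throughput in Skylake-core units for a given thread count.

    ASSUMED fill order (not stated in the post, but consistent with its
    numbers): one thread per big core, then one per small core, then the
    SMT siblings on the big cores. Threads beyond that add nothing.
    """
    first_on_big = min(threads, big)
    on_small = min(threads - first_on_big, small)
    smt_siblings = min(threads - first_on_big - on_small, big)
    return (first_on_big * big_ipc
            + on_small * small_ipc
            + smt_siblings * big_ipc * smt_gain)


# (big cores, small cores) for the four configurations in the table.
configs = {
    "8 big + 8 small": (8, 8),
    "10 big": (10, 0),
    "12 big": (12, 0),
    "8 big + 16 small": (8, 16),
}

for name, (big, small) in configs.items():
    row = ", ".join(f"{t}T = {throughput(big, small, t):.1f}" for t in (12, 16, 24))
    print(f"{name}: {row}")
```

Changing `big_ipc`, `smt_gain`, or the fill order is an easy way to see how sensitive the comparison is to each guess, in particular to whether the small cores really reach 1x Skylake.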
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
From my POV this 8+8 Alder Lake, if true, is the same type of experiment as Lakefield: very promising when looking at isolated parts, but quite troublesome to optimize once you put them together into a cohesive package.

I agree with what you are saying. Right now, the 8+8 configuration looks questionable for very parallel workloads.

By the way, the 8 Gracemont cores won't necessitate a dual ring bus. Tremont, for example, uses a quad-core cluster with L2 caches backing it, so each quad-core cluster only needs 1 ring stop. Essentially, 8+8 is like a 10-core as far as the ring is concerned: 8 stops for the big cores plus 2 for the two Gracemont clusters.

I know they are not stupid, and know what they need to be competitive. Something like a dual chiplet with each having 8+8 would do it, but it's just speculation to explain away leaks at this point.
 
  • Like
Reactions: coercitiv

Antey

Member
Jul 4, 2019
105
153
116
Is the Windows scheduler optimized for heterogeneous multi-processing? I would really like a CPU that uses small cores for low-priority tasks, only uses the big cores when needed, and then uses all cores for heavy multithreaded applications. We could see much quieter PCs, lower temps, and much lower power consumption. I don't need Zen/Core cores for web browsing, do I? 4-8 small cores like Jaguar or Atom for web browsing, with some help from the big cores if needed, would be cool...
 

Zucker2k

Golden Member
Feb 15, 2006
1,810
1,159
136
8+16 really starts to shine in MT, but is 32T a consumer load anymore?

No. I also think Intel is not going to rush to follow AMD and put 16 strong cores on client desktop. They'll prefer to counter a potential 16-core AMD future chip with a low-priced HEDT chip. Only heaven knows why AMD chose to cut off their HEDT lineup at the 24-core mark.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,542
14,496
136
No. I also think Intel is not going to rush to follow AMD and put 16 strong cores on client desktop. They'll prefer to counter a potential 16-core AMD future chip with a low-priced HEDT chip. Only heaven knows why AMD chose to cut off their HEDT lineup at the 24-core mark.

Their HEDT lineup goes to 64 cores; Threadripper is HEDT.
 

coercitiv

Diamond Member
Jan 24, 2014
6,187
11,853
136
By the way, the 8 Gracemont cores won't necessitate a dual ring bus. Tremont for example uses a quad-core cluster with L2 caches backing it so each quad core cluster only needs 1 ring stop. Essentially 8+8 is like a 10 core.

Interesting, that explains a lot.
 

Zucker2k

Golden Member
Feb 15, 2006
1,810
1,159
136
Their HEDT lineup goes to 64 cores; Threadripper is HEDT.

Yes, but the first generation TR bottomed out with the 8-core 1900X. The second generation bottomed out with the 12-core TR 2920X. Following that logic, the third generation TR should've at least bottomed out with the 16-core 3950X.
 
  • Haha
Reactions: spursindonesia

Hitman928

Diamond Member
Apr 15, 2012
5,243
7,791
136
Yes, but the first generation TR bottomed out with the 8-core 1900X. The second generation bottomed out with the 12-core TR 2920X. Following that logic, the third generation TR should've at least bottomed out with the 16-core 3950X.

The TR chips that didn't have more cores than the AM4 CPUs were terrible sellers, that's why.
 
  • Love
Reactions: spursindonesia

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
According to this site, the 1065G7 only runs at 3.75GHz and thus is actually a few % faster in perf/clock than AT's test indicates:


I'm seeing a 20% per-clock advantage for the 1065G7 versus the 9900K in SpecInt. AT's test showed only 14%. It's 13% over the 3950X. A comparison of the scores indicates the 1065G7 result isn't from AT, but the 9900K/3950X results are.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Where are you folks seeing a 10nm chip hit 4.8GHz? 11th gen is supposedly 14nm...

The 11th gen you are talking about is Rocket Lake, which is for desktops.

Tiger Lake is 11th gen for laptops. Here's a leak for the 1165G7


I don't think the boost will be that high, but there's an even higher end version called the 1185G7. The 1185G7 is similar to the 7600U/8650U/8665U in that it's a less common, bleeding-edge top tier part, and the 1165G7 is a regular high end one. Usually when you go configure your laptop, something like the 1185G7 will cost you $100 extra.
 

aigomorla

CPU, Cases&Cooling Mod PC Gaming Mod Elite Member
Super Moderator
Sep 28, 2005
20,841
3,189
126
sigh... we need another reboot of the numbering scheme....
11xxx is getting a bit ridiculous.

I guess we probably won't get one until the next real arch change.
 
  • Like
Reactions: coercitiv