Intel Cannonlake, Ice Lake, Tiger Lake & Sapphire Rapids thread

Page 235

Exist50

Member
Aug 18, 2016
83
94
61
That's how it's been... S has been H but in socketed form. Except H is gone with Alder Lake and replaced with P.
Not always true. 10c Comet Lake S is not used for mobile, nor have we heard rumors about Rocket Lake S. It's quite plausible that the 8+8 Alder Lake S die is only used for desktop. If the P rumor is correct (15-45W TDP), then 8+8 on 10nm seems too power hungry.
 

IntelUser2000

Elite Member
Oct 14, 2003
6,825
1,387
136
So help me to understand. What's the point of using the big+little approach on desktops? I can see why mobile devices use it due to greater power efficiency and what not, but not for desktops. I suppose they don't think they can put 16 big cores on a single die perhaps?
No, that's still true of desktops. If it wasn't, we'd have seen engineers make it as wide as it can be to their hearts' content. That's why we moved to multiple cores.

1592703903819.png

Third image. Allows big core to be even bigger than it could be* for highest ST performance while the many small cores are for not sacrificing MT performance.

It's a sort of task specialization but for the multi-core era.

*Let's say that without hybrid, GC is 20% faster than WC. With hybrid, rather than having 16 of those cores, you would have GC cores that are 40% faster than WC but only 8 of them, plus 8 Gracemont cores that add to MT performance (each outperforming SKL by 5% or so).
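
The footnote's trade-off can be put into numbers with a rough sketch. It reads GC, WC and SKL as Golden Cove, Willow Cove and Skylake, and it needs one figure the post does not give: how fast a Willow Cove core is relative to Skylake. The `wc_over_skl=1.18` default below is a placeholder assumption, not a measured value, and the model ignores SMT, clock speeds and scaling losses.

```python
def compare(wc_over_skl=1.18):
    """Return (ST, MT) pairs in Willow Cove units for the two hypothetical designs.

    wc_over_skl is an assumed Willow Cove / Skylake per-core ratio (placeholder).
    """
    # Design A: 16 identical big cores, each 1.2x Willow Cove (no hybrid).
    uniform_st = 1.2
    uniform_mt = 16 * uniform_st

    # Design B: 8 bigger cores at 1.4x Willow Cove plus 8 Gracemont cores.
    # Gracemont is taken as 1.05x Skylake, converted into Willow Cove units.
    hybrid_st = 1.4
    gracemont = 1.05 / wc_over_skl
    hybrid_mt = 8 * hybrid_st + 8 * gracemont

    return (uniform_st, uniform_mt), (hybrid_st, hybrid_mt)


(uniform_st, uniform_mt), (hybrid_st, hybrid_mt) = compare()
print(f"ST gain from hybrid: {hybrid_st / uniform_st - 1:+.1%}")
print(f"MT change from hybrid: {hybrid_mt / uniform_mt - 1:+.1%}")
```

Under these assumptions the hybrid layout gains single-thread performance while giving up only a few percent of multi-thread throughput. Changing `wc_over_skl` shifts the MT result, so the conclusion is only as good as that guess.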
 
  • Like
Reactions: Carfax83

Carfax83

Diamond Member
Nov 1, 2010
5,880
565
126
No, that's still true of desktops. If it wasn't, we'd have seen engineers make it as wide as it can be to their hearts' content. That's why we moved to multiple cores.

Third image. Allows big core to be even bigger than it could be* for highest ST performance while the many small cores are for not sacrificing MT performance.

It's a sort of task specialization but for the multi-core era.

*Let's say that without hybrid, GC is 20% faster than WC. With hybrid, rather than having 16 of those cores, you would have GC cores that are 40% faster than WC but only 8 of them, plus 8 Gracemont cores that add to MT performance (each outperforming SKL by 5% or so).
That's a great point, and makes sense! I never thought of it like that. One of the biggest reasons why the Apple A series CPUs are lauded from a performance standpoint is how wide and big they are, but putting a large number of those cores on a single die would probably not be possible without some significant reengineering of the CPUs to make them more scalable. So if Intel is now chasing IPC in a meaningful way, that would doubtless lead to much bigger cores a la Apple A series (though not to that extreme) and would require some sacrifices until they can be used on a smaller node.
 

jpiniero

Diamond Member
Oct 1, 2010
7,789
1,114
126
Not always true. 10c Comet Lake S is not used for mobile
Comet Lake-H was pretty likely intended to go up to 10, but was cut back to 8, presumably over the power draw. Rocket Lake on mobile may end up being canned for the same reason.
 

coercitiv

Diamond Member
Jan 24, 2014
3,725
3,500
136
Third image. Allows big core to be even bigger than it could be* for highest ST performance while the many small cores are for not sacrificing MT performance.

It's a sort of task specialization but for the multi-core era.

*Let's say that without hybrid, GC is 20% faster than WC. With hybrid, rather than having 16 of those cores, you would have GC cores that are 40% faster than WC but only 8 of them, plus 8 Gracemont cores that add to MT performance (each outperforming SKL by 5% or so).
I don't buy this, not for performance in consumer desktops. The ratio between small and big cores doesn't bring that much of an advantage if you take into consideration the higher thread count needed to reach optimal throughput, not to mention the fact that it likely requires tasks with low MT diminishing returns to get there.

Did some napkin math last night while reading this thread, so I'll just copy-paste it below. If we assume GC = 1.5x Skylake IPC, Gracemont = 1x Skylake IPC, and SMT yields 20%, let's compare throughput potential and topology constraints:

Code:
8 big + 8 small (1x area)
8 x 1.5 x 1.2 = 14.4
8 x 1 = 8
Throughput @ 24T = 22.4
Throughput @ 16T = 20
Throughput @ 12T = 16
Will require dual ring bus, some kind of mesh or new type of interconnect.
Latency sensitive tasks will probably run only on the big cluster for best results.

10 big (1x area)
10 x 1.5 x 1.2 = 18
Throughput @ 24T ~ 18
Throughput @ 16T = 16.8
Throughput @ 12T = 15.6
Same old ring bus, all cores readily available for everything.

12 big (1.2X area)
12 x 1.5 x 1.2 = 21.6
Throughput @ 24T = 21.6
Throughput @ 16T = 19.2
Throughput @ 12T = 18
Maybe too much of a stretch for ring bus, maybe still doable.

8 big + 16 small (1.2X area)
Throughput @ 32T = 30.4
Throughput @ 24T = 28
Throughput @ 16T = 20
Throughput @ 12T = 16
Some observations:
  • 12T workloads would work just as well on 10 big as on 8+8
  • 8+8 will likely use only the big cores in gaming, pure 8 big core chips will be smaller and just as fast
  • 12 big can match 8+8 in throughput, incidentally this may look a lot like Alder Lake vs. Zen 4
  • 8+16 really starts to shine in MT, but is 32T a consumer load anymore?
I couldn't cover the influence of power savings brought by the small cores, but then again we'd have to take other things into consideration as well:
  • small cores may or may not reach big core frequency, meaning the math is purely about max potential anyway
  • significant changes in interconnect may actually offset power gains brought by a relatively small cluster of 8 small cores
  • it's power on enthusiast desktop, we're playing with 150-200W right now and don't seem to mind... so why start caring now?
From my POV this 8+8 Alder Lake, if true, is the same type of experiment as Lakefield: very promising when looking at isolated parts, but quite troublesome to optimize once you put everything together in a cohesive package. Both Lakefield and Alder Lake successors will probably be the real deal where design decisions & prior experience bring hefty performance results, but it seems that lately all we do with Intel is dream about the generation after the next. Luckily both TGL and RKL-S are far more conventional and ready for today's software.
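
The napkin math in this post can be reproduced with a short sketch. It assumes the same inputs as the post (big core = 1.5x Skylake, small core = 1x, SMT adds 20% to a big core) and one scheduling rule that the numbers imply but the post does not spell out: threads go to idle big cores first, then to small cores, and only then to SMT siblings. The function and parameter names are made up for illustration.

```python
def throughput(big, small, threads, big_ipc=1.5, small_ipc=1.0, smt_gain=0.2):
    """Best-case throughput in Skylake-core units for a given thread count."""
    # Marginal contribution of each hardware thread slot.
    slots = (
        [big_ipc] * big                  # first thread on each big core
        + [small_ipc] * small            # one thread per small core (no SMT)
        + [big_ipc * smt_gain] * big     # second (SMT) thread on each big core
    )
    # Assumed scheduling rule: the most valuable slots are filled first.
    slots.sort(reverse=True)
    # Threads beyond the available slots add nothing.
    return round(sum(slots[:threads]), 2)


configs = {"8+8": (8, 8), "10 big": (10, 0), "12 big": (12, 0), "8+16": (8, 16)}
for name, (big, small) in configs.items():
    row = {t: throughput(big, small, t) for t in (12, 16, 24, 32)}
    print(name, row)
```

With the default parameters this matches the figures in the Code block, for example 22.4 for 8+8 at 24T and 16.8 for 10 big at 16T. Lowering `small_ipc` or `big_ipc` is a quick way to see how sensitive the comparison is to the IPC guesses.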
 

IntelUser2000

Elite Member
Oct 14, 2003
6,825
1,387
136
From my POV this 8+8 Alder Lake, if true, is the same type of experiment as Lakefield: very promising when looking at isolated parts, but quite troublesome to optimize once you put them together into a cohesive package.
I agree with what you are saying. Right now, the 8+8 configuration looks questionable for very parallel workloads.

By the way, the 8 Gracemont cores won't necessitate a dual ring bus. Tremont, for example, uses a quad-core cluster with an L2 cache backing it, so each quad-core cluster only needs 1 ring stop. Essentially, 8+8 is like a 10-core on the ring: 8 stops for the big cores plus 2 for the two Gracemont clusters.

I know they are not stupid, and know what they need to be competitive. Something like a dual chiplet with each having 8+8 would do it, but it's just speculation to explain away leaks at this point.
 
  • Like
Reactions: coercitiv

Antey

Junior Member
Jul 4, 2019
22
22
41
is the Windows scheduler optimized for heterogeneous multi-processing? i would really like a cpu that uses small cores for low-priority tasks and only uses the big cores when needed, and then uses all cores for heavy multithreaded applications. we could see much quieter pcs, lower temps, much lower power consumption. i don't need Zen/Core cores for web browsing, do i? 4-8 small cores like Jaguar or Atom for web browsing, and some help from the big cores if needed, that would be cool...
 

Zucker2k

Golden Member
Feb 15, 2006
1,117
492
136
8+16 really starts to shine in MT, but is 32T a consumer load anymore?
No. I also think Intel is not going to rush to follow AMD and put 16 strong cores on client desktop. They'll prefer to counter a potential future 16-core AMD chip with a low-priced HEDT chip. Only heaven knows why AMD chose to cut off their HEDT lineup at the 24-core mark.
 

Markfw

CPU Moderator, VC&G Moderator, Elite Member
Super Moderator
May 16, 2002
20,242
7,887
136
No. I also think Intel is not going to rush to follow AMD and put 16 strong cores on client desktop. They'll prefer to counter a potential future 16-core AMD chip with a low-priced HEDT chip. Only heaven knows why AMD chose to cut off their HEDT lineup at the 24-core mark.
Their HEDT lineup goes to 64 cores; Threadripper is HEDT.
 

coercitiv

Diamond Member
Jan 24, 2014
3,725
3,500
136
By the way, the 8 Gracemont cores won't necessitate a dual ring bus. Tremont for example uses a quad-core cluster with L2 caches backing it so each quad core cluster only needs 1 ring stop. Essentially 8+8 is like a 10 core.
Interesting, that explains a lot.
 

Zucker2k

Golden Member
Feb 15, 2006
1,117
492
136
Their HEDT lineup goes to 64 cores; Threadripper is HEDT.
Yes, but the first generation TR bottomed out with the 1900x 8 core. Second generation bottomed out with TR 2920x 12 core. Following that logic, the Third generation TR should've at least bottomed out with the 16 core 3950x.
 
  • Haha
Reactions: spursindonesia

Hitman928

Platinum Member
Apr 15, 2012
2,505
1,638
136
Yes, but the first generation TR bottomed out with the 1900x 8 core. Second generation bottomed out with TR 2920x 12 core. Following that logic, the Third generation TR should've at least bottomed out with the 16 core 3950x.
The TR chips that didn't have more cores than the AM4 CPUs were terrible sellers, that's why.
 
  • Love
Reactions: spursindonesia

IntelUser2000

Elite Member
Oct 14, 2003
6,825
1,387
136
According to this site, the 1065G7 only runs at 3.75GHz and thus is actually a few % faster in perf/clock than AT's test indicates:


I'm seeing a 20% per-clock advantage for the 1065G7 versus the 9900K in SpecInt. AT's test showed only 14%. It's 13% over the 3950X. Comparing the scores indicates that the 1065G7 result isn't from AT, but the 9900K/3950X results are.
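
The effect of the lower clock on a per-clock comparison can be sketched as follows. Perf/clock is score divided by frequency, so if a published advantage was computed at one assumed clock and the chip actually ran slower, the advantage scales by the ratio of the two clocks. The 3.9GHz figure below is an assumption (the 1065G7's rated maximum turbo); the thread does not say which clock the 14% figure was based on.

```python
def rescale_per_clock_advantage(advantage, assumed_ghz, actual_ghz):
    """Re-express a per-clock advantage when the real clock differs from the assumed one."""
    # perf/clock = score / clock, so a lower real clock raises perf/clock
    # by the factor assumed_ghz / actual_ghz for the same score.
    return (1 + advantage) * (assumed_ghz / actual_ghz) - 1


# 14% advantage from the post, assumed 3.9GHz clock, 3.75GHz observed clock.
adjusted = rescale_per_clock_advantage(0.14, assumed_ghz=3.9, actual_ghz=3.75)
print(f"{adjusted:.1%}")
```

Under that assumption the 14% figure becomes roughly 18.6%, which fits "a few % faster". Any remaining gap to the 20% quoted here would then have to come from the different 1065G7 score rather than from the clock.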
 

IntelUser2000

Elite Member
Oct 14, 2003
6,825
1,387
136
Where are you folks seeing a 10nm chip hit 4.8GHz? 11th gen is supposedly 14nm...
The 11th gen you are talking about is Rocketlake, which is for desktops.

Tigerlake is 11th gen for laptops. Here's a leak for 1165G7


I don't think the boost will be that high, but there's an even higher-end version called the 1185G7. The 1185G7 is similar to the 7600U/8650U/8665U in that it's a less common, bleeding-edge top-tier part, while the 1165G7 is a regular high-end one. Usually, when you configure your laptop, something like the 1185G7 will cost you $100 extra.
 

aigomorla

Cases and Cooling Mod PC Gaming Mod Elite Member
Super Moderator
Sep 28, 2005
18,043
850
126
sigh... we need another reboot of the numbering scheme....
11xxx is getting a bit ridiculous.

I guess we probably won't get one until the next real arch change.
 
  • Like
Reactions: coercitiv

Cardyak

Member
Sep 12, 2018
26
14
41
MLID dropped his Alderlake & Sapphire Rapids information. Mostly tallies with existing rumours but there is some interesting information in here:

AlderLake.JPG

SapphireRapids.JPG
 

uzzi38

Senior member
Oct 16, 2019
810
954
96
MLID dropped his Alderlake & Sapphire Rapids information. Mostly tallies with existing rumours but there is some interesting information in here:

View attachment 23916

View attachment 23917
I can see at least one mistake on SPR, so grain of salt, and one other thing I'm very sure is wrong. (The mistake I can see is a positive thing for SPR, that is to say what MLID is suggesting here undersells it... but that's the most I can say).

Also not 100% sure about PCIe Gen 5 on ADL.
 
