Discussion Intel current and future Lakes & Rapids thread


coercitiv

Diamond Member
Jan 24, 2014
6,151
11,686
136
ICL 2.5-3x perf boost over WHL. (15w vs 15w)
Is that overall performance or just machine learning inference and neural net performance? If it's just AI performance then we should treat it as such, considering hardware support for AVX-512 VNNI in Icelake.

I'm also a bit shocked by the Tigerlake mirror games:
  • TGL 4+2 @ 9W on 10nm providing 2X productivity over AML 2+2 @ 5W on 14nm is not impressive at all --> you double the core count, essentially double the power budget for those cores and also have considerable power savings from 10nm to work with.
  • TGL 25W offering 4X GPU performance over WHL 15W should also raise some eyebrows --> if we consider 5W CPU core usage during graphics workloads then we're already in a scenario where GPU power budget is being doubled even before 10nm power savings are factored in. In fact, if we look at the 2X improvement of ICL Gen 11 vs. WHL Gen 9.5, all we're left with is TGL 25W Xe offering 2X performance over ICL 15W Gen11. Cut that power in half and Xe advantage may come down to low double digit numbers.
Am I missing something?
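
The power-budget arithmetic in the second bullet can be checked with a quick Python sketch. The 5W CPU share is the assumption stated above; the helper name `gpu_budget` and the idea that everything outside the CPU share goes to the GPU are simplifications made only for illustration.

```python
def gpu_budget(tdp_w, cpu_share_w=5.0):
    """GPU power budget if a fixed CPU share is carved out of the package TDP.

    Hypothetical helper: real parts split power dynamically.
    """
    return tdp_w - cpu_share_w

whl_gpu_w = gpu_budget(15)              # 10 W left for the GPU at 15W TDP
tgl_gpu_w = gpu_budget(25)              # 20 W left for the GPU at 25W TDP
budget_ratio = tgl_gpu_w / whl_gpu_w    # GPU power budget, TGL 25W vs WHL 15W

# Claimed GPU gains, both relative to WHL Gen 9.5 at 15W
icl_vs_whl = 2.0                        # ICL Gen 11 at 15W
tgl_vs_whl = 4.0                        # TGL Xe at 25W
tgl_vs_icl = tgl_vs_whl / icl_vs_whl    # what remains for Xe over Gen 11

print(f"GPU budget ratio: {budget_ratio:.1f}x")
print(f"TGL Xe 25W over ICL Gen11 15W: {tgl_vs_icl:.1f}x")
```

Under those assumptions the GPU power budget doubles and the remaining Xe gain over Gen 11 is also 2X, which is the point being made about the 25W figure.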
 
Last edited:

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
  • TGL 25W offering 4X GPU performance over WHL 15W should also raise some eyebrows --> if we consider 5W CPU core usage during graphics workloads then we're already in a scenario where GPU power budget is being doubled even before 7nm power savings are factored in. In fact, if we look at the 2X improvement of ICL Gen 11 vs. WHL Gen 9.5, all we're left with is TGL 25W Xe offering 2X performance over ICL 15W Gen11. Cut that power in half and Xe advantage may come down to low double digit numbers.
Am I missing something?
TGL is 10nm. Icelake's 2.5-3x refers to AI performance. Actually that part is on Intel's press page.

25W doesn't bring a lot of advantage over 15W. Likely because 25W is running at voltage and frequencies for the GPU that are not optimal for efficiency.
https://www.notebookcheck.net/Apple...ouch-Bar-Review.227154.0.html#toc-performance

15-20% going from 15W with eDRAM to 28W with eDRAM. The rest would be architectural changes. If they can do 2x over Icelake even with 25W vs 15W it would actually be quite impressive.

The CPU testing also likely has some caveats and until we know the details we can't conclusively say anything.
 
Last edited:

coercitiv

Diamond Member
Jan 24, 2014
6,151
11,686
136
TGL is 10nm.
Yeah, I meant 10nm but hadn't had coffee yet.

25W doesn't bring a lot of advantage over 15W. Likely because 25W is running at voltage and frequencies for the GPU that are not optimal for efficiency.
https://www.notebookcheck.net/Apple...ouch-Bar-Review.227154.0.html#toc-performance

15-20% going from 15W with eDRAM to 28W with eDRAM. The rest would be architectural changes. If they can do 2x over Icelake even with 25W vs 15W it would actually be quite impressive.
I don't buy this. Keep in mind we're talking about different gen GPUs, which presumably increase GPU transistor count, not (just) clocks. ICL Gen 11 already sees a massive increase in GPU resources, I think we already talked about how that may also translate into a drop in clocks to get some sweet efficiency gains from there as well.

It's a ~100% increase in power budget, why make projections that effectively demolish the efficiency of your most anticipated GPU arch?! Why not keep it as clear-cut as the anticipated 2X jump for ICL Gen11 15W?
 
Last edited:

JasonLD

Senior member
Aug 22, 2017
485
445
136
Because they can't actually yield the 10 nm products. Cooper Lake for instance is what will be actually bought versus Icelake Server; and for client Comet Lake and Rocket Lake versus Icelake Client and Tigerlake.


Or they could cover < 28 cores with Cooper Lake and 32+ with Ice Lake. I don't think they will do another temporary band-aid solution like Cascade-Lake AP with Cooper Lake.
Since they actually started sampling Ice Lake Server parts, we will probably get some leaks sooner rather than later... so we will find out soon.
 

Dayman1225

Golden Member
Aug 14, 2017
1,152
973
146
Is that overall performance or just machine learning inference and neural net performance? If it's just AI performance then we should treat it as such, considering hardware support for AVX-512 VNNI in Icelake.

I'm also a bit shocked by the Tigerlake mirror games:
  • TGL 4+2 @ 9W on 10nm providing 2X productivity over AML 2+2 @ 5W on 14nm is not impressive at all --> you double the core count, essentially double the power budget for those cores and also have considerable power savings from 10nm to work with.
  • TGL 25W offering 4X GPU performance over WHL 15W should also raise some eyebrows --> if we consider 5W CPU core usage during graphics workloads then we're already in a scenario where GPU power budget is being doubled even before 10nm power savings are factored in. In fact, if we look at the 2X improvement of ICL Gen 11 vs. WHL Gen 9.5, all we're left with is TGL 25W Xe offering 2X performance over ICL 15W Gen11. Cut that power in half and Xe advantage may come down to low double digit numbers.
Am I missing something?
Oops sorry meant to say AI Perf
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
I don't buy this. Keep in mind we're talking about different gen GPUs, which presumably increase GPU transistor count, not (just) clocks. ICL Gen 11 already sees a massive increase in GPU resources, I think we already talked about how that may also translate into a drop in clocks to get some sweet efficiency gains from there as well.

I don't think Gen 11 will require reductions in clocks to achieve this. The die shot pics on the last page show that despite having enormously increased resources, the size is quite compact. Pre-Gen 9 Intel GPUs supported the ability to flexibly switch between AoS and SoA addressing modes. Gen 9 moves to being the latter only, like AMD/Nvidia, which resulted in EU size being reduced by 20-25% using the same process.

They are essentially several years behind and playing catch up especially in the graphics space due to the 10nm delay. Further efficiencies can be extracted, especially in the low level details that we might not get to hear.

They'll have to clock it high enough to get 2x just because things won't scale linearly with number of execution resources.
 

coercitiv

Diamond Member
Jan 24, 2014
6,151
11,686
136
I don't think Gen 11 will require reductions in clocks to achieve this. The die shot pics on the last page show that despite having enormously increased resources, the size is quite compact.
Take a look at the KBL-R vs. ICL die shot comparison, including an area overlap of CPU over GPU to assess the relative change in transistor count:

kblr-vs-icl.jpg

Now take into consideration the density gain from the node jump and the relative gain in area compared with the 4 ICL cores, together with whatever increase in transistor count ICL brings over KBL-R anyway.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
I did that already. The proportion is roughly 1:1 in Skylake. It goes to 1.3:1 in Icelake. Even considering 25% iso-process reduction, the GPU should have been a lot bigger. You can see how large the GT4e GPU is: https://en.wikichip.org/wiki/File:4_core_hp_gt4_skylake.svg

The high level resources of Gen 11 and Skylake GT4e GPU are roughly the same. GT4e in Skylake, if you compare to 4 cores, the ratio is 2.26:1.

Also from the SoC perspective the GPU takes about the same area as Gen 9 GT2.
 
Last edited:

Thala

Golden Member
Nov 12, 2014
1,355
653
136
I did that already. The proportion is roughly 1:1 in Skylake. It goes to 1.2-1.3:1 in Icelake. Even considering 25% iso-process reduction, the GPU should have been a lot bigger. You can see how large the GT3e GPU was in Broadwell: https://software.intel.com/en-us/articles/driver-support-matrix-for-media-sdk-and-opencl

You forgot that the CPUs are significantly increasing in gate-count as well. So going from 1:1 to 1.3:1 means that the GPU gate-count is increasing 30% on top of whatever the relative gate-count increase of the CPUs is.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
You forgot that the CPUs are significantly increasing in gate-count as well. So going from 1:1 to 1.3:1 means that the GPU gate-count is increasing 30% on top of whatever the relative gate-count increase of the CPUs is.

Yes, but it'll be little in contrast to how much the GPU shrank. I don't think the GPU transistor count went down significantly. I think in addition to the architectural efficiency, it took near full advantage of the 2.7x scaling. Since they were often criticized for lack of 3D load power efficiency and large die area for its performance, it's a welcome improvement.

CPUs as usual, would end up in the 2x range. CPU portion of Icelake probably got about 30% larger.
 

NTMBK

Lifer
Nov 14, 2011
10,208
4,940
136
You really love making big assumptions without any information to back it up

What other rational explanation is there for Intel's behaviour?

Launching a server platform takes a lot of resources- not just from Intel, but from their OEM partners (like HP and Dell). A server system needs to go through lots and lots of testing and verification to guarantee that it can meet the stability standards required by the customers (on top of the costs of designing the motherboards and the chassis). Why the hell would Intel ask their partners to go through all of that twice, if they knew that Ice Lake was really ready and would make Cooper Lake redundant? The only rational explanation is that Ice Lake is in some way a dud. The options I see are

a) there's something wrong with performance/power consumption, that makes it worse than Cooper Lake
b) the silicon is in some way bugged (think Transactional Memory not working right in early models), meaning that Cooper Lake is needed for certain specific use cases where Ice Lake literally doesn't work
c) Ice Lake just isn't going to be available in sufficient quantities to satisfy demand, and Cooper Lake will handle the rest

I think C is the most likely. Ice Lake has been on the back burner for a looong time, I'm sure that they would have ironed out any bugs over that time. And I don't think Intel is crazy enough to launch a 10nm part that is outperformed by a 14nm part- it would be basically declaring that they can't go past 14nm. They would surely scrap the part altogether rather than admit that.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
I'll add further to coercitiv's comment about needing lower clocks to achieve high efficiency for Gen 11.

Consider the following:
-Voltage scaling is pretty much dead. Nvidia/AMD runs its GPU at 1.1V. You can do it, maybe for small gains.
-Y-parts can run at 500MHz in 3D load. U parts can run at 1GHz.
-2x performance in 3D requires greater than 2x in resources, because it doesn't scale linearly. The Iris Pro 580 requires 3x the TDP, EUs, fillrate, and eDRAM along with H-class CPU to get a little over 2x gain over a 15W GT2 chip.
-2x gain is also stated for Lakefield under 5W and 7W TDP compared to Amber Lake.

That means with Gen 11, you have the lower bound of performance at slightly lower than Iris Plus 640 performance with Lakefield, and with 15W parts, mobile Raven Ridge Vega 8 to Vega 10 range of performance, occasionally touching GT4e SKL and beating previous gen Iris Plus parts by 50%.

I think maybe it could be 5-10% lower in clocks at load, but that doesn't change the overall picture. 2x gain over Icelake is substantial. The more important question is whether it's an up-to figure needing a lower volume SKU (maybe with on-package HBM), or a GT2 one. The same thing was said with Haswell, quoting GT3e figures when telling us the GPU will be twice as fast.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
I think C is the most likely.

What about,

d) Ice Lake SP is available for lower end SKUs using dual die MCM to get up to 32 cores while Cooper Lake covers the high end?

This is the only way not to cannibalize sales. It also fits using multiple smaller dies for ease of production early in the cycle, and also fits leaked U-chips being clocked low.
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
-2x performance in 3D requires greater than 2x in resources, because it doesn't scale linearly. The Iris Pro 580 requires 3x the TDP, EUs, fillrate, and eDRAM along with H-class CPU to get a little over 2x gain over a 15W GT2 chip.

I do assume that the 2x performance gain comes from the fact that the number of EUs was increased from 24 to 64 - which is a factor of 2.66 - which could be sufficient for 2x performance at iso frequency.
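
As a rough check on that, here is a small Python sketch of how much of the added EU count has to turn into real performance to hit 2x at iso frequency. The variable names are made up for illustration, and treating EU count as the only resource is a simplification; the 3x-for-2x Iris Pro 580 figure is the one quoted above.

```python
eus_gen95_gt2 = 24      # EUs in the 15W GT2 part
eus_gen11 = 64          # EUs in ICL Gen 11
target_gain = 2.0       # claimed performance gain

resource_ratio = eus_gen11 / eus_gen95_gt2            # about 2.67x more EUs
required_efficiency = target_gain / resource_ratio    # share of added resources that must pay off

# Reference point from the quote: Iris Pro 580 needs ~3x resources for a little over 2x
iris_pro_efficiency = 2.0 / 3.0                       # lower bound, since the gain is "a little over 2x"

print(f"EU ratio: {resource_ratio:.2f}x")
print(f"Scaling efficiency needed for 2x: {required_efficiency:.2f}")
print(f"Iris Pro 580 reference (at least): {iris_pro_efficiency:.2f}")
```

So the 24 to 64 jump needs about 75% scaling efficiency, somewhat better than the Iris Pro 580 reference point, with no clock increase assumed.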
 

coercitiv

Diamond Member
Jan 24, 2014
6,151
11,686
136
I did that already. The proportion is roughly 1:1 in Skylake. It goes to 1.3:1 in Icelake. Even considering 25% iso-process reduction, the GPU should have been a lot bigger.
A 25% iso-process reduction due to leaner cores coupled with a 30% increase in total GPU area leads to a considerable 70% increase even before node scaling is taken into account. Sure performance scaling won't be linear with the raw jump in execution resources (24 --> 64) , but the thing is massive when everything adds up.
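
For anyone who wants to check the 70% figure, a minimal Python sketch of that arithmetic follows. It assumes, as the sentence above does, that the 25% iso-process reduction applies to the whole GPU and that the 1:1 to 1.3:1 ratio change is a 30% gain in relative GPU area; the variable names are illustrative only.

```python
area_ratio_growth = 1.30     # GPU:CPU area ratio going from 1:1 to 1.3:1
iso_process_shrink = 0.25    # same resources fit in 25% less area on the same process

# Resources that fit in the new area, expressed in old-design terms,
# before any node (density) scaling is applied
effective_growth = area_ratio_growth / (1.0 - iso_process_shrink)

print(f"Effective GPU growth before node scaling: {(effective_growth - 1.0) * 100:.0f}%")
```

The result is a bit over 70%, and it is relative to the CPU cores, which are themselves growing.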

I think maybe it could be 5-10% lower in clocks at load, but that doesn't change the overall picture. 2x gain over Icelake is substantial. The more important question is whether it's an up-to figure needing a lower volume SKU (maybe with on-package HBM), or a GT2 one. The same thing was said with Haswell, quoting GT3e figures when telling us the GPU will be twice as fast.
(Very) friendly reminder my issue was with TGL Xe projection arbitrarily using 25W TDP, not ICL Gen 11. I specifically said the ICL Gen11 gain claim is clear-cut, if they deliver it's all good on the Gen 11 front.
 

beginner99

Diamond Member
Jun 2, 2009
5,208
1,580
136
Let me put it this way, you kind of have to read the tea leaves. That they are fully refreshing products with 14 nm in 2020 is pretty solid evidence of what the plan is. And the plan is to sell 14 nm until they can get 7 nm out; and use 10 nm as a distraction to Wall Street.

Another good hint is Intel increasing 14nm capacity and, as they said last year, investing 1 billion in it. Makes sense they need that because the dies are getting bigger in both client and server. They would not invest that much if they were going to move to 10nm (or 7nm) soon. Hence niche products on 10nm and then launch of 7nm in 2021.
 

DrMrLordX

Lifer
Apr 27, 2000
21,582
10,785
136
Because they can't actually yield the 10 nm products. Cooper Lake for instance is what will be actually bought versus Icelake Server; and for client Comet Lake and Rocket Lake versus Icelake Client and Tigerlake.

They CAN yield 2c/4t Icelake U and Y. I mean, right, Rocket Lake; Comet Lake; and Cooper Lake make things look bad for them in the desktop/workstation/server market since they wouldn't need any of those things if a full raft of IceLake products were ready for every market segment. But what if . . . eh see below.

Lakefield is in the roadmap as M-series. Lakefield could be thought as a proper Core M. Since a new CPU architecture from inception to being on shelves take 4-5 years, it makes sense.

That is a fascinating development, and it's one worth watching. Foveros, in action, in 2019. Probably the most interesting product Intel has announced for this year, not in terms of what it can do individually, but what it means for the capabilities of Intel's advanced packaging technologies.

Or they could cover < 28 cores with Cooper Lake and 32+ with Ice Lake. I don't think they will do another temporary band-aid solution like Cascade-Lake AP with Cooper Lake.

Seems unlikely unless they do something exotic to get IceLake-SP out the door (see below).

Since they actually started sampling Ice Lake Server parts, we will probably get some leaks sooner rather than later... so we will find out soon.

I certainly hope so. Just remember that in the ODM era of cloud computing, early customers get first crack at (sometimes custom) silicon we won't see for 6 months or longer. Amazon and Google had Skylake-SP before anyone else had seen it, and I do not think there was a single credible leak about it until much later.

c) Ice Lake just isn't going to be available in sufficient quantities to satisfy demand, and Cooper Lake will handle the rest

That's possible, but when you consider what conditions are forcing Intel to stop at 2c parts for the IceLake products that we know about, the Tiger Lake products that they discussed yesterday, and Cannonlake, it stands to reason that bigger-than-2c IceLake yields might be somewhere in the order of 0%. Or close enough to 0% to make it a complete waste of time. Are you going to spend an entire wafer to get one 32c IceLake-SP die? Under those circumstances I don't see IceLake-SP satisfying anyone.

What about,

d) Ice Lake SP is available for lower end SKUs using dual die MCM to get up to 32 cores while Cooper Lake covers the high end?

Or what if Intel is going to try to string together a bunch of 2c IceLake dice via EMIB or gasp Foveros and do it that way? Their mesh topology might allow such a monstrosity to work without adding too much to intercore latency. Maybe. For those who can't tolerate the obvious latency problems that would probably result from such a doohickey, there would be Cooper Lake as a backup. And I have a feeling that the yields on the 2c dice are bad enough that Intel won't be able to make too many such beasts anyway (see @NTMBK 's observation above).

Hence niche products on 10nm and then launch of 7nm in 2021.

The best part of this entire presentation is that Intel seems dead-set serious about pushing 7nm out the door in 2021. Personally I don't see them being able to bring enough IceLake products to market for 10nm to make a dent in what will continue to be a 14nm-dominated product lineup. Intel needs this, and in a way, so does the computing world as a whole.

Without it, it's TSMC vs. Samsung. One less fab on the cutting edge.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
A 25% iso-process reduction due to leaner cores coupled with a 30% increase in total GPU area leads to a considerable 70% increase even before node scaling is taken into account.

The 25% reduction was stated for the EUs. That's mostly due to reducing DP FP from 1/4 to zero.

In Gen 9, EUs take 25% of the GPU size. In Gen 11, it drops to 19%. Hmm, 25% to 19% seems almost exactly like what you get if you decrease the 25% portion by 25%. If that's the only thing done, then you'd end up with the whole GPU being 6% smaller. On the other hand, it does have 8 fewer EUs, but also adds in a bunch of features. The saving might end up being a wash.
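
The percentages in that paragraph can be reproduced with a short Python sketch. It assumes the EU shrink is the only change, which is exactly the simplification made above; the variable names are illustrative.

```python
eu_share_gen9 = 0.25    # EUs as a fraction of Gen 9 GPU area (figure from the post)
eu_shrink = 0.25        # stated per-EU area reduction

eu_area_after = eu_share_gen9 * (1.0 - eu_shrink)          # 0.1875 of the old GPU area
gpu_area_after = (1.0 - eu_share_gen9) + eu_area_after     # 0.9375 of the old GPU area
gpu_saving = 1.0 - gpu_area_after                          # whole-GPU area saved
eu_share_after = eu_area_after / gpu_area_after            # EU share of the now smaller GPU

print(f"EU area vs old GPU: {eu_area_after:.4f}")
print(f"Whole-GPU saving:   {gpu_saving:.4f}")
print(f"EU share of new GPU: {eu_share_after:.4f}")
```

Measured against the old GPU area the EUs land at just under 19% and the whole GPU is about 6% smaller. Measured against the new, smaller GPU the EU share would be closer to 20%, so an observed 19% leaves a little room for the other changes mentioned.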

The EU to die ratio decreased a LOT compared to Broadwell: https://www.extremetech.com/wp-content/uploads/2015/06/5thCore.png

I would now incline further towards the GPU meeting its 2.7x density goal while the CPU remained at a historical 2x level. The same thing happened on 14nm when the Atom core got a 2.6x or so density improvement.

I do assume that the 2x performance gain comes from the fact that the number of EUs was increased from 24 to 64 - which is a factor of 2.66 - which could be sufficient for 2x performance at iso frequency.

That's a good point. Though just boosting EUs by 2.66x wouldn't have been enough. The resulting boost in texture samplers, and improved low-level work was likely required.

Years ago when I was extensively analyzing benchmarks, the gain from doubling resources in GPUs seemed to split roughly into thirds across the core criteria: 1/3rd from fillrate, 1/3rd from shader throughput, and 1/3rd from memory bandwidth. So doubling memory bandwidth resulted in a 30% improvement.

Anything that went significantly out of that range was likely due to the specific GPU being less balanced. If you get 8% gain for 10% more BW, then it's very, very memory bound.
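
One way to put numbers on that rule of thumb is a simple sensitivity ratio: performance gain divided by resource gain. The Python sketch below uses only the two data points from the post; the function name `sensitivity` is a made-up helper, not a standard metric.

```python
def sensitivity(perf_gain, resource_gain):
    """Fraction of a resource increase that shows up as performance.

    Both arguments are fractional gains, e.g. 0.30 for +30%.
    """
    return perf_gain / resource_gain

# Balanced GPU: doubling memory bandwidth (+100%) gives about +30%
balanced = sensitivity(0.30, 1.00)

# Memory-bound GPU: +10% bandwidth gives +8%
memory_bound = sensitivity(0.08, 0.10)

print(f"Balanced design:     {balanced:.2f}")
print(f"Memory-bound design: {memory_bound:.2f}")
```

A balanced design sits near one third, while the second case returns most of the extra bandwidth as performance, which is what flags it as memory bound.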
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
  • TGL 4+2 @ 9W on 10nm providing 2X productivity over AML 2+2 @ 5W on 14nm is not impressive at all --> you double the core count, essentially double the power budget for those cores and also have considerable power savings from 10nm to work with.
View attachment 6108
According to configuration disclosure the Tigerlake setup is 4+2 and 96EU. Also 15w vs 15w

2x gain in Sysmark 2014 total score is actually quite amazing. The benchmark doesn't scale well with extra cores. The last time they claimed 2x was with Whiskey Lake, but they were comparing a 4 core 8 thread 8th Gen Core against a 4th Gen dual core 4 thread Haswell!

Another showing Sysmark 2014 scaling.
https://pcper.com/2017/10/the-coffee-lake-story-intel-core-i7-8700k-and-core-i5-8400-review/2/

Anand's Bench shows 30% increase in Sysmark 2014 going from Celeron J1800 to J1900.

Intel also says 30% from 7th Gen Core 15W(dual core), to 8th Gen core 15W(quad core): https://www.tomshardware.com/news/intel-eighth-generation-core-i7-coffee-lake,34578.html

I wonder whether it's a typo for the GPU, or there's a minimal difference between the 15W TGL and 25W TGL part.
 
Last edited:

coercitiv

Diamond Member
Jan 24, 2014
6,151
11,686
136
2x gain in Sysmark 2014 total score is actually quite amazing.
Intel also says 30% from 7th Gen Core 15W(dual core), to 8th Gen core 15W(quad core)
That's 30% from doubling the core count and intra-node improvements. (14nm+ -> 14nm++)

Now double the TDP and also include a real node jump. It only takes 20% scaling with core count, 30% from increased TDP and 30% from the full node jump to reach 100% improvement in performance. The gains are multiplicative.
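
Since the gains are multiplicative, the three factors compound. A quick Python sketch with the percentages above (variable names are illustrative):

```python
core_scaling = 1.20    # +20% from doubling the core count
tdp_scaling = 1.30     # +30% from the increased TDP
node_scaling = 1.30    # +30% from the full node jump

combined = core_scaling * tdp_scaling * node_scaling   # gains compound
additive = 1.0 + 0.20 + 0.30 + 0.30                    # what naive addition would give

print(f"Multiplicative: {combined:.3f}x")
print(f"Additive:       {additive:.3f}x")
```

Compounded, the three modest factors come out just above 2x, whereas simply adding the percentages would stop at 1.8x.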
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
That's 30% from doubling the core count and intra-node improvements. (14nm+ -> 14nm++)

Now double the TDP and also include a real node jump. It only takes 20% scaling with core count, 30% from increased TDP and 30% from the full node jump to reach 100% improvement in performance. The gains are multiplicative.

I'll repeat, the last time they got 2x was going from the i5 4200U to the i5 8265U. The multi-threaded performance of the latter chip is 3x as fast, and the single thread performance is 50-60% better. That also required going from the 545S SSD to the 760p SSD, which impacts Sysmark by a few %.

Amber Lake Y isn't that bad for ST: https://www.notebookcheck.net/HP-Spectre-Folio-13-i5-8500Y-Convertible-Review.365941.0.html

DrMrLordX said:
That is a fascinating development, and it's one worth watching. Foveros, in action, in 2019.

It's not Foveros I'm excited for in Lakefield.

By the way, I think the "customer" Intel mentioned is Microsoft. There were rumors and leaks of Microsoft wanting to make a very portable device. Notes for Lakefield have a mention of an 8" screen. Interesting that it's so small.

TheGiant said:
so I should be able to buy a dell xps 13 christmas 2019 with 4C/8T Icelake? Real or fake?

Real. They showed it at an Intel conference back in January. For Dell it could be the 2-in-1 though based on an old roadmap. There's a chance some manufacturers might announce Icelake at Computex. Some might be available on shelves by Back to School time period.
 
Last edited: