Discussion Intel current and future Lakes & Rapids thread

jpiniero · Sep 5, 2020

I could see Alder Lake in Q3, but only if Rocket Lake is completely cancelled. Reason they would cancel it would be more due to the plummeting desktop TAM because of the virus than AMD.

CakeMonster said:
Rocket Lake is the new desktop chip, IIRC? This is getting really hard to follow. Is this the one that will add PCIE4 and have 8-16 cores competing with Zen 3 for better gaming performance?

PCIe 4 yes, only 8 cores max. Lower end is still Skylake.

mikk · Sep 5, 2020

They cannot cancel RKL-S imho, even for Intel only one generation supported on a new LGA platform is too low.

exquisitechar · Sep 5, 2020

https://zhuanlan.zhihu.com/p/214334613

Chinese TGL-U review.

jpiniero · Sep 5, 2020

exquisitechar said:
https://zhuanlan.zhihu.com/p/214334613

Chinese TGL-U review.

Interesting, it's a big improvement efficiency wise but still not close to Renoir in R15. Even the 6C6T 4500U is faster at the same power level.

exquisitechar · Sep 5, 2020

jpiniero said:
Interesting, it's a big improvement efficiency wise but still not close to Renoir in R15.

Still, great efficiency and iGPU improvement and fantastic ST performance. Biggest problem with TGL is that it's launching a bit late, IMO.

CakeMonster · Sep 5, 2020

jpiniero said:
I could see Alder Lake in Q3, but only if Rocket Lake is completely cancelled. Reason they would cancel it would be more due to the plummeting desktop TAM because of the virus than AMD.

PCIe 4 yes, only 8 cores max. Lower end is still Skylake.

Oh. That isn't too promising for those that are on 8 cores or more today and would want to upgrade to a better Intel CPU but would like to get at least another couple of cores.....

Spartak · Sep 5, 2020

eek2121 said:
Incorrect. TDP has nothing to do with power. A CPU can consume 11W of power and still have a TDP of 28W, or it can consume 50W of power and still have a 28W TDP. TDP stands for Thermal Design Power.

You are being facetuous here. TDP has nothing to to with power?

Maybe you should check the last word you bolded. Also look up the meaning of the other two words thermal and design.

The original meaning of Thermal Design Power was the maximum power output of the processor around which the system could be designed. It's irrelevant for your power supply and case design what the processor does at lower power output. The fact that Intel has been eroding and stretching and obfuscating the definition of TDP doesn't change what it's for: the maximum power output around which you design your system.

mikk · Sep 5, 2020

exquisitechar said:
https://zhuanlan.zhihu.com/p/214334613

Chinese TGL-U review.

Big graphics difference between DDR4 3200 and LPDR4x 4266 on the i5-1135G7???

https://twitter.com/x/status/1302276179819409408

It makes me wonder if they can boost the graphics further with LPDDR5-5400. The GPU runs at 11-14W in these 3dmark tests while running at its max boost of 1300 Mhz, it looks like there are some GPU clock speed improvements coming for ADL-P. For me Xe LP is much more impressive than Willow Cove.

Spartak · Sep 5, 2020

mikk said:
They cannot cancel RKL-S imho, even for Intel only one generation supported on a new LGA platform is too low.

I would take everything jpiniero says about RKL-S with some spoons of salt.

He denied for the longest possible time there was any kind of sunny/willowcove 14nm backport. Probably as recent as half a year ago it was yet another coffeelake iteration according to him. Since it became clear it was indeed sunny/willow cove based he's been claiming it will probably be cancelled.

Jimzz · Sep 5, 2020

mikk said:
Big graphics difference between DDR4 3200 and LPDR4x 4266 on the i5-1135G7???

https://twitter.com/x/status/1302276179819409408

It makes me wonder if they can boost the graphics further with LPDDR5-5400. The GPU runs at 11-14W in these 3dmark tests while running at its max boost of 1300 Mhz, it looks like there are some GPU clock speed improvements coming for ADL-P. For me Xe LP is much more impressive than Willow Cove.

Thats why Intel used a AMD system with 3200Mhz DDR4 and their systems used the LPDDR4 for their benchmarks.

Intel has the same issue as AMD, getting enough memory bandwidth for its higher end GPU. I think that is one of the reasons why AMD lowered down the GPU size of the 7nm chips as even with LPDDR4 they were still limited at the higher end.

jpiniero · Sep 5, 2020

Spartak said:
I would take everything jpiniero says about RKL-S with some spoons of salt.

He denied for the longest possible time there was any kind of sunny/willowcove 14nm backport. Probably as recent as half a year ago it was yet another coffeelake iteration according to him. Since it became clear it was indeed sunny/willow cove based he's been claiming it will probably be cancelled.

You're thinking of someone else. And it's not Sunny/Willow, it's some approximation of it.

IntelUser2000 · Sep 5, 2020

jpiniero said:
Interesting, it's a big improvement efficiency wise but still not close to Renoir in R15. Even the 6C6T 4500U is faster at the same power level.

All the CPUs in that review has its TDP jacked up to levels not used in target laptops.

The 4800U for example, will drop to 1200 points in R15 when the thermal headroom is exhausted. The 4700U won't drop a lot. Suggesting 8C/8T is fine at 25W, but 8/16 has to back down.

You can see behavior between CPUs change noticeably when set at lower TDP.

The GPU test doesn't seem to specify TDP which is weird.

Spartak said:
Intel has been eroding and stretching and obfuscating the definition of TDP doesn't change what it's for: the maximum power output around which you design your system.

Exactly.

In laptops though they have to play by the rules because its thermally limited. In desktops you just use a heavier/bigger heatsink.

Maybe most people saying TDP doesn't equal power consumption confuse it with battery life. Yea, battery life tests are all about light loads and it wasn't TDP since 20+ years ago. Of course when you load the system to full, it'll use power according to TDP.

Therefore,
TDP=Power consumption
TDP=Not always battery life

mikk said:
It makes me wonder if they can boost the graphics further with LPDDR5-5400. The GPU runs at 11-14W in these 3dmark tests while running at its max boost of 1300 Mhz,

It's weird its actually slower than 1065G7 using LPDDR4-3733. Maybe the game comparison is too limited.

If its that limited though, they can boost performance by 50% in Alderlake simply by increasing frequency to 1.5GHz and using LPDDR5-5400 memory. Maybe we'll get 70% again if they update occulusion culling algorithms.

Thats why Intel used a AMD system with 3200Mhz DDR4 and their systems used the LPDDR4 for their benchmarks.

The Vega isn't as limited. TDP makes a greater difference.

AMD Radeon Vega 8 vs 7 benchmarks and gaming results, vs Nvidia MX350 and MX250

We're reviewed a couple of AMD laptops in these last few weeks and I think it should be interesting to see how the Vega 7 and 8 graphics integrated with the

www.ultrabookreview.com

Asus ZenBook 14 UM433IQ review (AMD Ryzen 7 4700U, Nvidia MX350)

This here is the ZenBook 14 UM433IQ from Asus.

www.ultrabookreview.com

That's a 4900HS using DDR4-3200 versus 4700U using LPDDR4x-4266.

SAAA · Sep 5, 2020

Spartak said:
You are being facetuous here. TDP has nothing to to with power?

Maybe you should check the last word you bolded. Also look up the meaning of the other two words thermal and design.

The original meaning of Thermal Design Power was the maximum power output of the processor around which the system could be designed. It's irrelevant for your power supply and case design what the processor does at lower power output. The fact that Intel has been eroding and stretching and obfuscating the definition of TDP doesn't change what it's for: the maximum power output around which you design your system.

You could have a CPU running at 40W half of the time and 10W the other half, that would still fall under a 25W TDP as one running at 25W all the time.
The only difference in the cooling solutions would be temperature at any time, so when variable CPU is running at 40W it will increase and then cool down while at 10W.

Now heat capacity, so how long it takes to heat up to that temperature, this could be different for a cooling solution that tolerates 40W spikes against one that keeps 25W constant.

Practical example: a fan that blows air can dissipate X watts of heat, put it on a small heat-pipe and it will work very differently than over a large copper heat sink. Same cooling, different thermal capacity and spikes of power it could handle.

IntelUser2000 · Sep 5, 2020

@SAAA There's no difference between the two cooling systems. As a manufacturer heatsinks are ultimately designed for the long-term operation.

In fact, that's the very description of TDP. To eliminate any confusion I should clarify TDP means PL1 rating. It might use 40W for a min, or a few, but if its set at 25W TDP, it has to come back down after that, no matter if the heatsink is capable of handling 125W. The PL1 TDP rating is power.

As an example Surface Pro users complained that the CPU was power limited to 15W. It's a 15W CPU, so its limited to 15W. A heatsink doesn't care whether its using 25W constant or 25W in average over time.

eek2121 · Sep 5, 2020

SAAA said:
You could have a CPU running at 40W half of the time and 10W the other half, that would still fall under a 25W TDP as one running at 25W all the time.
The only difference in the cooling solutions would be temperature at any time, so when variable CPU is running at 40W it will increase and then cool down while at 10W.

Now heat capacity, so how long it takes to heat up to that temperature, this could be different for a cooling solution that tolerates 40W spikes against one that keeps 25W constant.

Practical example: a fan that blows air can dissipate X watts of heat, put it on a small heat-pipe and it will work very differently than over a large copper heat sink. Same cooling, different thermal capacity and spikes of power it could handle.

To add to this: https://en.m.wikipedia.org/wiki/Thermal_design_power

mikk · Sep 5, 2020

Jimzz said:
Thats why Intel used a AMD system with 3200Mhz DDR4 and their systems used the LPDDR4 for their benchmarks.

Intel has the same issue as AMD, getting enough memory bandwidth for its higher end GPU. I think that is one of the reasons why AMD lowered down the GPU size of the 7nm chips as even with LPDDR4 they were still limited at the higher end.

Renoir doesn't gain a lot from LPDDR4 and the 4800U is very hard to buy, with LPDDR4 even harder. 4800U with LPDDR4 is more a theoretical test. Keep in mind Xe LP is stronger than Vega 7/8 which could result in a bigger bandwidth bottleneck, even though we need more tests. I'm more interested in a i7-1165G7 vs 4700U comparison.

IntelUser2000 said:
It's weird its actually slower than 1065G7 using LPDDR4-3733. Maybe the game comparison is too limited.

If its that limited though, they can boost performance by 50% in Alderlake simply by increasing frequency to 1.5GHz and using LPDDR5-5400 memory. Maybe we'll get 70% again if they update occulusion culling algorithms.

This is what I'm expecting. 1500 Mhz GPU boost with DDR5/LPDDR5 out of the box, they might gain another 50% with the same architecture/EU count.

IntelUser2000 · Sep 5, 2020

mikk said:
This is what I'm expecting. 1500 Mhz GPU boost with DDR5/LPDDR5 out of the box, they might gain another 50% with the same architecture/EU count.

It's getting 40% gain from 33% more bandwidth.

In fact its so bad, that I think there's something else going on. Maybe at 3200MT, the RAM isn't at sync with the fabric and bandwidth drops and latency increases.

In Firestrike, both are showing nearly 50%. In heaven its showing 2x! Does it make sense that its merely due to 33% more bandwidth? Same thing in games.

Maybe more is going on, such as the CPU is being starved as well.

mikk · Sep 5, 2020

Yes there must be something wrong, the Firestrike score is way too low. Valley 1.0 runs twice as fast with LPDDR4, it doesn't make sense unless it's singlechannel DDR4.

IntelUser2000 · Sep 5, 2020

mikk said:
Yes there must be something wrong, the Firestrike score is way too low. Valley 1.0 runs twice as fast with LPDDR4, it doesn't make sense unless it's singlechannel DDR4.

Maybe not single channel. I've seen such anomalies before.

I'd trust nothing less than a Notebookcheck review from a shipping system. Eventually we'll get the full picture when multiple systems of varying configs are tested and other sites get it too.

naukkis · Sep 6, 2020

IntelUser2000 said:
It's getting 40% gain from 33% more bandwidth.

In fact its so bad, that I think there's something else going on. Maybe at 3200MT, the RAM isn't at sync with the fabric and bandwidth drops and latency increases.

In Firestrike, both are showing nearly 50%. In heaven its showing 2x! Does it make sense that its merely due to 33% more bandwidth? Same thing in games.

LPDDR4 isn't only more bandwith, it's also double the channels 4x vs 2x for regular DDR4. More channels boost effective bandwith through lower latencies under high bandwith demand.

coercitiv · Sep 6, 2020

SAAA said:
You could have a CPU running at 40W half of the time and 10W the other half, that would still fall under a 25W TDP as one running at 25W all the time.

In that case your CPU would be using 25W of power on average, which is a perfect mirror of the 25W TDP. Remember @eek2121 somehow managed to claim "TDP has nothing to do with power", when the Intel version of TDP is intrinsically linked to average power use.

SAAA said:
Now heat capacity, so how long it takes to heat up to that temperature, this could be different for a cooling solution that tolerates 40W spikes against one that keeps 25W constant.

Thermal capacity and surface area of the cooling solution are still only part of the story. In order to understand whether the cooling solution will tolerate 40W or higher, one still needs to know the maximum operating temperature of the CPU and how well heat gets transferred to the cooling solution. This 40/10W behavior could be perfectly fine for a CPU with high operating temps and problematic for another, with the same tiny heatsink. More importantly, a 25W TDP CPU will only boost to 40W as long as thermals allow, hence a smaller heatsink will lead to a different boost behavior while using the same 25W on average. TDP will still be linked to average power use.

This is the problem with using the general definition of TDP without applying specific vendor definitions and specific vendor power management rules built to enforce that definition.

In the case of Intel, TDP is defined as average power and reflected in power management as PL1 power limit. TDP is power. The required heatsink properties are derived from max temperature information (which is considerend an independent factor to TDP) as well as PL2 figures (combined with whatever PL1 Tau the OEM desires - this dictates burst performance).

In the case of AMD, TDP is defined relative to a reference heatsink, so it already takes into account temperature and thermal resistance information. This definition is indeed deviating from average power use, since one can change the TDP of the CPU by using different reference heatsink and/or max temperature parameters, even if the CPU uses the same amount of power.

IntelUser2000 · Sep 6, 2020

naukkis said:
LPDDR4 isn't only more bandwith, it's also double the channels 4x vs 2x for regular DDR4. More channels boost effective bandwith through lower latencies under high bandwith demand.

The total bandwidth at the same MT/s is the same. Why? Because each of the "quad channels" on LPDDR4/X is only 32-bits wide, while on slotted DIMMs its 64-bits. Overall the bit width is 128-bit for both.

Now designers can choose to use LPDDR4 to have a dual channel interface, but that's only equivalent to bandwidth of a single channel DIMM memory.

@coercitiv The combination of lack of knowledge among the people and the hype articles around desktop chips and TDP rating seems to be the reason people stick to the idea.

-On desktops you can sort of cheat and go over TDP since heatsinks are massive.
-On laptops though, you can't. If it goes over TDP then you pay for it either with the system throttling or needing a very beefy cooling solution.

IntelUser2000 · Sep 6, 2020

Alderlake, especially the -M variant seems to be the most mysterious part. The 2+8+2 doesn't seem competitive. Or is it?

If 2+8+2 is really the successor to Tigerlake, and considering it needed 6 cores, 2+8+2 seems like a serious downgrade.

Also unlike Lakefield, at the minimum the big cores have to work simultaneously with the small cores. This seems to be the key, and a big change for Lakefield, both in performance and in the technical front.

2+8+2 configuration with Golden Cove being 20% faster per clock and Gracemont being an additional 30% faster per clock over Tremont makes it possible in theory for 2+8+2 to beat 4+2 of Tigerlake, and by 20% or so.

We need more than that though. The same is true in desktops for Alderlake.

Is Intel going to permanently cede multi-threading performance leadership to AMD, or does it have a secret sauce?

SAAA · Sep 6, 2020

IntelUser2000 said:
Alderlake, especially the -M variant seems to be the most mysterious part. The 2+8+2 doesn't seem competitive. Or is it?

If 2+8+2 is really the successor to Tigerlake, and considering it needed 6 cores, 2+8+2 seems like a serious downgrade.

Also unlike Lakefield, at the minimum the big cores have to work simultaneously with the small cores. This seems to be the key, and a big change for Lakefield, both in performance and in the technical front.

2+8+2 configuration with Golden Cove being 20% faster per clock and Gracemont being an additional 30% faster per clock over Tremont makes it possible in theory for 2+8+2 to beat 4+2 of Tigerlake, and by 20% or so.

We need more than that though. The same is true in desktops for Alderlake.

Is Intel going to permanently cede multi-threading performance leadership to AMD, or does it have a secret sauce?

We still don't know Gracemont clocks, if they are still Atom level running at 2.5GHz even with >Skylake IPC it would be a mixed bag, on the other hand if they could actually push up to 4GHz that would make them relevant outside of purely parallel workloads.

As for the big cores consider Tiger Lake merely redesigned the memory system and with larger caches it still manages to beat Icelake by 2-5% in IPC, given Skylake-X memory changes affected performance so much over client Skylake I was worried there would be a hit here too, yet it goes faster.

Golden Cove inside Alder will perform even better, so that turns theoretically to this match:

Skylake IPC = 1.0
Tiger Lake = 1.2
Tremont* = 1.0
Alder Lake* = 1.4

(* is speculation, both most conservative estimate given the rumours floating around)

2 (cores) x 1.4 (IPC) x 4.4 (GHz)= 12.32 (performance in astrological prediction units)
+
8 x 1 x 2.5 = 20

vs

4 x 1.2 x 4 = 19.2

Under light threaded loads Alder dual will outperform Tiger Lake quads due to both higher clocks (I assume 10% at most with enhanced super fin and one year of refinements) and IPC, total performance is 12 vs 19 so a decent 65% with half the cores, hopefully using around that % of power too.

I didn't include SMT in both as I think any advantage here would be lost when Alder Lake small cores kick in: that's a theoretical boost of 20 points, so already over Tiger Lake all core performance, no SMT gains are going to reach 2x the performance hence purely parallel loads will have the next generation win.

Assuming a workload needs exactly 4 cores that would leave Alder Lake slightly behind at 17 Vs 19 points, but realistically no program does that, unless you have 4 instances open of a heavy single-threaded app.

As for the heavy weights, desktop CPUs, I'm sure with 8 Golden cove at 5 GHz no one will miss much the 10 cores Comet lake had (also there's still 8 Tremont cores):

8 x 1.4 x 5 = 56
+
8 x 1 x 2.5 = 20

vs

10 x 1 x 5.2 = 52

35% better single and 45% multithread are quite something.

IntelUser2000 · Sep 6, 2020

SAAA said:
Assuming a workload needs exactly 4 cores that would leave Alder Lake slightly behind at 17 Vs 19 points, but realistically no program does that, unless you have 4 instances open of a heavy single-threaded app.

SMT doesn't give 37% gains. It's more like 25%.

Nonetheless the competition is far more capable. That's what I'm saying.
Alderlake goes against Cezanne, which should arrive 4-5 months earlier, just like Renoir and Tigerlake.

And AMD isn't going to stay with 16 cores in desktop and 64 cores in server. 5nm should enable 24 cores and 96 cores with little impact to power consumption. 12 core mobile? Possibly. And 33% more cores a year after that?

I'm not going to use imaginary numbers. I'll use Cinebench.

In R15, Pentium Silver J5005 gets 300 points. 30% for Tremont and Gracemont gets us to 510 points. 85% for double the cores because it doesn't scale perfectly, gets us to 940 points.

Tigerlake should get ~900, or maybe even 950. 1.2x that for Golden Cove and divide that by 1.85x for half the cores. We get 620.

620 + 840 = 1660, at 25W, or about 25% better than 4800U after it hits power limits.

What about R20?

Gracemont portion = 650 x 1.3 x 1.3 x 1.85 = ~2050
Golden Cove portion = 2300/1.85 x 1.2 = ~1500
Total = 3500-3600. Renoir gets 3200 points without throttling so again Alderlake is 25% faster.

Also, a disclaimer is the above automatically assumes Golden Cove and Gracemont cores can work together. Ok, I said it probably can. Otherwise 8+8 is useless, and it would have been 4+16 instead.

Second issue is whether the performance will add perfectly without contention. What if each portion loses 10%?

Discussion Intel current and future Lakes & Rapids thread

Lifer

Diamond Member

Senior member

Lifer

Senior member

Golden Member

Senior member

Diamond Member

Senior member

Diamond Member

Lifer

Elite Member

Senior member

Elite Member

Diamond Member

Diamond Member

Elite Member

Diamond Member

Elite Member

Golden Member

Diamond Member

Elite Member

Elite Member

Senior member

Elite Member