Discussion Intel current and future Lakes & Rapids thread

Page 132

DrMrLordX

Lifer
Apr 27, 2000
21,582
10,785
136
There is no possible universe where TGL-U/Y are not monolithic, it simply makes zero sense financially and technically for Intel to do otherwise.

Rocket Lake isn't monolithic. Intel isn't talking about the Icelake-U 4c yields but they must be awful.

There are already engineering samples of 38C 10nm ICL-SP XCC dies back from the fabs, no? (Not that they'll actually be able to yield well enough to make a profitable or even competitive product, but at least Intel wants everyone to know that they're attempting this.)

Oh, but the yields, the yields. And that product has been pushed back until well into 2020.

The only client chips we've seen on leaked roadmaps that even hinted at a multi-chip module strategy are Rocket Lake-U

Right, which is an interesting product in and of itself. Considering how late Rocket Lake-U shows up, I'm curious as to why Intel even needs it were monolithic Tigerlake-U yields projected to be acceptable. I mean really, if they could cut down on the iGPU of Tigerlake to strap on two more cores, then why would they even need Rocket Lake-U at all, since it's only GT1 anyway? They wouldn't. Smart money is that Intel can't just do that. Either because their yield equation would go to pot (for whatever reason; maybe the iGPUs are more defect-resistant, though looking back at Cannonlake I doubt it) or because Intel is going to fluff up their 10nm yields some by splitting up iGPU/SoC into a separate 10nm die (like the one going onto Rocket Lake-U!) so they don't have to go full-blown monolithic for Tigerlake. If the yields on 10nm are still bad enough, there's all the incentive in the world for Intel to produce an EMIBed Tigerlake. They're already producing an EMIBed Rocket Lake-U so why not?

RKL-U is a monolithic, 14nm 6+1 die with 6 Willow Cove cores and GT1 Gen12/Xe graphics with up to 32 EUs.

Everything I've read indicates that Rocketlake isn't monolithic. 14nm Willow Cove + 10nm SoC/iGPU (gen12).

Rocket Lake will also have H and S series parts, and it looks like 6+1 and 10+1 monolithic 14nm dies are planned.

That one is going to be interesting. It's like Piednol finally got what he wanted, albeit far too late. 10c though, woof.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,683
1,218
136
The original Willowcove is referred to as Ultra-wide OoO, whereas the previous cores (Skylake, Palmcove/Cannonlake, Icelake/Sunnycove) are referred to as Super-wide OoO.

Just want to throw that out there again. I also went through looking at Ultrascalar processors, Ultra-wide or Extra-super-wide OoO issue superscalar, etc. Still no idea what the inclusion of Ultra means for Intel.
 

dmens

Platinum Member
Mar 18, 2005
2,271
917
136
The original Willowcove is referred to as Ultra-wide OoO, whereas the previous cores (Skylake, Palmcove/Cannonlake, Icelake/Sunnycove) are referred to as Super-wide OoO.

Just want to throw that out there again.


Was this an official name? That is just so embarrassing.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,683
1,218
136
Was this an official name? That is just so embarrassing.
No, the reference is to its OoO execution capability.
Everything before Willowcove is super-wide OoO, everything that is Willowcove and after is ultra-wide OoO.

Skylake and beyond is a Super-wide OoO x86 core
Willowcove and beyond is an Ultra-wide OoO x86 core
 

Cardyak

Member
Sep 12, 2018
72
159
106
No, the reference is to its OoO execution capability.
Everything before Willowcove is super-wide OoO, everything that is Willowcove and after is ultra-wide OoO.

Skylake and beyond is a Super-wide OoO x86 core
Willowcove and beyond is an Ultra-wide OoO x86 core

Where did you hear this? Is the Re-Order Buffer increasing by a substantial amount again? On Sunny Cove it’s already 352.
 

dmens

Platinum Member
Mar 18, 2005
2,271
917
136
Where did you hear this? Is the Re-Order Buffer increasing by a substantial amount again? On Sunny Cove it’s already 352.

That would be depth, not width. I want to believe Intel is stupid enough to misname something so fundamental, but maybe this one is too much. :p
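To make the width/depth distinction concrete, here is a rough sketch using Little's law (in-flight instructions = throughput x latency). The function name, the 5-wide and 10-wide figures, and the latency values are assumptions picked for illustration; only the 352-entry ROB comes from the post above. It is a textbook steady-state bound, not a model of any Intel core.

```python
def sustained_ipc_bound(width: int, window_entries: int,
                        avg_latency_cycles: float) -> float:
    """Upper bound on sustained IPC.

    width              -- instructions the machine can process per cycle
    window_entries     -- depth of the out-of-order window (e.g. ROB size)
    avg_latency_cycles -- assumed average time an instruction occupies
                          a window entry (hypothetical input)
    """
    # Little's law: the window can sustain at most entries / latency
    # instructions per cycle, regardless of how wide the machine is.
    depth_limit = window_entries / avg_latency_cycles
    return min(float(width), depth_limit)


if __name__ == "__main__":
    # Long average latency: the window depth is the limiter,
    # so doubling the width changes nothing.
    print(sustained_ipc_bound(5, 352, 100.0))   # 3.52
    print(sustained_ipc_bound(10, 352, 100.0))  # 3.52
    # Short average latency: now the width is the limiter.
    print(sustained_ipc_bound(5, 352, 20.0))    # 5.0
```

Under these assumptions a bigger ROB raises the depth limit while extra ports or decoders raise the width limit, which is why the two terms are not interchangeable.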
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,683
1,218
136
Where did you hear this? Is the Re-Order Buffer increasing by a substantial amount again? On Sunny Cove it’s already 352.
It was from the LinkedIn profile of someone who was at AMD for Bulldozer/Piledriver and then went to work on Skylake/Cannonlake (early) and Sunnycove/Willowcove (later) at Intel. SKL/SNC/CNL are all Super-wide OoO and WLC is Ultra-wide OoO.

Skylake -> two reservation stations; 1 computation+store data, 1 2load+1store portion
Sunnycove -> four reservation stations; 1 computation, 1 two-port store data, 2 load+store portions

My speculation is Willowcove will jump to six reservation stations;
2 computational, 2 two-port store data, 2 load+store AGU parts.
10 ports -> 16 ports
 

dmens

Platinum Member
Mar 18, 2005
2,271
917
136
It was from the LinkedIn profile of someone who was at AMD for Bulldozer/Piledriver and then went to work on Skylake/Cannonlake (early) and Sunnycove/Willowcove (later) at Intel. SKL/SNC/CNL are all Super-wide OoO and WLC is Ultra-wide OoO.

Skylake -> two reservation stations; 1 computation+store data, 1 2load+1store portion
Sunnycove -> four reservation stations; 1 computation, 1 two-port store data, 2 load+store portions

My speculation is Willowcove will jump to six reservation stations;
2 computational, 2 two-port store data, 2 load+store AGU parts.
10 ports -> 16 ports

Lakes use a unified RS design (going all the way back to P6). The output port count increased by 1, maybe 2 (I don't remember exactly), from Skylake to Cove.

Unless there is a radical redesign of the OoO dispatch (there isn't, not in this time frame with today's Intel), there is no way you will see that kind of increase in output bandwidth from the RS.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,683
1,218
136
Geez, that would be insane!
It doesn't actually stop there... there is also AVX3, which is the enhanced hardware version of AVX512 and which, according to another profile, is planned for Willowcove cores but not Tigerlake cores.

EVEX vector length bits in the prefix:
00b: 128bit (XMM)
01b: 256bit (YMM)
10b: 512bit (ZMM)
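As a small illustration of the table above, here is a sketch that pulls the vector length field (EVEX.L'L) out of a 4-byte EVEX prefix. The function and constant names are made up for this example. It assumes the usual layout where L'L sits in bits 6:5 of the last prefix byte, and it ignores the case where the same bits are reused for rounding control on register-only forms with EVEX.b set.

```python
EVEX_ESCAPE = 0x62  # first byte of every EVEX prefix

# EVEX.L'L encodings from the table above; 11b is reserved.
VECTOR_LENGTH_BITS = {0b00: 128, 0b01: 256, 0b10: 512}
REGISTER_CLASS = {128: "XMM", 256: "YMM", 512: "ZMM"}


def evex_vector_length(prefix: bytes) -> int:
    """Return the vector length in bits encoded by EVEX.L'L."""
    if len(prefix) != 4 or prefix[0] != EVEX_ESCAPE:
        raise ValueError("expected a 4-byte prefix starting with 0x62")
    ll = (prefix[3] >> 5) & 0b11  # bits 6:5 of the last prefix byte
    if ll not in VECTOR_LENGTH_BITS:
        raise ValueError("L'L = 11b is reserved")
    return VECTOR_LENGTH_BITS[ll]


if __name__ == "__main__":
    # Example prefixes that differ only in the L'L field of the last byte.
    for last_byte in (0x08, 0x28, 0x48):
        bits = evex_vector_length(bytes([0x62, 0xF1, 0x74, last_byte]))
        print(f"{last_byte:#04x}: {bits}-bit ({REGISTER_CLASS[bits]})")
```

The prefix 62 F1 74 48 is, as far as I know, what an assembler emits for vaddps zmm0, zmm1, zmm2; treat the byte values as illustrative rather than authoritative.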

Rather than executing vertically: 1 instruction -> 1 512-bit datapath,
Willowcove could execute horizontally: 1 instruction -> 4 128-bit datapaths,
which is known to reduce power consumption.
Lakes use a unified RS design (going all the way back to P6). The output port count increased by 1, maybe 2 (I don't remember exactly), from Skylake to Cove.
https://images.anandtech.com/doci/14514/BackEnd.jpg
Skylake has two RS per Intel's slides, and Sunnycove has four per Intel's slides.
https://images.anandtech.com/doci/13699/Ronak20.jpg
 

dmens

Platinum Member
Mar 18, 2005
2,271
917
136
They went to 10 from 8. So it's two more. That picture is misleading btw... the RS is segmented into four uop types, but is not entirely separated like in Atom for example.

What would going to 16 ports do? Is the rest of the machine changed to match this bandwidth? It is a pointless change if done in isolation.

Besides, Intel isn't exactly making big changes these days. Literally holding the line and watching the barbarians storm the gate.
 

Cardyak

Member
Sep 12, 2018
72
159
106
It was from the LinkedIn profile of someone who was at AMD for Bulldozer/Piledriver and then went to work on Skylake/Cannonlake (early) and Sunnycove/Willowcove (later) at Intel. SKL/SNC/CNL are all Super-wide OoO and WLC is Ultra-wide OoO.

Skylake -> two reservation stations; 1 computation+store data, 1 2load+1store portion
Sunnycove -> four reservation stations; 1 computation, 1 two-port store data, 2 load+store portions

My speculation is Willowcove will jump to six reservation stations;
2 computational, 2 two-port store data, 2 load+store AGU parts.
10 ports -> 16 ports

As @dmens stated earlier, I thought that width was related to the number of decode and execution units, not to OoO execution. Out-of-order execution is purely related to the depth of the pipeline.

Regardless, I assumed that after Sunny Cove, Willow Cove was only destined to be a minor upgrade. But recent rumours from different sources around the internet point towards a more major increase in IPC than I first anticipated (more in line with low double-digit gains).

The widely held theory is that while Skylake has had an extended life span, the design team haven’t just been resting on their laurels but have instead been cranking out new architectures. I hope this is true, as that could mean we are in for a backlog of steady IPC gains in the near future.
 

Schmide

Diamond Member
Mar 7, 2002
5,581
712
126
It doesn't actually stop there... there is also AVX3, which is the enhanced hardware version of AVX512 and which, according to another profile, is planned for Willowcove cores but not Tigerlake cores.

EVEX vector length bits in the prefix:
00b: 128bit (XMM)
01b: 256bit (YMM)
10b: 512bit (ZMM)

Rather than executing vertically: 1 instruction -> 1 512-bit datapath,
Willowcove could execute horizontally: 1 instruction -> 4 128-bit datapaths,
which is known to reduce power consumption.
https://images.anandtech.com/doci/14514/BackEnd.jpg
Skylake has two RS per Intel's slides, and Sunnycove has four per Intel's slides.
https://images.anandtech.com/doci/13699/Ronak20.jpg

There are already 4 lanes in AVX512 (2 in AVX). Saying "rather than" implies that there is currently a flat 512-bit lane, which there isn't.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,683
1,218
136
There are already 4 lanes in AVX512 (2 in AVX). Saying "rather than" implies that there is currently a flat 512-bit lane, which there isn't.
The execution of Intel's FPU can only operate within one mode, while utilizing all resources.

SSE128, AVX128, AVX256, AVX512 => The new FPU can run any of these, and it will always be 1-to-1 with the hardware.
SSE128 on AVX512 => Bad
AVX128 on AVX512 => Bad
AVX256 on AVX512 => Bad

The next FPU uses AVX512's VL-encoding like SVE. This is further enhanced within AVX3.
 

Thunder 57

Platinum Member
Aug 19, 2007
2,647
3,706
136
The execution of Intel's FPU can only operate within one mode, while utilizing all resources.

SSE128, AVX128, AVX256, AVX512 => The new FPU can run any of these, and it will always be 1-to-1 with the hardware.
SSE128 on AVX512 => Bad
AVX128 on AVX512 => Bad
AVX256 on AVX512 => Bad

The next FPU uses AVX512's VL-encoding like SVE. This is further enhanced within AVX3.

Everything I'm coming up with when I look for AVX3 turns out to be AVX512. So what exactly is this AVX3 you speak of?
 

Schmide

Diamond Member
Mar 7, 2002
5,581
712
126
I think "execute horizontally" is a misleading or possibly misused term.

On Intel SIMD, if you execute anything (SSE, AVX, AVX512), it executes all lanes no matter what level you're working on. So if you do a single-lane operation (SSE) the upper lanes will be altered. The penalty for mixing modes is the preservation of this data.

I would guess that the next iteration would involve moving to a non-destructive per-lane execution (I believe like AMD) where the upper lanes would be gated/masked.
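For reference, a toy model of the architectural rule behind the mixing penalty as Intel documents it: a legacy-encoded SSE instruction writes bits 127:0 and leaves the upper bits of the register unchanged, while a VEX-encoded 128-bit instruction zeroes them. The register is modeled as a plain Python integer and the function names are made up for illustration; this says nothing about how any real FPU gates or masks lanes internally.

```python
MASK_128 = (1 << 128) - 1  # low XMM lane
MASK_512 = (1 << 512) - 1  # full ZMM register


def write_legacy_sse(reg: int, result128: int) -> int:
    """Legacy SSE encoding: replace bits 127:0, preserve bits 511:128."""
    upper = reg & MASK_512 & ~MASK_128
    return upper | (result128 & MASK_128)


def write_vex128(reg: int, result128: int) -> int:
    """VEX.128 encoding: replace bits 127:0, zero bits 511:128."""
    return result128 & MASK_128


if __name__ == "__main__":
    # Hypothetical register contents: some data in the second 128-bit lane.
    zmm = (0xABCD << 128) | 0x1111
    print(hex(write_legacy_sse(zmm, 0x2222)))  # upper lane data survives
    print(hex(write_vex128(zmm, 0x2222)))      # upper lanes cleared
```

The preserving behavior in the first function is what forces the hardware to track or merge stale upper data when legacy SSE and wider code are interleaved.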
 

AMDK11

Senior member
Jul 15, 2019
205
136
116
As @dmens
The widely held theory is that while Skylake has had an extended life span, the design team haven’t just been resting on their laurels but have instead been cranking out new architectures. I hope this is true, as that could mean we are in for a backlog of steady IPC gains in the near future.
Yes, it's true. It is widely believed that after Skylake Intel barely designed Sunny Cove, but as you can see it has had a ready x86 Willow Cove core design for a long time, and it's quite possible that Golden Cove is also at the final stage.


I am almost certain that Intel teams have long been working on new microarchitectures.
 

coercitiv

Diamond Member
Jan 24, 2014
6,151
11,677
136
Comet Lake-S "news" via VideoCardz
Intel will launch its new mainstream desktop series in Q1 2020. The Comet Lake-S, aka 10th Gen Core series, will feature up to 10 cores and 20 threads. The CPUs will be divided into three power tiers: 125W, 65W and 35W.

The new Intel CPUs will require 400-series motherboards as the socket has been changed to LGA1200. Assuming that the information provided by Xfastest is accurate, this would force a change right before switching to a smaller node. Comet Lake is still based on Skylake cores, meaning it is still on the 14nm fabrication process. The first mainstream desktop architecture to use 10nm is believed to be Ice Lake.

 

jpiniero

Lifer
Oct 1, 2010
14,509
5,159
136
So it does look like Comet U will be faster for the most part; MT perf is determined more by PL2 and the cooling.

Cinebench R15 ST scores from NBC:

1065G7: 176 (15W), 183 (25W)
10510U: 185, 189, 191
8665U: 176, 180, 185, 196
8565U: avg 170, median 175, max 193
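A quick way to put numbers on the gap, using only the scores listed above. Comparing lowest against lowest and highest against highest is a crude choice made here for illustration, since the samples come from different laptops and power limits. The 8565U is left out because only summary statistics are given for it.

```python
# Cinebench R15 single-thread scores as listed in the post above.
scores = {
    "1065G7": [176, 183],
    "10510U": [185, 189, 191],
    "8665U": [176, 180, 185, 196],
}


def pct_gain(new: float, old: float) -> float:
    """Percentage by which `new` exceeds `old`."""
    return (new / old - 1.0) * 100.0


def range_gain(chip: str, baseline: str) -> tuple:
    """Gain of the lowest and highest sample of `chip` over `baseline`."""
    low = pct_gain(min(scores[chip]), min(scores[baseline]))
    high = pct_gain(max(scores[chip]), max(scores[baseline]))
    return low, high


if __name__ == "__main__":
    low, high = range_gain("10510U", "1065G7")
    print(f"10510U vs 1065G7: +{low:.1f}% (lowest), +{high:.1f}% (highest)")
```

On these samples the 10510U comes out roughly 4 to 5 percent ahead of the 1065G7 in ST.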
 

DrMrLordX

Lifer
Apr 27, 2000
21,582
10,785
136
According to Digitimes, Intel says that prices (the real prices OEMs are paying?) are much higher for Icelake compared to Comet Lake as well.

Hmm. Seems like a strange situation. I also wonder how the notebook OEMs are going to deal with all these 14nm products clogging up the channel. It could get kinda ugly.
 

jpiniero

Lifer
Oct 1, 2010
14,509
5,159
136
Hmm. Seems like a strange situation. I also wonder how the notebook OEMs are going to deal with all these 14nm products clogging up the channel. It could get kinda ugly.

Just the opposite - I guess the point is that Icelake is extremely limited volume because of the awful yield. Tigerlake will likely just be limited.