Discussion Intel current and future Lakes & Rapids thread

dmens · Aug 19, 2019

Hitman928 said:
Density or power improvements for the cache redesign? I mean, is there another reason to go through the trouble?

My guess is manufacturability.

DrMrLordX · Aug 19, 2019

repoman27 said:
There is no possible universe where TGL-U/Y are not monolithic, it simply makes zero sense financially and technically for Intel to do otherwise.

Rocket Lake isn't monolithic. Intel isn't talking about the Icelake-U 4c yields but they must be awful.

There are already engineering samples of 38C 10nm ICL-SP XCC dies back from the fabs, no? (Not that they'll actually be able to yield well enough to make a profitable or even competitive product, but at least Intel wants everyone to know that they're attempting this.)

Oh, but the yields, the yields. And that product has been pushed back until well into 2020.

The only client chips we've seen on leaked roadmaps that even hinted at a multi-chip module strategy are Rocket Lake-U

Right, which is an interesting product in and of itself. Considering how late Rocket Lake-U shows up, I'm curious as to why Intel would even need it if monolithic Tigerlake-U yields were projected to be acceptable. I mean really, if they could cut down on the iGPU of Tigerlake to strap on two more cores, then why would they even need Rocket Lake-U at all, since it's only GT1 anyway? They wouldn't. Smart money is that Intel can't just do that, either because their yield equation would go to pot (for whatever reason; maybe the iGPUs are more defect-resistant, though looking back at Cannonlake I doubt it) or because Intel is going to fluff up their 10nm yields some by splitting up iGPU/SoC into a separate 10nm die (like the one going onto Rocket Lake-U!) so they don't have to go full-blown monolithic for Tigerlake. If the yields on 10nm are still bad enough, there's all the incentive in the world for Intel to produce an EMIBed Tigerlake. They're already producing an EMIBed Rocket Lake-U so why not?

RKL-U is a monolithic, 14nm 6+1 die with 6 Willow Cove cores and GT1 Gen12/Xe graphics with up to 32 EUs.

Everything I've read indicates that Rocketlake isn't monolithic. 14nm Willow Cove + 10nm SoC/iGPU (gen12).

Rocket Lake will also have H and S series parts, and it looks like 6+1 and 10+1 monolithic 14nm dies are planned.

That one is going to be interesting. It's like Piednoel finally got what he wanted, albeit far too late. 10c though, woof.

NostaSeronx · Aug 19, 2019

The original Willowcove is referred to as Ultra-wide OoO, whereas previous cores (Skylake, Palmcove/Cannonlake, Icelake/Sunnycove) are referred to as Super-wide OoO.

Just want to throw that out there again. I also went through looking at Ultrascalar processors, Ultra-wide or Extra-super-wide OoO issue superscalar, etc. Still no idea what the inclusion of Ultra means for Intel.

dmens · Aug 19, 2019

NostaSeronx said:
The original Willowcove is referred to as Ultra-wide OoO, whereas previous cores (Skylake, Palmcove/Cannonlake, Icelake/Sunnycove) are referred to as Super-wide OoO.

Just want to throw that out there again.

Was this an official name? That is just so embarrassing.

NostaSeronx · Aug 19, 2019

dmens said:
Was this an official name? That is just so embarrassing.

No, the reference is to its OoO execution capability.
Everything before Willowcove is super-wide OoO, everything that is Willowcove and after is ultra-wide OoO.

Skylake and beyond is a Super-wide OoO x86 core
Willowcove and beyond is an Ultra-wide OoO x86 core

Cardyak · Aug 19, 2019

NostaSeronx said:
No, the reference is to its OoO execution capability.
Everything before Willowcove is super-wide OoO, everything that is Willowcove and after is ultra-wide OoO.

Skylake and beyond is a Super-wide OoO x86 core
Willowcove and beyond is an Ultra-wide OoO x86 core

Where did you hear this? Is the Re-Order Buffer increasing by a substantial amount again? On Sunny Cove it’s already 352.

dmens · Aug 19, 2019

Cardyak said:
Where did you hear this? Is the Re-Order Buffer increasing by a substantial amount again? On Sunny Cove it’s already 352.

That would be depth, not width. I want to believe Intel is stupid enough to misname something so fundamental, but maybe this one is too much.

NostaSeronx · Aug 19, 2019

Cardyak said:
Where did you hear this? Is the Re-Order Buffer increasing by a substantial amount again? On Sunny Cove it’s already 352.

It was from the LinkedIn profile of someone who was at AMD for Bulldozer/Piledriver and then went to work on Skylake/Cannonlake (early) and Sunnycove/Willowcove (later) at Intel. SKL/SNC/CNL are all listed as Super-wide OoO and WLC as Ultra-wide OoO.

Skylake -> two reservation stations; 1 computation+store data, 1 2load+1store portion
Sunnycove -> four reservation stations; 1 computation, 1 two-port store data, 2 load+store portions

My speculation is Willowcove will jump to six reservation stations;
2 computational, 2 two-port store data, 2 load+store AGU parts.
10 ports -> 16 ports

extide · Aug 19, 2019

NostaSeronx said:
My speculation is Willowcove will jump to six reservation stations;
2 computational, 2 two-port store data, 2 load+store AGU parts.
10 ports -> 16 ports

Geez, that would be insane!

dmens · Aug 19, 2019

NostaSeronx said:
It was from the LinkedIn profile of someone who was at AMD for Bulldozer/Piledriver and then went to work on Skylake/Cannonlake (early) and Sunnycove/Willowcove (later) at Intel. SKL/SNC/CNL are all listed as Super-wide OoO and WLC as Ultra-wide OoO.

Skylake -> two reservation stations; 1 computation+store data, 1 2load+1store portion
Sunnycove -> four reservation stations; 1 computation, 1 two-port store data, 2 load+store portions

My speculation is Willowcove will jump to six reservation stations;
2 computational, 2 two-port store data, 2 load+store AGU parts.
10 ports -> 16 ports

The Lakes use a unified RS design (going all the way back to P6); the output port count increased by 1, maybe 2 (I don't remember exactly), from Skylake to Cove.

Unless there is a radical redesign of the OoO dispatch (there isn't, not in this time frame with today's Intel), there is no way you will see that kind of increase in output bandwidth from the RS.

NostaSeronx · Aug 19, 2019

extide said:
Geez, that would be insane!

It doesn't actually stop there... AVX3, which is the enhanced hardware version of AVX512, is planned for Willowcove cores but not Tigerlake cores, according to another profile.

EVEX vector length bits in the prefix:
00b: 128bit (XMM)
01b: 256bit (YMM)
10b: 512bit (ZMM)
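
As a rough illustration of the table above, here is a minimal Python sketch that pulls the L'L field out of the third EVEX payload byte (bit 6 is L', bit 5 is L, as I read the Intel SDM layout). The function name is made up for this example, and it deliberately ignores the case where EVEX.b is set and the same bits are reused for rounding control.

```python
# Vector length selected by the EVEX.L'L field.
# Assumption: EVEX.b is clear, so L'L is not being reused as rounding control.
EVEX_VECTOR_LENGTH_BITS = {0b00: 128, 0b01: 256, 0b10: 512}


def evex_vector_length(p2: int) -> int:
    """Return the vector length in bits encoded in the third EVEX payload byte.

    `evex_vector_length` is a hypothetical helper name, not part of any library.
    """
    if not 0 <= p2 <= 0xFF:
        raise ValueError("p2 must be a single byte")
    ll = (p2 >> 5) & 0b11  # bit 6 = L', bit 5 = L
    if ll not in EVEX_VECTOR_LENGTH_BITS:
        raise ValueError("L'L = 11b is reserved")
    return EVEX_VECTOR_LENGTH_BITS[ll]
```

Note that this only tells you which architectural register width an instruction asks for; it says nothing about how wide the execution datapath behind it actually is.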

Rather than executing vertically; 1 instruction -> 1 512-bit datapath.
Willowcove could execute horizontally; 1 instruction -> 4 128-bit datapaths.
Which is known to reduce power consumption.

dmens said:
The Lakes use a unified RS design (going all the way back to P6); the output port count increased by 1, maybe 2 (I don't remember exactly), from Skylake to Cove.

https://images.anandtech.com/doci/14514/BackEnd.jpg
Skylake has two RS per Intel's slides, and Sunnycove has four per Intel's slides.
https://images.anandtech.com/doci/13699/Ronak20.jpg

dmens · Aug 19, 2019

NostaSeronx said:
https://images.anandtech.com/doci/14514/BackEnd.jpg

They went to 10 from 8. So it's two more. That picture is misleading btw... the RS is segmented into four uop types, but is not entirely separated like in Atom, for example.

What would going to 16 ports do? Is the rest of the machine changed to match this bandwidth? It is a pointless change if done in isolation.

Besides, Intel isn't exactly making big changes these days. Literally holding the line and watching the barbarians storm the gate.

Cardyak · Aug 19, 2019

NostaSeronx said:
It was from the LinkedIn profile of someone who was at AMD for Bulldozer/Piledriver and then went to work on Skylake/Cannonlake (early) and Sunnycove/Willowcove (later) at Intel. SKL/SNC/CNL are all listed as Super-wide OoO and WLC as Ultra-wide OoO.

Skylake -> two reservation stations; 1 computation+store data, 1 2load+1store portion
Sunnycove -> four reservation stations; 1 computation, 1 two-port store data, 2 load+store portions

My speculation is Willowcove will jump to six reservation stations;
2 computational, 2 two-port store data, 2 load+store AGU parts.
10 ports -> 16 ports

As @dmens stated earlier, I thought that width was related to the number of decode and execution units, not to the OoO execution; out-of-order execution is purely related to the depth of the pipeline.

Regardless, I assumed that after Sunny Cove, Willow Cove was only destined to be a minor upgrade. But recent rumours from different sources around the internet point towards there being a more major increase in IPC than I first anticipated (More in line with low double-digit gains)

The widely held theory is that while Skylake has had an extended life span, the design team hasn’t just been resting on its laurels and has instead been cranking out new architectures. I hope this is true, as that could mean we are in for a backlog of steady IPC gains in the near future.

Schmide · Aug 19, 2019

NostaSeronx said:
It doesn't actually stop there... AVX3, which is the enhanced hardware version of AVX512, is planned for Willowcove cores but not Tigerlake cores, according to another profile.

EVEX vector length bits in the prefix:
00b: 128bit (XMM)
01b: 256bit (YMM)
10b: 512bit (ZMM)

Rather than executing vertically; 1 instruction -> 1 512-bit datapath.
Willowcove could execute horizontally; 1 instruction -> 4 128-bit datapaths.
Which is known to reduce power consumption.
https://images.anandtech.com/doci/14514/BackEnd.jpg
Skylake has two RS per Intel's slides, and Sunnycove has four per Intel's slides.
https://images.anandtech.com/doci/13699/Ronak20.jpg

There are already 4 lanes in AVX512 (2 in AVX). Saying "rather than" implies that there is currently a flat 512-bit lane, which there isn't.

NostaSeronx · Aug 19, 2019

Schmide said:
There are already 4 lanes in AVX512 (2 in AVX). Saying "rather than" implies that there is currently a flat 512-bit lane, which there isn't.

The execution of Intel's FPU can only operate within one mode, while utilizing all resources.

SSE128, AVX128, AVX256, AVX512 => Under the new FPU can run any and it will always be 1-to-1 with the hardware.
SSE128 on AVX512 => Bad
AVX128 on AVX512 => Bad
AVX256 on AVX512 => Bad

The next FPU uses AVX512's VL-encoding like SVE. This is further enhanced within AVX3.

dmens · Aug 19, 2019

NostaSeronx said:
The execution of Intel's FPU can only operate within one mode, while utilizing all resources.

Not true. Lanes have always been part of the AVX execution design.

Thunder 57 · Aug 19, 2019

NostaSeronx said:
The execution of Intel's FPU can only operate within one mode, while utilizing all resources.

SSE128, AVX128, AVX256, AVX512 => Under the new FPU can run any and it will always be 1-to-1 with the hardware.
SSE128 on AVX512 => Bad
AVX128 on AVX512 => Bad
AVX256 on AVX512 => Bad

The next FPU uses AVX512's VL-encoding like SVE. This is further enhanced within AVX3.

Everything I'm coming up with when I look for AVX3 turns out to be AVX512. So what exactly is this AVX3 you speak of?

Schmide · Aug 19, 2019

I think "execute horizontally" is a misleading or possibly misused term.

On Intel SIMD, if you execute anything (SSE, AVX, AVX512), it executes all lanes no matter what level you're working on. So if you do a single-lane operation (SSE), the upper lanes will be altered. The penalty for mixing modes is the preservation of this data.

I would guess that the next iteration would involve moving to a non-destructive per-lane execution (I believe like AMD) where the upper lanes would be gated/masked.
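
For reference, the architectural rule behind that mixing penalty, as I understand the Intel SDM, is that legacy-encoded SSE instructions leave the bits above 127 of the destination register unchanged, while VEX-encoded 128-bit instructions zero them. A toy Python model of just that rule follows; registers are plain integers, both function names are made up for illustration, and it says nothing about how the hardware actually implements either case.

```python
LOW128_MASK = (1 << 128) - 1  # bits 0..127 of a vector register


def write_legacy_sse(reg: int, result128: int) -> int:
    """Legacy SSE write: replace the low 128 bits, keep the upper bits as they were.

    Hypothetical helper; models the architectural result only.
    """
    return (reg & ~LOW128_MASK) | (result128 & LOW128_MASK)


def write_vex128(reg: int, result128: int) -> int:
    """VEX-encoded 128-bit write: replace the low 128 bits, zero everything above.

    Hypothetical helper; models the architectural result only.
    """
    return result128 & LOW128_MASK


if __name__ == "__main__":
    old = (0xABCD << 128) | 0x1111  # some stale data sitting in the upper lanes
    print(hex(write_legacy_sse(old, 0x2222)))
    print(hex(write_vex128(old, 0x2222)))
```

The legacy form creates a dependency on the old upper bits, which is where the state-preservation cost comes from when the two encodings are mixed.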

AMDK11 · Aug 20, 2019

Cardyak said:
As @dmens
The widely held theory is that while Skylake has had an extended life span the design team haven’t just been resting on their laurels and instead been cranking out new architectures. I hope this is true as that could mean we are in for a backlog of steady IPC gains in the near future.

Yes, it's true. It is widely believed that after Skylake Intel barely managed to design Sunny Cove, but as you can see it has had an x86 Willow Cove core design ready for a long time, and it's quite possible that Golden Cove is also at the final stage.

I am almost certain that Intel teams have long been working on new microarchitectures.

coercitiv · Aug 23, 2019

Comet Lake-S "news" via VideoCardz

Intel will launch its new mainstream desktop series in Q1 2020. The Comet Lake-S, aka 10th Gen Core series, will feature up to 10 cores and 20 threads. The CPUs will be divided into three power tiers: 125W, 65W and 35W.

The new Intel CPUs will require 400-series motherboards as the socket has been changed to LGA1200. Assuming that the information provided by Xfastest is accurate, this would force a change right before switching to a smaller node. Comet Lake is still based on Skylake cores, meaning it is still built on the 14nm fabrication process. The first mainstream desktop architecture to use 10nm is believed to be Ice Lake.

Bouowmx · Aug 23, 2019

I guess it's time for Western media to repost month-old news.

jpiniero · Aug 23, 2019

So it does look like Comet U will be faster for the most part; MT perf is more determined by PL2 and the cooling.

Cinebench R15 ST scores from NBC:

1065G7: 176 (15W), 183 (25W)
10510U: 185, 189, 191
8665U: 176, 180, 185, 196
8565U: avg 170 median 175, max 193
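
To put rough numbers on the gap, here is a small Python sketch that averages the individual runs listed above and expresses each chip relative to the 1065G7. It only uses the scores quoted in this post; the 8565U is left out because only summary statistics are given for it, and lumping the 15W and 25W 1065G7 results together is a simplification of mine.

```python
from statistics import mean

# Cinebench R15 ST scores exactly as listed in the post above.
scores = {
    "1065G7": [176, 183],  # 15W and 25W results averaged together (simplification)
    "10510U": [185, 189, 191],
    "8665U": [176, 180, 185, 196],
}


def relative_to(baseline, data):
    """Return each chip's mean score divided by the baseline chip's mean score."""
    base = mean(data[baseline])
    return {chip: mean(vals) / base for chip, vals in data.items()}


ratios = relative_to("1065G7", scores)

if __name__ == "__main__":
    for chip, ratio in ratios.items():
        print(f"{chip}: mean {mean(scores[chip]):.1f}, {100 * (ratio - 1):+.1f}% vs 1065G7")
```

With so few samples per chip, and different laptops and cooling behind each one, treat the percentages as a ballpark only.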

jpiniero · Aug 24, 2019

https://www.digitimes.com/news/a20190823PD204.html

According to Digitimes, Intel says that prices (the real prices OEMs are paying?) are much higher for Icelake compared to Comet Lake as well.

DrMrLordX · Aug 24, 2019

jpiniero said:
According to Digitimes, Intel says that prices (the real prices OEMs are paying?) are much higher for Icelake compared to Comet Lake as well.

Hmm. Seems like a strange situation. I also wonder how the notebook OEMs are going to deal with all these 14nm products clogging up the channel. It could get kinda ugly.

jpiniero · Aug 24, 2019

DrMrLordX said:
Hmm. Seems like a strange situation. I also wonder how the notebook OEMs are going to deal with all these 14nm products clogging up the channel. It could get kinda ugly.

Just the opposite - I guess the point is that Icelake is extremely limited volume because of the awful yield. Tigerlake will likely just be limited.
