Discussion Intel current and future Lakes & Rapids thread

lobz · Mar 22, 2020

jpiniero said:
I would be surprised if Rocket Lake-S was released this year even though it's now realistic they could do it thanks to the demand destruction from the virus.

I thought I alone was thinking that.

uzzi38 · Mar 22, 2020

Hey, are you guys fans of curveballs on your expectations?

Got a great one for you.

https://twitter.com/x/status/1241754120060571648

inf64 · Mar 22, 2020

It could still be Sunny Cove on 14nm; this could get them a nice uplift in ST performance provided that the clocks do not go down significantly (more than 10%). I still have no idea how Intel thinks that part will be able to compete with Zen 3 desktop parts, though. The added bonus of a GPU is nice but didn't do much for them in 2019 when they lost share in desktop, plus AMD will have a native 8C/16T Zen3 based APU with RDNA2 in early 2021 so this advantage will be nullified.
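As a rough first-order check on the "nice uplift provided clocks don't drop more than 10%" idea, single-thread performance can be approximated as IPC × clock. The 18% IPC gain below is an assumption for illustration only (in the range quoted for Sunny Cove over Skylake elsewhere in this thread), and the model ignores memory-bound behaviour, so treat it as back-of-envelope arithmetic, not a prediction.

```python
def relative_st_perf(ipc_gain: float, clock_change: float) -> float:
    """Relative single-thread performance, first-order model: perf ~ IPC * frequency.

    Both arguments are fractions, e.g. 0.18 for +18% IPC, -0.10 for -10% clock.
    """
    return (1.0 + ipc_gain) * (1.0 + clock_change)


def breakeven_clock_change(ipc_gain: float) -> float:
    """Clock change (as a fraction) at which the IPC gain is fully cancelled out."""
    return 1.0 / (1.0 + ipc_gain) - 1.0


if __name__ == "__main__":
    ipc_gain = 0.18  # assumed IPC uplift, for illustration only
    for clock_change in (0.0, -0.05, -0.10, -0.15):
        perf = relative_st_perf(ipc_gain, clock_change)
        print(f"clock {clock_change:+.0%}: ST perf {perf - 1:+.1%}")
    print(f"break-even clock change: {breakeven_clock_change(ipc_gain):+.1%}")
```

Under these assumptions a 10% clock deficit still leaves roughly a 6% single-thread gain, and the gain disappears at around a 15% clock deficit.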

naukkis · Mar 22, 2020

inf64 said:
It could still be Sunny Cove on 14nm; this could get them a nice uplift in ST performance provided that the clocks do not go down significantly (more than 10%). I still have no idea how Intel thinks that part will be able to compete with Zen 3 desktop parts, though. The added bonus of a GPU is nice but didn't do much for them in 2019 when they lost share in desktop, plus AMD will have a native 8C/16T Zen3 based APU with RDNA2 in early 2021 so this advantage will be nullified.

About zero chance of that. Sunny Cove is massive on 10nm; it would be enormous on 14nm. They would need to either clock it very low or add more pipeline stages just to move data around the core, resulting in lower IPC.

Probably the first thing engineers get when they start designing a CPU core is the transistor budget available for the target manufacturing process; they just can't backport a big arch designed for a denser process to a less dense one. The other way around is possible, so the Rocket Lake CPU design could be ported to 10nm.....

jpiniero · Mar 22, 2020

uzzi38 said:
Hey, are you guys fans of curveballs on your expectations?

Got a great one for you.

I had thought they might do that instead, but I would think a straight port of Willow Cove would be easier than designing some bastard in-between version. The power consumption should be very high, yes.

uzzi38 · Mar 22, 2020

inf64 said:
plus AMD will have a native 8C/16T Zen3 based APU with RDNA2 in early 2021 so this advantage will be nullified.

Intel wishes that was the only RDNA2 APU in the near future.

ondma · Mar 22, 2020

RetroZombie said:
Well actually no.
With that space you could have another Skylake core, so by your math 8-core Skylake CPUs use the same space as 6 Sunny Cove cores, so the performance increase is actually negative.

Too much concentration on single core cpus in 2020 leaves the market wide open for others to succeed.

Single core cpus????

Ajay · Mar 22, 2020

Spartak said:
It's relevant in context to the claim "Sunny Cove (architecture) is ridiculously complex".

If we compare Sunny Cove to Skylake we see 18-19% extra IPC performance for 38% more transistors (217M vs ~300M), so it's a really good performance increase for the extra transistors. Not ridiculously complex.
If we compare Sunny Cove to Zen2 we see about 9% better IPC with 33% fewer transistors (~300M vs ~400M). Not ridiculously complex either.

Then there's the distinction between complexity and transistor count. More complexity enables fewer transistors for similar performance, which is the opposite of what people here seem to assume. It's a trade-off that I think most if not all CPU manufacturers would prefer, as it generally means less die space on the same process for similar performance.

Hmm, I’m sure this has nothing to do with how completely Intel 10nm sucks. We have no visibility into what Intel had to do vis-a-vis the design implementation to get the damn CPUs to work at anywhere near decent clocks. The whole 10nm product stack for CPUs is a disaster just based on the monumental screw-up that P1274 was.
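The percentages in the quoted comparison are easy to misread, because "X% more" and "X% fewer" use different baselines. A minimal sketch using only the transistor counts quoted above (217M, ~300M, ~400M; not independently verified) shows that ~300M is about 38% more than 217M, while ~300M versus ~400M is 25% fewer (equivalently, ~400M is about 33% more than ~300M).

```python
def pct_change(value: float, baseline: float) -> float:
    """Percentage change of `value` relative to `baseline`."""
    return (value / baseline - 1.0) * 100.0


# Transistor counts (millions) as quoted in the thread, not official figures.
SKYLAKE_M = 217
SUNNY_COVE_M = 300
ZEN2_M = 400

if __name__ == "__main__":
    print(f"Sunny Cove vs Skylake: {pct_change(SUNNY_COVE_M, SKYLAKE_M):+.1f}%")
    print(f"Sunny Cove vs Zen 2:   {pct_change(SUNNY_COVE_M, ZEN2_M):+.1f}%")
    print(f"Zen 2 vs Sunny Cove:   {pct_change(ZEN2_M, SUNNY_COVE_M):+.1f}%")
```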

lobz · Mar 22, 2020

@witeken about Rocket Lake on Twitter:

https://twitter.com/x/status/1241690622832119808

"AVX-512, +20-30% IPC and >5.0GHz will surely steamroll Ryzen."

This guy is golden

Markfw · Mar 22, 2020

lobz said:
@witeken about Rocket Lake on Twitter:

https://twitter.com/x/status/1241690622832119808

"AVX-512, +20-30% IPC and >5.0GHz will surely steamroll Ryzen."

This guy is golden

Yea, I will believe it when I see Anandtech (or Toms or the like) review it. No leaks, no opinions will sway me.

RetroZombie · Mar 22, 2020

ondma said:
Single core cpus????

Hey it's not me who is measuring the size of a single cpu core without taking into account the size of a group of cpu cores and measuring the rest of the package.

inf64 · Mar 22, 2020

lobz said:
@witeken about Rocket Lake on Twitter:

https://twitter.com/x/status/1241690622832119808

"AVX-512, +20-30% IPC and >5.0GHz will surely steamroll Ryzen."

This guy is golden

It's like he ate some bad shrooms, crazy tripping

Markfw · Mar 22, 2020

inf64 said:
It's like he ate some bad shrooms , crazy tripping

I don't usually, but I went to twitter and read all the replies. Your post is a nice summation of those.

"when pigs fly... without a plane"

IntelUser2000 · Mar 22, 2020

naukkis said:
If we compare core transistor count with L3 cache, things won't look so radical. 2MB of L3 takes 175M transistors, so after subtracting that, it leaves 40M transistors for Skylake, 50M for Zen2 and 125M for Sunny Cove. L3 cache isn't a problem; it packs lots of transistors into a small area and won't consume much power.

Keller said core transistors, so you don't have to take out the L3 cache numbers.

And second, you are overestimating the transistor count for the caches. Even with 8T and 1 extra bit for ECC, it still is under 150 million. They are likely still using 6T SRAM (8T is for L2) for the L3 caches.

Your estimation of 40 million transistors for Skylake is so off I suggest you recheck your data. That number for a core only existed in the early Pentium 4 days. Even Prescott beats that figure by a wide margin, never mind the far more complex uarchs of today! 6T SRAM with 1 ECC bit is only 120 million transistors.

Another counterargument, since you think the number includes the L3: even with 120 million transistors for the L3 cache, that leaves only 90 million for Skylake! Again, those are 2004 numbers!
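A rough sketch of the bit-cell arithmetic being argued over. The assumptions are mine, for illustration: "2MB" is taken as 2 MiB, the "1 extra bit for ECC" is modelled as one extra bit per data byte, tag storage is optional, and decoders, sense amplifiers and other peripheral circuitry are ignored entirely, so this estimates the array only, not Intel's actual count.

```python
def sram_array_transistors(
    capacity_bytes: int,
    transistors_per_bit: int,
    ecc_bits_per_byte: int = 0,
    line_bytes: int = 64,
    tag_bits_per_line: int = 0,
) -> int:
    """Transistors in the bit cells of an SRAM array (peripheral circuitry ignored)."""
    data_bits = capacity_bytes * 8
    ecc_bits = capacity_bytes * ecc_bits_per_byte  # assumed: 1 ECC bit per byte
    tag_bits = (capacity_bytes // line_bytes) * tag_bits_per_line
    return (data_bits + ecc_bits + tag_bits) * transistors_per_bit


L3_SLICE_BYTES = 2 * 1024 * 1024  # assumed: one 2 MiB L3 slice per core

if __name__ == "__main__":
    for cell in (6, 8):  # 6T vs 8T bit cells
        plain = sram_array_transistors(L3_SLICE_BYTES, cell, ecc_bits_per_byte=1)
        tagged = sram_array_transistors(
            L3_SLICE_BYTES, cell, ecc_bits_per_byte=1, tag_bits_per_line=48
        )
        print(f"{cell}T: {plain / 1e6:.0f}M (data+ECC), {tagged / 1e6:.0f}M (+48-bit tags)")
```

With these assumptions a 6T array with ECC comes to roughly 113M transistors and an 8T one to roughly 151M; adding 48 tag bits per 64-byte line pushes them to roughly 123M and 164M.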

inf64 said:
It could still be Sunny Cove on 14nm, this could get them a nice uplift in ST performance provided that the clocks do not go significantly down (more than 10%).

I bet you they are very well aware. They are ceding some parts of the desktop market until the whole division(process/uarch) stabilizes.

The heavy content creator/video editing/rendering, etc. crowd is going to be Ryzen, but they can use SKU plays to aim at the greater desktop market. Oh, and gaming, with faster cores and higher frequencies.

We already knew from the leaks that the Xe variant in Rocketlake is a 32EU part and it's mostly a feature-enablement thing more than anything.

@mikk It doesn't rule out chiplet at all. We basically knew about it from the same sources that leaked Rocketlake info. 14+14 or 14+10 doesn't leave much room for imagination.

mikk · Mar 22, 2020

IntelUser2000 said:
@mikk It doesn't rule out chiplet at all. We basically knew about it from the same sources that leaked Rocketlake info. 14+14 or 14+10 doesn't leave much room for imagination.

You're referring to the Tweakers roadmap; RKL-S was plain 14nm in there. I wonder how accurate it was in some details. The roadmap claimed RKL-S gets 10C, but all the recent leaks suggest RKL-S gets 8C, which makes more sense for a backport because Tigerlake-H is an 8+1 variant. RKL-S definitely won't get Skylake cores; the chiplet theory makes no sense and there is no indication for it. Haven't heard much about RKL-U lately. There was a 4C/6C RKL-U in this roadmap; they should bring out a 6C Tigerlake-U variant instead. Not this year, but for a Tigerlake-U refresh next year, because I doubt ADL-P is ready next year.

naukkis · Mar 23, 2020

IntelUser2000 said:
Keller said core transistors, so you don't have to take out the L3 cache numbers.

And second, you are overestimating the transistor count for the caches. Even with 8T and 1 extra bit for ECC, it still is under 150 million. They are likely still using 6T SRAM (8T is for L2) for the L3 caches.

There are also 48 bits of tags for every 64-byte (512-bit) cache line.

Your estimation of 40 million transistors for Skylake is so off I suggest you recheck your data. That number for a core only existed in the early Pentium 4 days. Even Prescott beats that figure by a wide margin, never mind the far more complex uarchs of today! 6T SRAM with 1 ECC bit is only 120 million transistors.

Another counterargument, since you think the number includes the L3: even with 120 million transistors for the L3 cache, that leaves only 90 million for Skylake! Again, those are 2004 numbers!

That's what Intel gives you: every two Skylake cores with 4MB of L3 add around 435 million transistors to the reported die transistor count, and that's where the 217 million core transistor figure comes from. It seems lowish for a core, but every switching transistor uses power, and if they don't limit the number of switching logic transistors, power usage will skyrocket. Like with Sunny Cove.....
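Both sides here start from the same per-core budget and disagree only on how much of it the L3 slice eats. A minimal sketch of that subtraction, using only the figures quoted in this exchange (about 435M for two cores plus 4MB of L3, and L3 estimates of 175M or 120M per 2MB slice); none of these are official Intel per-block counts.

```python
def core_logic_budget(pair_total_m: float, l3_slice_m: float, cores: int = 2) -> float:
    """Transistors (millions) left for one core after removing its L3 slice."""
    per_core_with_l3 = pair_total_m / cores
    return per_core_with_l3 - l3_slice_m


PAIR_TOTAL_M = 435  # two Skylake cores + 4MB L3, as quoted in the thread

if __name__ == "__main__":
    for label, l3_m in (("175M L3 estimate", 175), ("120M L3 estimate", 120)):
        print(f"{label}: ~{core_logic_budget(PAIR_TOTAL_M, l3_m):.1f}M left for the core")
```

So the whole disagreement reduces to the L3 estimate: about 42.5M versus about 97.5M for the core itself.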

rainy · Mar 23, 2020

lobz said:
"AVX-512, +20-30% IPC and >5.0GHz will surely steamroll Ryzen."

This guy is golden

He clearly must be taking a pretty strong blue pill./s

Btw, on more serious note, I remember well his hilariously bad/wrong predictions about Intel's processes.

OriAr · Mar 24, 2020

naukkis said:
There are also 48 bits of tags for every 64-byte (512-bit) cache line.

That's what Intel gives you: every two Skylake cores with 4MB of L3 add around 435 million transistors to the reported die transistor count, and that's where the 217 million core transistor figure comes from. It seems lowish for a core, but every switching transistor uses power, and if they don't limit the number of switching logic transistors, power usage will skyrocket. Like with Sunny Cove.....

A Skylake core has an area of about 8.7 mm^2, and maybe a quarter of it is cache, so logic takes up around 6.5 mm^2. Even assuming a low density of 15MT/mm^2, that's still way more than 40MT, more like about 100MT, and since the density for the core seems to be about 24MT/mm^2, that brings the count up to about 156 million transistors, way more than just 40 million.
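The area-times-density estimate above can be reproduced in a few lines. The inputs are exactly the rough figures used in the post (8.7 mm^2 core, a guessed 25% cache fraction, and densities of 15 and 24 MT/mm^2); they are estimates, not measured values.

```python
def logic_transistors_m(core_area_mm2: float, cache_fraction: float,
                        density_mt_per_mm2: float) -> float:
    """Estimated logic transistors (millions) from area and an assumed density."""
    logic_area = core_area_mm2 * (1.0 - cache_fraction)
    return logic_area * density_mt_per_mm2


if __name__ == "__main__":
    core_area = 8.7        # mm^2, Skylake core as quoted
    cache_fraction = 0.25  # "maybe a quarter of it is cache" (rough guess)
    for density in (15.0, 24.0):  # MT/mm^2, the two densities used above
        est = logic_transistors_m(core_area, cache_fraction, density)
        print(f"{density:.0f} MT/mm^2 -> ~{est:.0f}M logic transistors")
```

Without rounding the logic area to 6.5 mm^2 the figures come out at about 98M and 157M; either way, well above 40M.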

geegee83 · Mar 24, 2020

rainy said:
He clearly must be taking a pretty strong blue pill./s

Btw, on more serious note, I remember well his hilariously bad/wrong predictions about Intel's processes.

Taking away his exaggeration, what’s the more reasonable performance estimate?

Also what’s the delta to Zen3....he seems to be comparing to Zen2 which is not the right comparison.

uzzi38 · Mar 24, 2020

geegee83 said:
Taking away his exaggeration, what’s the more reasonable performance estimate?

Also what’s the delta to Zen3....he seems to be comparing to Zen2 which is not the right comparison.

Actually reasonable claims involve throwing the possibility of a complete, or even close to complete, backport out the window, and, if that weren't enough, throwing out some clocks too.

The claim that "14nm clocks better than 10nm, so Rocket Lake can simultaneously be both a complete (or even close to complete) backport and have superior clocks to 10nm products" severely lacks a fundamental understanding of how CPUs are designed.

There's no easier way for me to put it.

Exist50 · Mar 24, 2020

uzzi38 said:
Actually reasonable claims involve throwing the possibility of a complete, or even close to complete, backport out the window, and, if that weren't enough, throwing out some clocks too.

The claim that "14nm clocks better than 10nm, so Rocket Lake can simultaneously be both a complete (or even close to complete) backport and have superior clocks to 10nm products" severely lacks a fundamental understanding of how CPUs are designed.

There's no easier way for me to put it.

uzzi, you're often quite sensible, but I'm not sure what you think is fundamentally missing here. Yes, I think witeken's prediction is wildly optimistic, but fundamentally 10nm hasn't seemed to provide any inherent improvement in clock speed, and if you ignore power consumption and die size, there's not all that much more to it.

jpiniero · Mar 24, 2020

uzzi38 said:
Actually reasonable claims involve throwing the possibility of a complete, or even close to complete, backport out the window, and, if that weren't enough, throwing out some clocks too.

Obviously Intel feels like they can do it, or they wouldn't have bothered.

Now I wouldn't be surprised if they don't end up releasing it due to the virus.

yeshua · Mar 24, 2020

Intel has released three weird "Off Roadmap" (what does it even mean?) Ice Lake CPUs:

Intel® Core™ i7-1060NG7
https://ark.intel.com/content/www/u...060ng7-processor-8m-cache-up-to-3-80-ghz.html
Intel® Core™ i5-1030NG7
https://ark.intel.com/content/www/u...030ng7-processor-6m-cache-up-to-3-50-ghz.html
Intel® Core™ i3-1000NG4
https://ark.intel.com/content/www/u...000ng4-processor-4m-cache-up-to-3-20-ghz.html

Looks like they are bespoke CPUs for Apple.

They have also finally released the remaining ULV Ice Lake CPUs, but the Core i7-1068G7 is still nowhere to be seen. Weird.

uzzi38 · Mar 24, 2020

Exist50 said:
uzzi, you're often quite sensible, but I'm not sure what you think is fundamentally missing here. Yes, I think witeken's prediction is wildly optimistic, but fundamentally 10nm hasn't seemed to provide any inherent improvement in clock speed, and if you ignore power consumption and die size, there's not all that much more to it.

Alright, fine then. I can give a more in-depth explanation, but not a full depth rundown. I don't even know if giving this much info is fine, I'm just going to hope it's fine, because the last thing I want is a friend of mine getting in trouble. And more than anything else, I'm sick to death of hearing the 5GHz Willow Cove on 14nm thing.

Some of this info will actually contradict what I've said in the past.... or at least seem to at first. Please read through the full post first before jumping to conclusions.

So as I'm sure all are aware, your average CPU is designed with dozens, if not hundreds of IPs put together. Well, if you wanted to do a straight backport, it would take you between 2-6 months, depending on the amount of IP you have to work with, the number of timings you'd have to rework, that kinda thing. But, if you did that... well you'd end up with an absolutely atrocious product.

So if you want to try and get something usable, then you need to do some reworking of those IPs. Each of them will be validated for the node they're based on, for different degrees of clocks, power, area, durability (not sure if this is the correct word, but I'm sure you get the gist), and then has to be re-targeted to the node you want it to be on. So you'll be playing around with all 4 of those variables to get something you can work with.

But you see, when backporting this IP to another node directly, it won't meet those same targets as the original node even in a best case scenario. There would be a serious deficit in those four categories compared to the original node, and I'm not talking about something a straight reduction in density alone would be able to solve. You first need to find a balance between those things on the new node, but this is complicated by the fact that different nodes are specced to run a different number of logic levels even at the same frequency. To roughly quote: they'd have to do a significant amount of deep reworking of the IPs if they wanted that IP to run at the same frequency as 10nm products without something atrocious in the other categories. This is what could take dozens of months to complete.
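A deliberately simplified timing model can illustrate why "different nodes are specced to run a different number of logic levels even at the same frequency": each pipeline stage has to fit its gate delays plus a fixed flop/clock overhead into one cycle. Every number below (the two gate delays, the 40 ps overhead, the 5 GHz target) is hypothetical and chosen only to make the arithmetic visible; real designs use per-cell delays from each node's libraries rather than one gate delay.

```python
def cycle_time_ps(freq_ghz: float) -> float:
    """Clock period in picoseconds for a frequency in GHz."""
    return 1000.0 / freq_ghz


def max_logic_levels(freq_ghz: float, gate_delay_ps: float, overhead_ps: float) -> int:
    """Gate levels that fit in one pipeline stage at the target frequency."""
    budget_ps = cycle_time_ps(freq_ghz) - overhead_ps
    return int(budget_ps // gate_delay_ps)


def max_freq_ghz(levels: int, gate_delay_ps: float, overhead_ps: float) -> float:
    """Highest frequency a stage with `levels` gate delays can run at."""
    return 1000.0 / (levels * gate_delay_ps + overhead_ps)


if __name__ == "__main__":
    overhead = 40.0  # hypothetical flop + clock-skew overhead, ps
    fast_gate, slow_gate = 10.0, 12.5  # hypothetical gate delays on two nodes, ps
    levels = max_logic_levels(5.0, fast_gate, overhead)
    print(f"levels per stage at 5 GHz: {levels} (fast node), "
          f"{max_logic_levels(5.0, slow_gate, overhead)} (slow node)")
    print(f"same {levels}-level stage on slow node: "
          f"{max_freq_ghz(levels, slow_gate, overhead):.2f} GHz")
```

With these made-up numbers, a stage that closes timing at 5 GHz on the faster node only reaches about 4.2 GHz on the slower one unless it is re-pipelined down to 12 levels, which is the kind of deep rework described above.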

Even by the end of all this, you'd end up with a product that would require obscene amounts of power just to try and sustain clocks around 4GHz. In fact, to quote them specifically: "With the time this would take, the lower than intended frequency and terrible power (seeing power would be an issue even at a very "low" frequency -> you don't run a CPU at +4GHz consistently for no good reason), they should screw it and just go Samsung / TSMC."

And well, I became certain we weren't looking at a direct backport the second D0cTB said, just a couple of days ago, that Rocket Lake started actual development in 2019. They don't have the time for a complete backport.

This is why I wouldn't suggest you believe the full backport rumours so easily. I won't claim to know what it is directly (though I know a couple of people who were told by the bunny in DMs why they should also think RKL-S isn't a backport; they didn't want to share and I don't plan on probing them for info), but I can say with confidence that it is not a full backport. A full backport would probably be outperformed by Comet Lake.

I know there are a lot of other people here who think of me in a not so positive way (some of them blocking me on Twitter before I even say anything to them there LOL), but I'm not the kind of person who fully hops onto any rumour without proof. Just know from this that I don't ever regurgitate any rumours without having a damned good reason to believe in them myself.

EDIT: Right now, the only person I'd trust to be giving us a correct hint into the nature of Rocket Lake is D0cTB. He's well known in the French tech press and is known to have gotten engineering samples of previous chips through unconventional ways. And so far, he's said the following:

https://twitter.com/x/status/1241754120060571648

https://twitter.com/x/status/1241751679269249025

https://twitter.com/d0cTB/status/1241696134600568834

Exist50 · Mar 24, 2020

Let me go through this point by point.

uzzi38 said:
So as I'm sure all are aware, your average CPU is designed with dozens, if not hundreds of IPs put together. Well, if you wanted to do a straight backport, it would take you between 2-6 months, depending on the amount of IP you have to work with, the number of timings you'd have to rework, that kinda thing.

Very true. I'm with you so far.

uzzi38 said:
But, if you did that... well you'd end up with an absolutely atrocious product.

And this is where you start losing me. There's nothing inherent about backporting that makes the IP worse, beyond whatever differences there are in the fab process.

uzzi38 said:
So if you want to try and get something usable, then you need to do some reworking of those IPs. Each of them will be validated for the node they're based on, for different degrees of clocks, power, area, durability

There's truth in this, but it's being misconstrued. Yes, you need to do some backend rework, but you're significantly overstating the amount. The majority of IPs will be delivered as soft IPs, in other words, synthesizable logic that APR tools can handle without manual intervention. Typically these IPs are then "hardened" with some backend/SD work to improve their performance characteristics, but the difference isn't extreme. The exceptions are "hard IPs" like Intel's current cores and analog logic like PHYs, FIVR, etc. These are not synthesizable, and are tied directly to the process. These too can be backported, but there's significantly more effort involved, and in the worst case it can resemble more of a redesign than a port. Still not impossible, however, if you're willing to throw the time and manpower into it.

uzzi38 said:
But you see, when backporting this IP to another node directly, it won't meet those same targets as the original node even in a best case scenario

You're too vague here. You seem to imply characteristics of the node that somehow transcend the transistors and low level building blocks, but that's really not the case. I would ask that you reevaluate your source for this impression.

uzzi38 said:
Even by the end of all this, you'd end up with a product that would require obscene amounts of power just to try and sustain clocks around 4GHz

There would be a power penalty, surely, but again, this greatly exaggerates it. It's no secret that Cdyn scaling between nodes has slowed in recent years, and that is particularly apparent for 10nm. I imagine the penalty would be in the ballpark of 25% more power (roughly in line with the core difference between Rocket Lake and Comet Lake), not something obscene like 100%.
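The estimate above can be made concrete with the standard first-order dynamic-power relation P ≈ Cdyn × V² × f, ignoring leakage. A sketch, assuming for illustration a 25% Cdyn increase per core at unchanged voltage and frequency, and 8 Rocket Lake cores (per the recent leaks) against 10 Comet Lake cores; the +5% voltage case is purely hypothetical and only shows how strongly the V² term bites.

```python
def relative_dynamic_power(cdyn_ratio: float, v_ratio: float = 1.0,
                           f_ratio: float = 1.0) -> float:
    """Ratio of dynamic power, first order: P ~ Cdyn * V^2 * f (leakage ignored)."""
    return cdyn_ratio * v_ratio ** 2 * f_ratio


if __name__ == "__main__":
    per_core = relative_dynamic_power(1.25)  # assumed +25% Cdyn per core, same V and f
    rkl_total = 8 * per_core                 # 8 cores, per the leaks
    cml_total = 10 * 1.0                     # 10 cores at baseline per-core power
    print(f"per-core power: {per_core:.2f}x, "
          f"8-core vs 10-core total: {rkl_total / cml_total:.2f}x")
    # Hypothetical: needing +5% voltage for the same clock adds ~10% via the V^2 term
    print(f"with +5% voltage: {relative_dynamic_power(1.25, v_ratio=1.05):.2f}x per core")
```

Under these assumptions a 25% per-core penalty on 8 cores lands at the same total dynamic power as 10 cores at baseline, which is the sense in which the penalty lines up with the core-count difference.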
