Discussion Intel current and future Lakes & Rapids thread

gdansk · Aug 19, 2021

Hougy said:
Actually, he said they cancelled 10 nm. So all 10 nm products. He even admitted he was wrong

Really, well in that case I'll take his word for it. I've been arguing for years he was right (enough) on the basis that their 10nm products were cancelled or appeared on improved processes. But from Intel's eyes it is all 1274.

jpiniero · Aug 19, 2021

gdansk said:
Really, well in that case I'll take his word for it. I've been arguing for years he was right (enough) on the basis that their 10nm products were cancelled or appeared on improved processes. But from Intel's eyes it is all 1274.

What happened is that Intel shut down nearly all of the 10 nm production lines. But not all. If Intel had not also botched 7 nm it's possible that 10 nm would be gone by now.

majord · Aug 19, 2021

Hulk said:
The Golden Cove performance improvement/architectural changes I expected. If true, the Gracemont power/performance compared to Skylake is kind of mind blowing. ST Gracemont is basically faster, more power efficient, and much smaller than Skylake. This begs the question as to why we even need the Golden Cove cores?

After thinking about it a bit I believe the reason stems from the fact that ultimately ST performance is still very important. There are many apps, like Handbrake that don't scale well beyond 8 cores. Throwing 40 Gracemont cores at that app in the same die space as Alder Lake won't show the same performance as 8+8 ADL. Partly because of the higher clocks of Golden Cove, the higher inherent IPC, and as I wrote above, there are still quite a few apps that don't handle high core counts in a linear fashion performance-wise.

Starting to get just a little more excited about ADL...

Yes, I think you've answered your own question. In summary: so it doesn't get slaughtered by the competition in per-core peak performance.

IntelUser2000 · Aug 19, 2021

ondma said:
Your point still stands but the gains in the big cores are not simply additive. The real gain is 1.18 x 1.19 or 40%. So GC is 33% faster than Gracemont. Problem with Gracemont is that it lacks hyperthreading and can't clock as high as the big cores.

I didn't add, which means you didn't compute the numbers.

(1.18 x 1.19) / 1.07 = 1.31

Addition results in 1.3. Also, Gracemont is 7-8% faster than Skylake.

Also, papers on architecture often just add percentages, and it comes out surprisingly accurate: performance doesn't scale linearly with clock, so addition instead of multiplication seems to approximate the gains pretty well.
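A quick way to check the two approaches side by side, as a minimal Python sketch. The 18%, 19%, and 7% figures are the ones quoted in this exchange; the variable names are only for illustration.

```python
# Figures quoted above: two big-core gains of 18% and 19%,
# and Gracemont taken as 7% faster than Skylake.
gc_gain_a = 1.18
gc_gain_b = 1.19
gracemont_vs_skylake = 1.07

# Compounded (multiplicative) estimate of Golden Cove over Gracemont.
compounded = (gc_gain_a * gc_gain_b) / gracemont_vs_skylake

# Additive shortcut: add the percentages, subtract Gracemont's lead.
additive = 1 + (0.18 + 0.19 - 0.07)

print(f"compounded: {compounded:.2f}x")  # 1.31x
print(f"additive:   {additive:.2f}x")    # 1.30x
```

With gains this small the two methods land within about one percentage point of each other.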

itsmydamnation said:
faster per clock, it won't be faster 🙂
clock power scaling of Gracemont will be very interesting,
Did I see 3 cycle L1 latency? That's gotta put a lid on clocks.

The Cove chips need to follow this path. Additional pipeline stage for Golden Cove? So it could have been an additional 2-4% faster. But up against thermal and power limits the clock gains due to a pipeline stage are a maybe, while better performance due to lower branch mispredict penalty is pretty much a guarantee. Without the extra stage we would have seen a 23% gain.

Ian thinks it'll be 17 stages for Golden Cove.

Nehalem - 16 stages
Sandy Bridge - 2 cycles longer or 18 stages, few cycles shorter with uop cache hit.
Golden Cove - 1 cycle longer or 19 stages

ARM uses the uop cache to reduce mispredict penalties. Intel uses them for higher clocks. Conroe has a 14 stage pipeline. uOP cache means Sandy Bridge can have a Conroe-like pipeline stage. Except the hit rate is something like 60%.

IntelUser2000 · Aug 19, 2021

Abwx said:
A shrunk SKL would consume about 50% lower power as well as getting 35-40% better perf at isofrequency, and 8% higher perf is 2C vs 1C/2T.

Shrink may result in 50% lower power but that won't translate into 40% better performance. Actually the gains will be closer to 20%, assuming you can translate that into clocks. Otherwise it drops. 15%? 10%? 5%?

Also, I forgot to say that Gracemont at the same performance uses 40% of the power (and I changed my other post from 40% less power to 40% of the power), meaning 0.4x. That's a 2.5x gain.
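The difference between "40% of the power" and "40% less power" is easy to see with a couple of lines of Python. This is only an illustrative sketch; `efficiency_gain` is a made-up helper name, not anything from Intel's material.

```python
def efficiency_gain(power_ratio: float) -> float:
    """Perf/watt multiplier at equal performance, given power relative to the baseline."""
    return 1.0 / power_ratio

# "40% of the power" means a power ratio of 0.4.
print(f"40% of the power: {efficiency_gain(0.4):.2f}x perf/watt")  # 2.50x
# "40% less power" means a power ratio of 0.6.
print(f"40% less power:   {efficiency_gain(0.6):.2f}x perf/watt")  # 1.67x
```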

8% higher perf is 2C vs 1C/2T.

Incorrect. MT results are 4C vs 2C/4T. ST results are 1C/1T for both. Gracemont is straight up a better core in every metric.

mikk · Aug 19, 2021

According to Moore's Law Is Dead, Jim Keller worked on a CPU architecture called Royal Core. It's not clear if it will be introduced in Lunar Lake or Nova Lake. The initial plan was Lunar Lake, but the full Royal Core might come with Nova Lake. Lunar Lake aims to bring at least 30% better IPC over Meteor Lake.

Thunder 57 · Aug 19, 2021

mikk said:
According to Moore's Law Is Dead, Jim Keller worked on a CPU architecture called Royal Core. It's not clear if it will be introduced in Lunar Lake or Nova Lake. The initial plan was Lunar Lake, but the full Royal Core might come with Nova Lake. Lunar Lake aims to bring at least 30% better IPC over Meteor Lake.

I am of the belief that Jim Keller is not a demigod, and that with the limited time he had at Intel there was not much influence.

scannall · Aug 19, 2021

Thunder 57 said:
I am of the belief that Jim Keller is not a demigod, and that with the limited time he had at Intel there was not much influence.

His job there was organizational and tool sets. Not design. And we see how that went.

semiman · Aug 19, 2021

Cool points to watch :

- Atom and Cove use decoders with the same width, but 2x3 vs 6
- Atom uses twice as much I-cache (wtf 1)
- Atom uses much bigger execution ports (wtf 2)
- Atom uses 1/4 of die space compared to coves

What's going on here? I was surprised to see Zen 3 use a 256-entry reorder buffer with a smaller backend while achieving more IPC than Cypress Cove. Now Intel has brought something crazy again.

insertcarehere · Aug 19, 2021

dullard said:
Exactly. User interface needs high ST performance for a fast "feel". You could have a billion slow cores, and while it will crunch numbers quickly, it would feel like a dog on the user interface thread.

If a Skylake-level core can't come back with a UI that "feels" fast enough, the problem isn't the core, it's the software.

While there are always applications that are tough to divide into many threads, the question becomes how to balance the trade-off between throughput, efficiency, and per-thread performance.

With a 4-core Gracemont cluster about the same size as a single Golden Cove core, 8 big + 8 small is theoretically the same size as 6 big + 16 small. Assuming Gracemont is as good as advertised, the latter would have higher throughput when comparing similar power envelopes and die sizes.
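The area claim can be checked with a rough Python sketch. It assumes, as stated above, that one 4-core Gracemont cluster takes about the area of one Golden Cove core, and it ignores cache, ring, and uncore area entirely. The function name is hypothetical.

```python
# Assumption from the post: 4 Gracemont cores ~ 1 Golden Cove core in area.
SMALL_CORES_PER_BIG_CORE_AREA = 4

def area_in_big_core_units(big: int, small: int) -> float:
    """Core area expressed in Golden Cove core equivalents."""
    return big + small / SMALL_CORES_PER_BIG_CORE_AREA

print(area_in_big_core_units(8, 8))   # 10.0
print(area_in_big_core_units(6, 16))  # 10.0
```

Under that assumption both configurations fit the same core-area budget, which is all the comparison needs.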

diediealldie said:
Cool points to watch :

- Atom and Cove use decoders with the same width, but 2x3 vs 6
- Atom uses twice as much I-cache (wtf 1)
- Atom uses much bigger execution ports (wtf 2)
- Atom uses 1/4 of die space compared to coves

What's going on here? I was surprised to see Zen 3 use a 256-entry reorder buffer with a smaller backend while achieving more IPC than Cypress Cove. Now Intel has brought something crazy again.

Golden Cove is an iteration on the *Cove archetype, but Gracemont's performance and efficiency would be the real key to making Intel competitive again, especially in the high-margin server markets where throughput and efficient designs matter more than pushing 5+ GHz clocks for ultimate performance.

Hougy · Aug 19, 2021

mikk said:
According to Moore's Law Is Dead, Jim Keller worked on a CPU architecture called Royal Core. It's not clear if it will be introduced in Lunar Lake or Nova Lake. The initial plan was Lunar Lake, but the full Royal Core might come with Nova Lake. Lunar Lake aims to bring at least 30% better IPC over Meteor Lake.

I used to think MLID was a fraud, but apparently he got a few things right recently. Maybe he was a case of fake it till you make it and he got a lot better at his job

Joe NYC · Aug 19, 2021

insertcarehere said:
With a 4 core gracemont cluster about the same size as a single golden cove core, 8 big + 8 small is theoretically the same size as 6 big + 16 small. Assuming gracemont is as good as advertised the latter would have higher throughput when comparing similar power envelopes.

I was wondering the same. If it is this simple, why wait another year+, why wait another generation when Intel could take the MT performance crown now?

One explanation is that Intel is just going for the gaming performance crown on desktop and low power in notebooks, and that the client segment demanding very high MT is insignificant.

dmens · Aug 20, 2021

Joe NYC said:
I was wondering the same. If it is this simple, why wait another year+, why wait another generation when Intel could take the MT performance crown now?

Because it is marketing nonsense.

Exist50 · Aug 20, 2021

dmens said:
Because it is marketing nonsense.

Lmao, you were the one claiming it was impossible for Atom to come anywhere close to Core.

semiman · Aug 20, 2021

insertcarehere said:
Golden Cove is an iteration from the *Cove archetype but Gracemont's performance would be the real key to making Intel competitive again, especially in the high-margin server markets.

Yeah, I think so too. AMD showed that the chiplet architecture is the future, so probably Intel will show people that heterogeneous architecture is the future.
I also see several different types of Amazon EC2 containers that can be served by Gracemont-class cores (like for simple reverse proxying + scale-out). Kinda sad that these new architectures are considered fraud and mere marketing terms by some people...

Cardyak · Aug 20, 2021

Here's a quick ramble from me regarding Alderlake

https://twitter.com/x/status/1428652293440917507

coercitiv · Aug 20, 2021

IntelUser2000 said:
Shrink may result in 50% lower power but that won't translate into 40% better performance.

You're kinda' contradicting yourself here.

on one side you agree with Intel that a core can deliver 1.4x performance at 100% power while also scaling to 1x performance at 40% power
on the other side you disagree with Abwx that a core can deliver 1.35x performance at 100% power while also scaling to 1x performance at 50% power

[Later edit] For extra clarification: Intel claims a 17-18% perf/watt improvement with 10SF and a further 10-15% perf/watt improvement for Intel 7. That's a combined perf/watt improvement of roughly 35%.
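Compounding the two quoted ranges gives a sense of where the combined figure lands. This is just a sketch using the percentages from the edit above; it assumes the two improvements multiply cleanly, which real silicon may not do.

```python
# Perf/watt ranges quoted above (as multipliers).
tensf_low, tensf_high = 1.17, 1.18      # 10SF step
intel7_low, intel7_high = 1.10, 1.15    # Intel 7 step

# Assumption: the two steps compound multiplicatively.
combined_low = tensf_low * intel7_low
combined_high = tensf_high * intel7_high

print(f"combined: {combined_low:.3f}x to {combined_high:.3f}x")
# combined: 1.287x to 1.357x
```

So the "roughly 35%" figure corresponds to the upper end of both ranges; the lower end is closer to 29%.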

IntelUser2000 said:
Actually the gains will be closer to 20%, assuming you can translate that into clocks. Otherwise it drops. 15%? 10%? 5%?

The same can be said about Gracemont. Intel presented us with a very nice, marketing-driven graph that makes it seem like Gracemont and Skylake will operate in the same performance range, with quite similar clocks actually. The final Skylake design was 5+ GHz; Gracemont is expected to stay around ~4 GHz. That's a 25%+ delta.

Intel's claim may very well be true, as long as we properly apply it within a clock range that still allows Gracemont to raise clocks over Skylake at ISO power. The same should be applied to Abwx's claim, in the sense that we would have to consider a clock range that allows "Skylake Shrink" to clock higher at ISO power.

DrMrLordX · Aug 20, 2021

jpiniero said:
Ooh, AT is claiming that AVX-512 won't be enabled even if you turn the small cores off on Alder Lake. Marketing must have decided that this was too confusing but too late to remove it physically from the client design. (or as I mentioned before maybe only embedded customers running Linux will get access)

Perhaps Intel realized that too few client users need AVX-512 for it to make any difference.

IntelUser2000 · Aug 20, 2021

coercitiv said:
You're kinda' contradicting yourself here.

on one side you agree with Intel that a core can deliver 1.4x performance at 100% power while also scaling to 1x performance at 40% power

on the other side you disagree with Abwx that a core can deliver 1.35x performance at 100% power while also scaling to 1x performance at 50% power

Am I? His point is that the process is entirely responsible for that improvement, when in a typical full process gain, they usually quoted roughly half the power or 20% performance gains. A process that would deliver 35% performance gains would result in far below 0.5x power.

1% performance gain at iso power is worth much more than 1% power efficiency gain at iso performance.

Of course 40% greater performance and 2.5x less power use is much greater than process gains.

The same can be said about Gracemont. Intel presented us with a very nice and marketing driven graph that makes it seem like Gracemont and Skylake will operate in the same performance range, quite similar clocks actually. Final Skylake design was 5+Ghz, Gracemont is expected to stay around ~4Ghz. That's a 25%+ delta.

I actually think the peak performance part is when they are iso-clocks, because that's the only way it makes sense. Even with the advantage it's still at roughly half the power.

Of course 5GHz Skylake will be faster. But at an unreasonable power level and everyone knows that.

Alternatively, they are explicitly talking about Skylake, and it was nowhere near 5GHz. The conclusion will be the same - Gracemont has higher performance/clock and still uses half the power.

“This microarchitecture delivers more general integer IPC than Intel Skylake core while consuming a fraction of the power,” said Stephen Robinson, Gracemont Chief Architect.

We're neglecting the multi-thread gains as well. 80% higher performance than the 2/4 Skylake, say a 6600U, puts us in ~600 Cinebench R15 territory. This is firmly in the Icelake territory, especially since the Atom-based cores barely throttle while Core systems are throttle-city. 4/4 "Atom" equaling a flagship 4/8 of 2 years ago.

On an unrelated note: it seems Raichu is wrong about Alder Lake. Cinebench is actually a pretty decent indicator of uarch performance, and to get the 810 R20 score you need 30% better performance. Golden Cove falls quite short of this. This puts into question the whole "beating 5950X in R20" talk.

jpiniero · Aug 20, 2021

As it turns out Gracemont (and thus Alder Lake) does not support everything that Tiger Lake does. Even excluding AVX-512. The number of actual instructions missing is not that high I think though. Bunch of the AVX-512 instructions have AVX2 versions for some reason and some/most of those appear to be supported.

Abwx · Aug 20, 2021

IntelUser2000 said:
Shrink may result in 50% lower power but that won't translate into 40% better performance. Actually the gains will be closer to 20%, assuming you can translate that into clocks. Otherwise it drops. 15%? 10%? 5%?

3.7-3.9 GHz is undoubtedly within the favourable range where power scales close to the square of frequency, so at 0.5x the power you can get close to 40% better perf at isopower.
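The arithmetic behind that statement can be sketched in Python. The model here is an assumption for illustration only: power scales as frequency to the power of `exponent`, and performance scales linearly with frequency. The function name is hypothetical.

```python
def iso_power_clock_gain(power_ratio: float, exponent: float = 2.0) -> float:
    """Clock multiplier available when the saved power is spent on frequency.

    power_ratio: power of the shrunk core relative to the old one at the same clock.
    exponent: assumed n in power ~ frequency**n (a model parameter, not a measured value).
    """
    return (1.0 / power_ratio) ** (1.0 / exponent)

square_law = iso_power_clock_gain(0.5, exponent=2.0)
cube_law = iso_power_clock_gain(0.5, exponent=3.0)

print(f"square law: {square_law:.2f}x")  # 1.41x
print(f"cube law:   {cube_law:.2f}x")    # 1.26x
```

With a square law, halving power at a given clock buys about 41% more clock at the old power; with a steeper cube law the same saving buys about 26%, so the answer depends heavily on which part of the curve the core is operating in.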

IntelUser2000 said:
Also I forgot to say Gracemont at same performance uses 40% of the power(and I changed my other post from 40% less power to 40% of the power), meaning 0.4x. That's 2.5x the gains.

Incorrect. MT results are 4C vs 2C/4T. ST results are 1C/1T for both. Gracemont is straightup a better core in every metric.

All curves for the small cores are perf/watt curves; there's not a single one that does an absolute perf comparison.

Intel Alder Lake in technical detail

For Architecture Day 2021, Intel shared extensive details on the biggest new CPU launch of the year: Alder Lake. An overview.

www.computerbase.de

Above it is stated that 4 GRMT have 80% better throughput than 2C/4T SKL at same power.

And here it is stated that in a core-per-core comparison GRMT has 40% higher throughput at the same power, or either (more than) 40% less power or only 40% of the power (which would mean 60% less power) at the same throughput.

Now tell me how you do extract absolute perfs out of these curves...

Notice also that the bench is not Spec_int (the one used by AT for INT IPC comparisons) but Spec_rate, which takes RAM speed and I/O improvements into account.

JoeRambo · Aug 20, 2021

I find it hard to be excited about a Skylake-level IPC CPU that is clocked at 4 GHz. It starts its life with a 25% clock deficit versus 2015-era Skylake, and even on conservative estimates it is beaten by at least 40% by Apple's 3 GHz M1 CPU core?
Even supposed area efficiency is quite questionable and should be compared to ARM X2 and not Intel's fat cores that are way overblown.

To this day I have no clue what a desktop user is supposed to do with those small cores, or what tasks cannot be completed by 8 big cores. Unless of course the people trying to beat a neighbor with a 5950X in Cinebench (while ignoring the one with a 3990X) are the target audience?

My plan remains the same: getting Alder Lake, disabling the small cores, and enjoying a proper CPU with a ton of cache (btw Alder Lake will be Intel's first desktop CPU with more L2+L3 cache available for cores than AMD since Zen 2 came out).

coercitiv · Aug 20, 2021

IntelUser2000 said:
His point is that the process is entirely responsible for that improvement

I'm far more interested in his numbers than his intent. There's a big efficiency gap between Intel 7 and Intel 14nm when operating below 4 GHz, and that needs to be acknowledged. That doesn't change other key aspects about Gracemont, such as perf/area or idle power usage. It's still a fast little buggger. (intentional typo, automated forum filter transforms it into "ah heck"?!)

Last but not least, doubling the power to increase clocks by 35% doesn't seem that improbable when looking at the sub-4 GHz range. That's jumping from 3 to 4 GHz while using 100W instead of 50W. Context is crucial here: remember, Intel's graph for the 40% increase in performance does not contain the 4-5 GHz range for Skylake.
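For reference, the scaling exponent implied by that example can be backed out in a few lines of Python. This assumes a simple power ~ frequency**n model, which is only a rough approximation of real voltage/frequency curves; the helper name is made up for the sketch.

```python
import math

def implied_exponent(clock_ratio: float, power_ratio: float) -> float:
    """Return n such that power_ratio == clock_ratio ** n."""
    return math.log(power_ratio) / math.log(clock_ratio)

# 3 GHz -> 4 GHz while going from 50W to 100W (the example above).
n_3_to_4 = implied_exponent(4 / 3, 100 / 50)
# A 35% clock increase for double the power.
n_35pct = implied_exponent(1.35, 2.0)

print(f"3 -> 4 GHz at 2x power: n = {n_3_to_4:.2f}")  # 2.41
print(f"+35% clock at 2x power: n = {n_35pct:.2f}")   # 2.31
```

Both cases sit between a square law and a cube law.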

jpiniero · Aug 20, 2021

IntelUser2000 said:
In an unrelated note: It seems Raichu is wrong about Alderlake. Cinebench is actually a pretty decent indicator of uarch performance and to get the 810 R20 score, you need 30% better performance. Golden Cove falls quite short of this. This puts into question the whole "beating 5950X in R20" talk.

As I mentioned the 20% includes AVX-512 regressions which R20 doesn't use. It is entirely possible that in Cinebench the "IPC" gain of the big cores is way above 20%.

Abwx · Aug 20, 2021

jpiniero said:
As I mentioned the 20% includes AVX-512 regressions which R20 doesn't use. It is entirely possible that in Cinebench the "IPC" gain of the big cores is way above 20%.

If that was the case Intel would have used it in their IPC graph.
More likely it's related to the sample that was used for a submission (posted somewhere here by a member), which was apparently clocked at 5.9 GHz.
