Discussion Intel current and future Lakes & Rapids thread

Page 498

itsmydamnation

Diamond Member
Feb 6, 2011
3,044
3,831
136
We already knew everything that Intel announced today. It is just that a lot of people did not believe the rumors/leaks. I remember getting yelled at for saying that Gracemont would be faster than Skylake…
Faster per clock; it won't be faster outright :)
Clock/power scaling of Gracemont will be very interesting.
Did I see 3-cycle L1 latency? That's gotta put a lid on clocks.
 

gdansk

Diamond Member
Feb 8, 2011
4,205
7,049
136
Actually, he said they cancelled 10 nm. So all 10 nm products. He even admitted he was wrong.
Really? Well, in that case I'll take his word for it. I've been arguing for years he was right (enough) on the basis that their 10nm products were cancelled or appeared on improved processes. But from Intel's eyes it is all 1274.
 

jpiniero

Lifer
Oct 1, 2010
16,491
6,983
136
Really? Well, in that case I'll take his word for it. I've been arguing for years he was right (enough) on the basis that their 10nm products were cancelled or appeared on improved processes. But from Intel's eyes it is all 1274.

What happened is that Intel shut down nearly all of the 10 nm production lines. But not all. If Intel had not also botched 7 nm it's possible that 10 nm would be gone by now.
 

majord

Senior member
Jul 26, 2015
509
710
136
The Golden Cove performance improvement/architectural changes were what I expected. If true, the Gracemont power/performance compared to Skylake is kind of mind-blowing. ST Gracemont is basically faster, more power efficient, and much smaller than Skylake. This raises the question of why we even need the Golden Cove cores.

After thinking about it a bit, I believe the reason stems from the fact that ST performance is ultimately still very important. There are many apps, like Handbrake, that don't scale well beyond 8 cores. Throwing 40 Gracemont cores at such an app, in the same die space as Alder Lake, won't show the same performance as 8+8 ADL. That's partly because of Golden Cove's higher clocks and higher inherent IPC, and, as I wrote above, because quite a few apps don't scale linearly with core count.

Starting to get just a little more excited about ADL...

Yes, I think you've answered your own question. In summary: so it doesn't get slaughtered by the competition in per-core peak performance.
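A rough way to see the trade-off majord describes is Amdahl's law with a cap on how many threads the app can actually use. This is a hypothetical model, not a benchmark: the 90% parallel fraction, the 8-thread cap, the 1.3x per-core speed for the big cores, and the helper names are all illustrative assumptions, not Intel or Handbrake figures. It also compares 8 big cores alone against 40 small ones, leaving out the small cores in an 8+8 part.

```python
def amdahl_time(parallel_fraction, threads, per_core_speed=1.0):
    """Normalized run time under Amdahl's law on identical cores of a given relative speed."""
    serial = 1.0 - parallel_fraction
    return (serial + parallel_fraction / threads) / per_core_speed


def speedup(parallel_fraction, cores, max_app_threads, per_core_speed):
    """Speedup over one baseline core; the app can't use more than max_app_threads threads."""
    threads = min(cores, max_app_threads)
    return 1.0 / amdahl_time(parallel_fraction, threads, per_core_speed)


# Hypothetical inputs, chosen only to illustrate the shape of the trade-off.
P = 0.9          # fraction of the work that parallelizes
BIG_SPEED = 1.3  # assumed per-core speed of a big core relative to a small one

for cap in (8, 40):
    big = speedup(P, cores=8, max_app_threads=cap, per_core_speed=BIG_SPEED)
    small = speedup(P, cores=40, max_app_threads=cap, per_core_speed=1.0)
    print(f"app scales to {cap:2d} threads: 8 big = {big:.2f}x, 40 small = {small:.2f}x")
```

With the 8-thread cap the big cores come out ahead (about 6.1x vs 4.7x with these inputs); only if the app genuinely scales to 40 threads does the sea of small cores win. SMT, memory bandwidth and clocks under load are not modeled.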
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
Your point still stands, but the gains in the big cores are not simply additive. The real gain is 1.18 x 1.19, or 40%. So GC is 33% faster than Gracemont. The problem with Gracemont is that it lacks hyperthreading and can't clock as high as the big cores.

I didn't add them; you just didn't compute the numbers.

(1.18 x 1.19) / 1.07 = 1.31

Addition results in 1.3. Also, Gracemont is 7-8% faster than Skylake.

Also, papers on architecture often just add percentages, and the result comes out surprisingly accurate: performance doesn't scale linearly with clock, so addition instead of multiplication seems to approximate the gains pretty well.
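For anyone who wants to check the arithmetic in this exchange, here is a small sketch comparing the two ways of stacking the 18% and 19% gains quoted in the thread. The 1.07 divisor is the thread's Gracemont-vs-Skylake figure and the function names are made up for illustration; nothing here is measured data.

```python
def compound_multiplicative(*gains):
    """Stack fractional gains multiplicatively: (1 + g1) * (1 + g2) * ..."""
    total = 1.0
    for g in gains:
        total *= 1.0 + g
    return total


def compound_additive(*gains):
    """Approximate stacked gains by summing percentages: 1 + g1 + g2 + ..."""
    return 1.0 + sum(gains)


GAINS = (0.18, 0.19)   # the two gains quoted in the thread
GRACEMONT = 1.07       # thread's figure: Gracemont ~7% faster than Skylake

mult = compound_multiplicative(*GAINS)
add = compound_additive(*GAINS)
print(f"multiplicative: {mult:.4f} -> {mult / GRACEMONT:.2f}x Gracemont")
print(f"additive:       {add:.2f} -> {add / GRACEMONT:.2f}x Gracemont")
```

The two methods land at about 1.31 and 1.28, which is why both posters end up near 1.3.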

Faster per clock; it won't be faster outright :)
Clock/power scaling of Gracemont will be very interesting.
Did I see 3-cycle L1 latency? That's gotta put a lid on clocks.

The Cove chips need to follow this path. An additional pipeline stage for Golden Cove? Without it, Golden Cove could have been another 2-4% faster. Up against thermal and power limits, the clock gain from an extra pipeline stage is a maybe, while the better performance from a lower branch mispredict penalty is pretty much a guarantee. Without the extra stage we would have seen a 23% gain.
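The trade-off in that paragraph (a longer pipeline buys clock headroom but costs cycles on every branch mispredict) can be sketched with a textbook CPI model. All inputs below (base CPI of 0.5, 5 mispredicts per 1000 instructions, 17 vs 18 cycle penalty, 5 GHz) and the helper names are hypothetical placeholders, not Golden Cove measurements; the point is only how the break-even clock gain falls out.

```python
def cpi(base_cpi, mpki, mispredict_penalty):
    """Cycles per instruction with a simple additive branch-mispredict term."""
    return base_cpi + (mpki / 1000.0) * mispredict_penalty


def throughput(freq_ghz, cycles_per_instr):
    """Instructions per nanosecond: clock frequency divided by CPI."""
    return freq_ghz / cycles_per_instr


# Hypothetical inputs, not measured Golden Cove numbers.
BASE_CPI = 0.5    # CPI with perfect branch prediction
MPKI = 5.0        # branch mispredicts per 1000 instructions
FREQ = 5.0        # GHz, held constant for the comparison

short_pipe = throughput(FREQ, cpi(BASE_CPI, MPKI, 17))
long_pipe = throughput(FREQ, cpi(BASE_CPI, MPKI, 18))
print(f"one extra cycle of penalty at the same clock: {long_pipe / short_pipe - 1:+.2%}")

# Clock increase the longer pipeline needs just to break even.
breakeven = cpi(BASE_CPI, MPKI, 18) / cpi(BASE_CPI, MPKI, 17) - 1
print(f"break-even clock gain: {breakeven:.2%}")
```

With these made-up inputs one extra cycle costs under 1%; a higher mispredict rate or a lower base CPI makes the penalty a larger share of total cycles.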

Ian thinks it'll be 17 stages for Golden Cove.

Nehalem - 16 stages
Sandy Bridge - 2 cycles longer or 18 stages, a few cycles shorter with a uop cache hit.
Golden Cove - 1 cycle longer or 19 stages

ARM uses the uop cache to reduce mispredict penalties. Intel uses it for higher clocks. Conroe has a 14-stage pipeline. The uop cache means Sandy Bridge can have a Conroe-like pipeline length, except the hit rate is something like 60%.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
A shrunk SKL would consume about 50% less power as well as getting 35-40% better perf at isofrequency, and 8% higher perf is 2C vs 1C/2T.

Shrink may result in 50% lower power but that won't translate into 40% better performance. Actually the gains will be closer to 20%, assuming you can translate that into clocks. Otherwise it drops. 15%? 10%? 5%?

Also, I forgot to say that Gracemont at the same performance uses 40% of the power (and I changed my other post from "40% less power" to "40% of the power"), meaning 0.4x. That's 2.5x the gains.
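To make the "40% less power" vs "40% of the power" distinction concrete, a minimal sketch converting each reading into a perf/W multiplier at equal performance (the function name is just for illustration):

```python
def perf_per_watt_gain(power_fraction):
    """Perf/W multiplier at equal performance when power drops to power_fraction of baseline."""
    return 1.0 / power_fraction


OF_THE_POWER = 0.40        # "40% of the power"  -> 0.4x baseline
LESS_POWER = 1.0 - 0.40    # "40% less power"    -> 0.6x baseline

print(f"40% of the power: {perf_per_watt_gain(OF_THE_POWER):.2f}x perf/W")
print(f"40% less power:   {perf_per_watt_gain(LESS_POWER):.2f}x perf/W")
```

The first reading gives the 2.5x quoted above; the second only about 1.67x.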

8% higher perf is 2C vs 1C/2T.

Incorrect. MT results are 4C vs 2C/4T. ST results are 1C/1T for both. Gracemont is straightup a better core in every metric.
 

mikk

Diamond Member
May 15, 2012
4,291
2,381
136
According to Moore's Law Is Dead Jim Keller worked on a CPU architecture called Royal Core. It's not clear if it will be introduced in Lunar Lake or Nova Lake. Initial plan was Lunar Lake but the full Royal Core might come with Nova Lake. Lunar Lake aims to bring at least 30% better IPC over Meteor Lake.


 

Thunder 57

Diamond Member
Aug 19, 2007
3,805
6,413
136
According to Moore's Law Is Dead Jim Keller worked on a CPU architecture called Royal Core. It's not clear if it will be introduced in Lunar Lake or Nova Lake. Initial plan was Lunar Lake but the full Royal Core might come with Nova Lake. Lunar Lake aims to bring at least 30% better IPC over Meteor Lake.



I am of the belief that Jim Keller is not a demigod, and that in the limited time he had at Intel he did not have much influence.
 

semiman

Member
May 9, 2020
77
68
91
Cool points to watch:

- Atom and Cove use decoders of the same total width, but 2x3 vs 6
- Atom has twice as much I-cache (wtf 1)
- Atom has far more execution ports (wtf 2)
- Atom uses 1/4 of the die space of a Cove core

What's going on here? I was surprised to see Zen 3 use a 256-entry reorder buffer with a smaller backend while achieving more IPC than Cypress Cove. Now Intel has brought something crazy again.
 

insertcarehere

Senior member
Jan 17, 2013
712
701
136
Exactly. User interface needs high ST performance for a fast "feel". You could have a billion slow cores, and while it will crunch numbers quickly, it would feel like a dog on the user interface thread.

If a Skylake-level core can't deliver a UI that "feels" fast enough, the problem isn't the core, it's the software.

While there are always applications that are tough to divide into many threads, the question becomes how to balance the trade-off between throughput, efficiency, and per-thread performance.

With a 4-core Gracemont cluster about the same size as a single Golden Cove core, 8 big + 8 small is theoretically the same size as 6 big + 16 small. Assuming Gracemont is as good as advertised, the latter would have higher throughput when comparing similar power envelopes and die sizes.
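The area bookkeeping in that post can be checked with a few lines. The sketch takes the post's assumption that one 4-core Gracemont cluster is about the size of one Golden Cove core, and adds a break-even ratio: the throughput a small core needs (as a fraction of a big core's) for 6+16 to beat 8+8. Power, clocks and SMT are ignored, and the helper names are hypothetical.

```python
# Assumption taken from the post: one 4-core Gracemont cluster ~ one Golden Cove core in area.
SMALL_PER_BIG_AREA = 4


def area_in_big_cores(big, small, small_per_big=SMALL_PER_BIG_AREA):
    """Express a big+small configuration in units of one Golden Cove core's area."""
    return big + small / small_per_big


def break_even_ratio(cfg_a, cfg_b):
    """Small-core throughput (as a fraction of a big core) at which two configs tie.

    Solves big_a + small_a * r == big_b + small_b * r for r.
    """
    (big_a, small_a), (big_b, small_b) = cfg_a, cfg_b
    return (big_a - big_b) / (small_b - small_a)


for big, small in [(8, 8), (6, 16)]:
    print(f"{big} big + {small} small = {area_in_big_cores(big, small):g} big-core areas")

r = break_even_ratio((8, 8), (6, 16))
print(f"6+16 out-throughputs 8+8 once a small core delivers > {r:.0%} of a big core")
```

Going by the thread's own numbers (Gracemont roughly Skylake-class, Golden Cove around 1.3x per clock), a small core plausibly clears that 25% bar, though SMT on the big cores and behavior at a fixed power limit aren't modeled.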

Cool points to watch:

- Atom and Cove use decoders of the same total width, but 2x3 vs 6
- Atom has twice as much I-cache (wtf 1)
- Atom has far more execution ports (wtf 2)
- Atom uses 1/4 of the die space of a Cove core

What's going on here? I was surprised to see Zen 3 use a 256-entry reorder buffer with a smaller backend while achieving more IPC than Cypress Cove. Now Intel has brought something crazy again.

Golden Cove is an iteration of the *Cove archetype, but Gracemont's performance and efficiency would be the real key to making Intel competitive again, especially in the high-margin server markets where throughput and efficient designs matter more than pushing 5+ GHz clocks for ultimate performance.
 

Hougy

Member
Jan 13, 2021
81
62
91
According to Moore's Law Is Dead Jim Keller worked on a CPU architecture called Royal Core. It's not clear if it will be introduced in Lunar Lake or Nova Lake. Initial plan was Lunar Lake but the full Royal Core might come with Nova Lake. Lunar Lake aims to bring at least 30% better IPC over Meteor Lake.


I used to think MLID was a fraud, but apparently he got a few things right recently. Maybe he was a case of "fake it till you make it" and got a lot better at his job.
 

Joe NYC

Diamond Member
Jun 26, 2021
3,238
4,737
136
With a 4-core Gracemont cluster about the same size as a single Golden Cove core, 8 big + 8 small is theoretically the same size as 6 big + 16 small. Assuming Gracemont is as good as advertised, the latter would have higher throughput when comparing similar power envelopes.

I was wondering the same. If it is this simple, why wait another year+, why wait another generation when Intel could take the MT performance crown now?

One explanation is that Intel is just going for the gaming performance crown on desktop and low power in notebooks, and that the client segment demanding very high MT is insignificant.
 

semiman

Member
May 9, 2020
77
68
91
Golden Cove is an iteration from the *Cove archetype but Gracemont's performance would be the real key to making Intel competitive again, especially in the high-margin server markets.

Yeah, I think so too. AMD showed that the chiplet architecture is the future, so probably Intel will show people that heterogeneous architecture is the future.
I also see several different types of Amazon EC2 instances that could be served by Gracemont-class cores (like simple reverse proxying + scale-out). Kinda sad that some people dismiss these new architectures as fraud and mere marketing terms...
 

coercitiv

Diamond Member
Jan 24, 2014
7,225
16,982
136
Shrink may result in 50% lower power but that won't translate into 40% better performance.
You're kinda' contradicting yourself here.
  • on one side you agree with Intel that a core can deliver 1.4x performance at 100% power while also scaling to 1x performance at 40% power
  • on the other side you disagree with Abwx that a core can deliver 1.35x performance at 100% power while also scaling to 1x performance at 50% power
[Later edit] For extra clarification: Intel claims a 17-18% perf/watt improvement with 10SF and a further 10-15% perf/watt improvement for Intel 7. Combined, that's up to roughly 35% better perf/watt.
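A quick check of the compounding: multiplying the ends of the two quoted ranges gives the span below. The 17-18% and 10-15% figures are the ones from the post; the constant names are just labels, and the rest is plain arithmetic.

```python
from itertools import product

NODE_10SF = (0.17, 0.18)   # claimed perf/W gain of 10SF, low and high end
INTEL_7 = (0.10, 0.15)     # further claimed perf/W gain of Intel 7

combined = [(1 + a) * (1 + b) for a, b in product(NODE_10SF, INTEL_7)]
print(f"combined perf/W: {min(combined):.3f}x to {max(combined):.3f}x")
```

So the combined range is roughly 29-36%, with "roughly 35%" at the optimistic end.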

Actually the gains will be closer to 20%, assuming you can translate that into clocks. Otherwise it drops. 15%? 10%? 5%?
The same can be said about Gracemont. Intel presented us with a very nice and marketing driven graph that makes it seem like Gracemont and Skylake will operate in the same performance range, quite similar clocks actually. Final Skylake design was 5+Ghz, Gracemont is expected to stay around ~4Ghz. That's a 25%+ delta.

Intel's claim may very well be true, as long as we properly apply it within a clock range that still allows Gracemont to raise clocks over Skylake at ISO power. The same should be applied to Abwx's claim, in the sense that we would have to consider a clock range that allows a "Skylake shrink" to clock higher at ISO power.
 

DrMrLordX

Lifer
Apr 27, 2000
22,696
12,650
136
Ooh, AT is claiming that AVX-512 won't be enabled even if you turn the small cores off on Alder Lake. Marketing must have decided that this was too confusing but too late to remove it physically from the client design. (or as I mentioned before maybe only embedded customers running Linux will get access)

Perhaps Intel realized that too few client users need AVX-512 for it to make any difference.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
You're kinda' contradicting yourself here.
  • on one side you agree with Intel that a core can deliver 1.4x performance at 100% power while also scaling to 1x performance at 40% power
  • on the other side you disagree with Abwx that a core can deliver 1.35x performance at 100% power while also scaling to 1x performance at 50% power

Am I? His point is that the process is entirely responsible for that improvement, whereas for a typical full-node gain they usually quoted roughly half the power or 20% more performance. A process that delivered 35% more performance would result in far below 0.5x power.

1% performance gain at iso power is worth much more than 1% power efficiency gain at iso performance.
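One way to see why the two kinds of 1% aren't equivalent: if power scales as some power of frequency near the operating point, a perf gain at iso-power corresponds to a larger saving at iso-perf, and vice versa. The sketch assumes P ∝ f^n with n = 2 (the square law Abwx invokes later in the thread) and performance proportional to clock; both are simplifying assumptions, and the function names are hypothetical.

```python
def iso_perf_power_saving(perf_gain, exponent=2.0):
    """Power saved at iso-performance that corresponds to perf_gain at iso-power.

    Assumes P ∝ f**exponent along the operating curve and perf ∝ f.
    """
    return 1.0 - (1.0 + perf_gain) ** (-exponent)


def iso_power_perf_gain(power_saving, exponent=2.0):
    """Perf gain at iso-power that corresponds to power_saving at iso-performance."""
    return (1.0 - power_saving) ** (-1.0 / exponent) - 1.0


print(f"+1% perf at iso-power  ~ {iso_perf_power_saving(0.01):.2%} less power at iso-perf")
print(f"-1% power at iso-perf  ~ {iso_power_perf_gain(0.01):.2%} more perf at iso-power")
```

Under the square law, 1% at iso-power is worth about 2% at iso-perf, while 1% saved at iso-perf buys only about 0.5% at iso-power.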

Of course 40% greater performance and 2.5x lower power use are much greater than process gains.

The same can be said about Gracemont. Intel presented us with a very nice and marketing driven graph that makes it seem like Gracemont and Skylake will operate in the same performance range, quite similar clocks actually. Final Skylake design was 5+Ghz, Gracemont is expected to stay around ~4Ghz. That's a 25%+ delta.

I actually think the peak performance part is when they are iso-clocks, because that's the only way it makes sense. Even with the advantage it's still at roughly half the power.

Of course 5GHz Skylake will be faster. But at an unreasonable power level and everyone knows that.

Alternatively, they are explicitly talking about Skylake, and it was nowhere near 5GHz. The conclusion will be the same: Gracemont has higher performance/clock and still uses half the power.

“This microarchitecture delivers more general integer IPC than Intel Skylake core while consuming a fraction of the power,” said Stephen Robinson, Gracemont Chief Architect.

We're neglecting the multi-thread gains as well. 80% higher performance than the 2/4 Skylake, say a 6600U, puts us in ~600 Cinebench R15 territory. This is firmly in the Icelake territory, especially since the Atom-based cores barely throttle while Core systems are throttle-city. 4/4 "Atom" equaling a flagship 4/8 of 2 years ago.

In an unrelated note: It seems Raichu is wrong about Alderlake. Cinebench is actually a pretty decent indicator of uarch performance and to get the 810 R20 score, you need 30% better performance. Golden Cove falls quite short of this. This puts into question the whole "beating 5950X in R20" talk.
 

jpiniero

Lifer
Oct 1, 2010
16,491
6,983
136
As it turns out, Gracemont (and thus Alder Lake) does not support everything that Tiger Lake does, even excluding AVX-512. I don't think the number of instructions actually missing is that high, though. A bunch of the AVX-512 instructions have AVX2 versions for some reason, and some/most of those appear to be supported.
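If you want to check what a given machine actually exposes rather than going by the spec sheet, on x86 Linux the kernel lists CPU feature flags (such as `avx2` and `avx512f`) on the `flags` lines of /proc/cpuinfo. A minimal sketch, assuming an x86 Linux box; the helper names are hypothetical, it reads only the first `flags` line, and it maps it to a coarse SIMD tier, so it's not a full instruction-set audit.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def simd_level(flags):
    """Map a flag set to a coarse SIMD tier (x86 flag names as used by the Linux kernel)."""
    if "avx512f" in flags:
        return "AVX-512"
    if "avx2" in flags:
        return "AVX2"
    return "pre-AVX2 or unknown"


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(simd_level(cpu_flags(f.read())))
    except OSError:
        print("could not read /proc/cpuinfo (not Linux?)")
```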
 

Abwx

Lifer
Apr 2, 2011
11,835
4,789
136
Shrink may result in 50% lower power but that won't translate into 40% better performance. Actually the gains will be closer to 20%, assuming you can translate that into clocks. Otherwise it drops. 15%? 10%? 5%?

3.7-3.9GHz is undoubtedly within the favourable range where power scales close to the square of frequency, so if the shrink halves the power, you can get close to 40% better perf at isopower.
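The square-law argument can be written out directly: halving power at the same clock leaves 2x power headroom, and if P ∝ f^n the clock can rise by 2^(1/n). The sketch assumes performance scales linearly with clock, which, as IntelUser2000 points out earlier, it doesn't quite; a cubic exponent is included for comparison, and the function name is hypothetical.

```python
def freq_gain_at_iso_power(power_headroom, exponent=2.0):
    """Clock multiplier available when power may rise by power_headroom, assuming P ∝ f**exponent."""
    return power_headroom ** (1.0 / exponent)


HEADROOM = 1.0 / 0.5   # shrink halves power at the same clock -> 2x headroom at iso-power

print(f"square law: {freq_gain_at_iso_power(HEADROOM):.3f}x clock")
print(f"cubic law:  {freq_gain_at_iso_power(HEADROOM, 3.0):.3f}x clock")
```

The square law gives the ~40% cited above; if power scales closer to f^3, the same halving buys only about 26%.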


Also I forgot to say Gracemont at same performance uses 40% of the power(and I changed my other post from 40% less power to 40% of the power), meaning 0.4x. That's 2.5x the gains.



Incorrect. MT results are 4C vs 2C/4T. ST results are 1C/1T for both. Gracemont is straightup a better core in every metric.

All the curves for the small cores are about perf/watt; there's not a single one that makes an absolute perf comparison.


[Intel slide: multi-thread performance vs. power, 4 Gracemont cores vs. 2C/4T Skylake]


Above it is stated that 4 GRMT have 80% better throughput than 2C/4T SKL at same power.


[Intel slide: single-thread performance vs. power, 1 Gracemont core vs. 1C/1T Skylake]


And here it is stated that, in a core-per-core comparison, GRMT has 40% higher throughput at the same power, or either (more than) 40% less power or only 40% of the power (which would mean 60% less power) at the same throughput.

Now tell me how you extract absolute perf out of these curves...

Notice also that the bench is not Spec_int (the one used by AT for INT IPC comparisons) but Spec_rate, which takes into account RAM speed and I/O improvements.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
I find it hard to be excited about a Skylake-level-IPC CPU that is clocked at 4GHz. It starts its life with a 25% clock deficit versus 2015-era Skylake, and even on conservative estimates it is beaten by at least 40% by Apple's 3GHz M1 CPU core.
Even the supposed area efficiency is quite questionable, and it should be compared to Arm's Cortex-X2, not to Intel's fat cores, which are way overblown.

To this day I have no clue what a desktop user is supposed to do with those small cores, or what tasks cannot be completed by 8 big cores. Unless, of course, the people trying to beat their neighbor's 5950X in Cinebench (while ignoring the one with a 3990X) are the target audience?

My plan remains the same: getting Alder Lake, disabling the small cores and enjoying a proper CPU with a ton of cache (btw, Alder Lake will be Intel's first desktop CPU since Zen 2 came out to have more L2+L3 cache available to its cores than AMD's).
 

coercitiv

Diamond Member
Jan 24, 2014
7,225
16,982
136
His point is that the process is entirely responsible for that improvement
I'm far more interested in his numbers than his intent. There's a big efficiency gap between Intel 7 and Intel 14nm when operating below 4GHz, and that needs to be acknowledged. That doesn't change other key aspects of Gracemont, such as perf/area or idle power usage. It's still a fast little buggger. (intentional typo, automated forum filter transforms it into "ah heck"?!)

Last but not least, doubling the power to increase clocks by 35% doesn't seem that improbable when looking at the sub-4GHz range. That's jumping from 3 to 4GHz while using 100W instead of 50W. Context is crucial here: remember that Intel's graph for the 40% increase in performance does not contain the 4-5GHz range for Skylake.
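Run the other way, the 3-to-4GHz-at-2x-power example implies a power-vs-frequency exponent. A minimal sketch that solves P2/P1 = (f2/f1)^n for n; it assumes a single exponent holds over the whole range, which is only an approximation, and the function name is hypothetical.

```python
import math


def implied_exponent(power_ratio, freq_ratio):
    """Solve power_ratio == freq_ratio ** n for n."""
    return math.log(power_ratio) / math.log(freq_ratio)


print(f"3 -> 4 GHz at 2x power: n = {implied_exponent(2.0, 4.0 / 3.0):.2f}")
print(f"+35% clock at 2x power: n = {implied_exponent(2.0, 1.35):.2f}")
```

Both cases land between the square and cubic laws (about 2.4 and 2.3), so the scenario is in the same ballpark as the square-law scaling assumed earlier in the thread.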
 

jpiniero

Lifer
Oct 1, 2010
16,491
6,983
136
In an unrelated note: It seems Raichu is wrong about Alderlake. Cinebench is actually a pretty decent indicator of uarch performance and to get the 810 R20 score, you need 30% better performance. Golden Cove falls quite short of this. This puts into question the whole "beating 5950X in R20" talk.

As I mentioned the 20% includes AVX-512 regressions which R20 doesn't use. It is entirely possible that in Cinebench the "IPC" gain of the big cores is way above 20%.