Discussion Intel current and future Lakes & Rapids thread


Exist50

Platinum Member
Aug 18, 2016
The flaw in that graph is that it fails to specify the perf in an iso-power comparison, or the power draw in an iso-perf comparison.

The numbers they gave, both verbally and on their website, are very clear, but you evidently didn't even look at them.

If we compare our Efficient-core to a single Skylake core for a single logical process, we deliver 40% more performance at the same power.

If we compare our Efficient-core to a single Skylake core for a single logical process, we deliver the same performance while consuming less than 40% of the power.

Alternatively, a Skylake core would consume 2.5X more power to achieve the same performance as our Efficient-core.

If we compare four of our new Efficient-cores against two Skylake cores running four threads, we deliver 80% more performance while still consuming less power.

Alternatively, we deliver the same throughput while consuming 80% less power. This means that Skylake would need to consume 5 times the power for the same performance.
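The "2.5X" and "5 times" figures follow from inverting the power fraction: if one design needs a fraction p of the other's power for the same performance, the other needs 1/p times as much. A minimal Python sketch of that arithmetic, treating Intel's "less than 40%" and "80% less" as exactly 40% and 20% (the function name is hypothetical):

```python
from fractions import Fraction


def power_multiplier(power_pct_of_other: int) -> Fraction:
    """How many times more power the comparison core needs for the same performance.

    power_pct_of_other: power drawn by the efficient design, as a percentage
    of the comparison design's power at iso-performance.
    """
    if not 0 < power_pct_of_other <= 100:
        raise ValueError("percentage must be in (0, 100]")
    return Fraction(100, power_pct_of_other)


# Single thread: E-core at ~40% of Skylake's power -> Skylake needs 2.5x.
print(float(power_multiplier(40)))  # 2.5
# 4C4T vs 2C4T: "80% less power" means 20% of the power -> Skylake needs 5x.
print(float(power_multiplier(20)))  # 5.0
```

Because the single-thread claim is "less than 40%", 2.5X is really a lower bound on the Skylake multiplier.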

So again, these numbers that you claimed didn't even exist answer your supposed criticism.

If the comparison is done at an operating point which grossly favors one design, then it is extremely misleading, and you cannot extrapolate that comparison to other operating points.

Good thing they show a range of operating points in that graph, and elaborated further in text and verbally.
 

coercitiv

Diamond Member
Jan 24, 2014
Your original argument

Me: Shows graph
Of course my reply was one of disbelief, since you're not making any sense.

According to @IntelUser2000 estimates:
  • Gracemont is a ~1.5 mm² core
  • Golden Cove is a ~8 mm² core
See the problem yet? We're looking at cores developed by the same company, included in the same design, with clear performance targets from the start of the design process. You're arguing Intel knew their 1.5 mm² core can perform the same as their 8 mm² core up to 4 GHz (and maybe even beyond that with a few tweaks), but they went ahead with the "stupid" big cores anyway.

The fact that other companies have small cores with excellent IPC does not change the other major fact that Intel's choices are limited to their IP and they chose to allocate 80% of core area to Cove instead of Atom.
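As a rough check of the "80% of core area" figure, here is a back-of-the-envelope Python sketch. It uses @IntelUser2000's estimates (~8 mm² per Golden Cove, ~1.5 mm² per Gracemont), assumes the 8 P-core + 8 E-core desktop configuration, and counts only core area, not caches, uncore, or graphics.

```python
# Estimates quoted in the thread (mm^2 per core); not official Intel figures.
GOLDEN_COVE_MM2 = 8.0
GRACEMONT_MM2 = 1.5


def big_core_area_share(p_cores: int, e_cores: int) -> float:
    """Fraction of total *core* area spent on big (Golden Cove) cores."""
    big = p_cores * GOLDEN_COVE_MM2
    small = e_cores * GRACEMONT_MM2
    return big / (big + small)


# Assumed 8P + 8E desktop die: 64 mm^2 of P-cores vs 12 mm^2 of E-cores.
print(f"{big_core_area_share(8, 8):.1%}")  # 84.2%
```

Under these assumptions the big cores take roughly 84% of core area, in the same ballpark as the 80% cited.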
 

IntelUser2000

Elite Member
Oct 14, 2003
I agree that it won't be 30% faster, and the comparisons are more iso-clock than they look.

That AT graph is showing me that X1 is 20%, not 30%, faster, because that's with Skylake clocked at nearly double the frequency.

@coercitiv I think we agree with each other and are just arguing over the details. My bet is it'll still have a big advantage even at iso-process. We won't be able to do those comparisons until well after release. Most people here are complaining that they didn't give enough details for us to scrutinize down to the last 1%; of course, for competitive reasons they don't need to until release.
 

dullard

Elite Member
May 21, 2001
coercitiv said:
You're arguing Intel knew their 1.5 mm² core can perform the same as their 8 mm² core up to 4 GHz (and maybe even beyond that with a few tweaks), but they went ahead with the "stupid" big cores anyway.
You still aren't comprehending the purpose of big and small. Gracemont is not quite as good as Golden Cove; we can both agree on that. But they are close in performance at a given clock rate (note this is for a single thread; Gracemont does not have Hyper-Threading). The purpose of Golden Cove, and of dedicating that much area to it, is to go high frequency regardless of power consumption. That gives a fast burst of performance for a snappy feel. At first (especially with Alder Lake), Gracemont will take a smaller portion of the load. But as Intel continues down the hybrid route, the "little" cores will do almost all of the grunt work and multi-threaded work, leaving just a small handful of Cove cores to handle the user interface and single-threaded work. That isn't "stupid"; it is an attempt to give a better user experience with a CPU that always feels instantly responsive.

If Intel's drawings on slide #74 (Architecture Day 2021) are to scale, then the "big" cores are:
  • ~27% of the area for desktop
  • ~20% of the area for mobile
  • ~10% of the area for ultramobile
Yes, that does use some die space, but it is still less than a third of the die to get fast single-threaded performance.
 

AMDK11

Senior member
Jul 15, 2019
One thing puzzles and bothers me. According to Intel's slides, Golden Cove has a 6-way x86 decoder, while Sunny Cove, according to the same slide, has a 4-way x86 decoder. This is strange, because I thought Skylake and Sunny Cove have a 5-way decoder according to, among others, WikiChip.
 

JoeRambo

Golden Member
Jun 13, 2013
AMDK11 said:
One thing puzzles and bothers me. According to Intel's slides, Golden Cove has a 6-way x86 decoder, while Sunny Cove, according to the same slide, has a 4-way x86 decoder.

That is a good question I also had. When I looked things up, Agner Fog found that the maximum number of instructions decoded per cycle is 4, and they can be decoded into a maximum of 5 uops. There are nasty limitations on which instructions count as single-uop and which do not, a 16-byte decode window, and various other rules.
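To make the 16-byte window limit concrete, here is a simplified Python sketch that greedily packs instruction lengths into 16-byte windows, decoding at most 4 instructions per window. It is a toy model under loud assumptions: real hardware uses aligned fetch blocks, handles instructions that straddle a window boundary, and has further predecode rules not modelled here; the function name is hypothetical.

```python
from typing import Sequence


def windows_needed(lengths: Sequence[int], window_bytes: int = 16,
                   max_instr: int = 4) -> int:
    """Count decode cycles when each cycle sees one window of up to window_bytes.

    Simplification: an instruction that would not fit entirely in the
    current window is deferred to the next window.
    """
    windows = 0
    i = 0
    while i < len(lengths):
        windows += 1
        used = count = 0
        while (i < len(lengths) and count < max_instr
               and used + lengths[i] <= window_bytes):
            used += lengths[i]
            count += 1
            i += 1
        if count == 0:
            raise ValueError("instruction longer than the fetch window")
    return windows


# Eight 4-byte instructions: 4 per window, capped by max_instr -> 2 cycles.
print(windows_needed([4] * 8))  # 2
# Eight 7-byte instructions: only 2 fit per 16-byte window -> 4 cycles.
print(windows_needed([7] * 8))  # 4
```

With long instructions the byte window, not the decoder count, becomes the bottleneck.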

Now it remains to be seen how things are with Golden Cove, but it seems they have a 32-byte predecode window and 6 decoders. What is unknown is what rules these decoders operate under: whether there is still a complex + simple scheme, or whether that was dropped altogether.
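One way to see why 6 decoders would need a wider window: instructions decoded per cycle are bounded both by the decoder count and by how many instructions of a given average length fit in the window. A minimal Python sketch of that upper bound; the average instruction lengths are illustrative assumptions, not measurements.

```python
import math


def decode_upper_bound(window_bytes: int, decoders: int, avg_len: float) -> int:
    """Upper bound on instructions decoded per cycle (ignores all other rules)."""
    return min(decoders, math.floor(window_bytes / avg_len))


for avg in (4.0, 6.0):
    old = decode_upper_bound(16, 4, avg)   # 16-byte window, 4 decoders
    wide = decode_upper_bound(16, 6, avg)  # 6 decoders fed by only 16 bytes
    new = decode_upper_bound(32, 6, avg)   # 32-byte window, 6 decoders
    print(avg, old, wide, new)  # prints 4.0 4 4 6, then 6.0 2 2 5
```

Under these assumed lengths, six decoders behind a 16-byte window gain nothing over four, which is consistent with pairing them with a 32-byte window.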
 

eek2121

Platinum Member
Aug 2, 2005
coercitiv said:
We're looking at cores developed by the same company, included in the same design, with clear performance targets from the start of the design process.

The fact that other companies have small cores with excellent IPC does not change the other major fact that Intel's choices are limited to their IP and they chose to allocate 80% of core area to Cove instead of Atom.

Atom is a completely different architecture from Core. It has no relation to Core whatsoever.

Different design teams.

Performance will range from somewhat below Skylake to somewhat above depending on the workload.
 

Dayman1225

Golden Member
Aug 14, 2017
AMDK11 said:
This is strange, because I thought Skylake and Sunny Cove have a 5-way decoder according to, among others, WikiChip.
I know this is the case for SNC, but I'm not sure about Skylake: Sunny Cove is considered to have 4 complex and 1 simple decoder, IIRC (4+1). You can say 5, but without the proper explanation alongside it, that's pretty misleading.
 

JoeRambo

Golden Member
Jun 13, 2013
Dayman1225 said:
I know this is the case for SNC, but I'm not sure about Skylake: Sunny Cove is considered to have 4 complex and 1 simple decoder, IIRC (4+1).

It's the other way around: 1 complex decoder that can decode any instruction and 4 simple ones that can only decode instructions that decode into a single uop. The total number of uops per cycle from the decoders is 5 (it used to be 4 before), and the total number of instructions decoded per cycle seems to be 4 as well in Agner Fog's testing.
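The 1 complex + 4 simple arrangement described above can be sketched as a toy in-order model. Assumptions (not from Intel documentation): slot 0 is the complex decoder and accepts an instruction of up to 4 uops; the other slots accept only single-uop instructions; decoding stops for the cycle at the first multi-uop instruction outside slot 0, at 4 instructions, or when the 5-uop cap would be exceeded. Microcoded instructions, the uop cache, and fusion are ignored, and the names are hypothetical.

```python
from typing import List

# Parameters as described in the thread (toy values, not an official spec).
MAX_INSTR_PER_CYCLE = 4
MAX_UOPS_PER_CYCLE = 5
COMPLEX_MAX_UOPS = 4  # assumption: longer instructions would go to microcode


def decode_cycles(uops_per_instr: List[int]) -> int:
    """Cycles to decode a stream under a simplified 1 complex + 4 simple model."""
    i, cycles, n = 0, 0, len(uops_per_instr)
    while i < n:
        cycles += 1
        first = uops_per_instr[i]
        if first > COMPLEX_MAX_UOPS:
            raise ValueError("microcoded instructions are not modelled")
        uops, decoded = first, 1  # slot 0: complex decoder takes any instruction
        i += 1
        # Remaining slots: simple decoders accept single-uop instructions only.
        while (i < n and decoded < MAX_INSTR_PER_CYCLE
               and uops_per_instr[i] == 1
               and uops + 1 <= MAX_UOPS_PER_CYCLE):
            uops += 1
            decoded += 1
            i += 1
    return cycles


print(decode_cycles([1] * 8))       # 2
print(decode_cycles([2, 1, 1, 1]))  # 1
print(decode_cycles([1, 2, 1, 2]))  # 3
```

The last two examples show why instruction order matters in this model: the same mix of uop counts takes one cycle when the multi-uop instruction leads and three when multi-uop instructions land outside slot 0.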
 

AMDK11

Senior member
Jul 15, 2019
According to the slide, it is clear that in Golden Cove the x86 decode paths have been increased from 4 to 6, which would in a way confirm that Skylake and Sunny Cove have a 4-way decoder. The only information on the slide is: "Wider 4->6 decoders." I don't see any hidden context or mention of micro-ops in it. It clearly says the decoder goes from 4-way to 6-way. In Intel's 2015 Skylake slide, the block diagram shows a 4-way x86 decoder with a label showing the number 5, which could mean 4 x86 instructions/micro-ops + 1 instruction/micro-op thanks to micro- or macro-fusion.

WikiChip:
"Front-End Improvements
There are some really major changes in the Golden Cove microarchitecture. The front-end on Golden Cove is perhaps the biggest genealogical change in the microarchitecture going as far back as Sandy Bridge or even earlier. Although the cache itself is unchanged, Golden Cove can now fetch 32 bytes each cycle, doubling the fetch bandwidth versus all prior cores (I believe all the way back to the original P6). The higher bandwidth was necessitated by the largest change in the code – the decoders. Golden Cove is now 50% wider than all previous cores, adding 2 additional decoders for a total of six.
To complement that higher fetch and decode bandwidth from the MITE path, the DSB path was also enlarged. Intel nearly doubled the micro-op cache to 4K and increased the delivery bandwidth by a third – from 6 μOPs/cycle to 8 μOPs/cycle."
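The ratios in the WikiChip excerpt can be checked with a few lines of Python (the helper name is hypothetical):

```python
def pct_increase(old: float, new: float) -> float:
    """Percentage increase from old to new."""
    return (new - old) / old * 100


print(pct_increase(16, 32))          # fetch bytes/cycle: 100.0 (doubled)
print(pct_increase(4, 6))            # decoders: 50.0 ("50% wider")
print(round(pct_increase(6, 8), 1))  # DSB uops/cycle: 33.3 ("by a third")
```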
 

Hulk

Diamond Member
Oct 9, 1999
Seems like the big question here is the "golden" optimal ratio of Golden Cove to Gracemont cores, which will vary from user to user by workload. If Gracemont really is nearly Skylake-level at 1/4 the die space, I think I'd rather have 4 Coves and 24 Gracemont cores. But of course we won't know until we get our sweaty hands on these things.
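The 4 + 24 idea can be sanity-checked as an iso-area trade. A minimal Python sketch, assuming an 8 Golden Cove + 8 Gracemont baseline (the Alder Lake desktop configuration) and measuring area in units of one Golden Cove core; the 1/4 ratio is Hulk's, and 3/16 is the ~1.5 mm² / ~8 mm² estimate quoted earlier in the thread. It counts core area only, ignoring cache and uncore; the function names are hypothetical.

```python
from fractions import Fraction


def core_area(p_cores: int, e_cores: int, e_area: Fraction) -> Fraction:
    """Total core area in units of one Golden Cove core."""
    return p_cores + e_cores * e_area


def e_cores_in_budget(p_cores: int, budget: Fraction, e_area: Fraction) -> int:
    """How many E-cores fit in the area left after p_cores big cores."""
    return int((budget - p_cores) / e_area)


for ratio in (Fraction(1, 4), Fraction(3, 16)):
    baseline = core_area(8, 8, ratio)  # assumed 8P + 8E die
    print(ratio, baseline, e_cores_in_budget(4, baseline, ratio))
# prints "1/4 10 24", then "3/16 19/2 29"
```

At the 1/4 ratio, 4 + 24 occupies exactly the same core area as 8 + 8; at the 3/16 estimate, about 29 E-cores would fit alongside 4 big cores.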
 

insertcarehere

Senior member
Jan 17, 2013
coercitiv said:
According to @IntelUser2000 estimates:
  • Gracemont is a ~1.5 mm² core
  • Golden Cove is a ~8 mm² core
You're arguing Intel knew their 1.5 mm² core can perform the same as their 8 mm² core up to 4 GHz (and maybe even beyond that with a few tweaks), but they went ahead with the "stupid" big cores anyway.
1. Golden Cove is designed to clock higher and comes with more features, such as SMT and AVX512, which Gracemont does not have.
2. @IntelUser2000's die size estimates were based on Gracemont being an incremental step from Tremont, when it's clear that isn't the case here (more than double the back-end resources + 2x L1 size).
3. Atom and Cove were designed by separate design teams, and history is littered with cases where one team designed a wholly superior architecture to another team's within the same company. Of course, in this case Golden Cove is being kept on for now because of its feature set, but if future rumors/roadmaps are to be believed, Gracemont's successors will be relied upon more heavily in future Intel products.
 

RanFodar

Junior Member
May 27, 2021
Forget AMD vs Intel forum wars.. it's now Gracemont vs Golden Cove

That is because the Cove and Atom design teams have different philosophies in core design, architectural goals, and the way they present their work to the public. Put those together, and forum members will decide which core to side with.
 

lobz

Platinum Member
Feb 10, 2017
Here you go, numbers. Just one example.

[Image: E-Core perf]


And if you go to the source. E.g.





And why are you assuming I'm not? Probably doing better than an "engineer" who thinks process scaling is a myth, lol.
You're just playing into his hands. These graphs are completely arbitrary, and it's like AMD's 25x25 initiative, which they have achieved - technically. Reality begged to differ at the time though :)
 

jpiniero

Lifer
Oct 1, 2010
re: R20, Golden Cove did increase the vector units to 3 from 2. Between that and the DDR5 bandwidth increase, you'd think R20 would be way faster. Don't ask about power, especially at 5 GHz.
 

lobz

Platinum Member
Feb 10, 2017
jpiniero said:
re: R20, Golden Cove did increase the vector units to 3 from 2. Between that and the DDR5 bandwidth increase, you'd think R20 would be way faster.
Still, on 10esf, I mean Intel 7 (poor esf... did sharks find the name superfin offensive?) it _has to_ be better than RKL.
 

lobz

Platinum Member
Feb 10, 2017
Why compare against a 6-year-old architecture and not against Gen 11???

I mean that throughput graph against Skylake 2C4T is................
For the same reason when you wanna impress some dude's hot girlfriend you wouldn't say 'I'm a better driver than Lando Norris', but you'd say 'I'm a better driver than your grandpa laying in bed after his stroke' instead.
 

mikk

Diamond Member
May 15, 2012
There's no perf data here, only a perf/watt comparison blurred by the fact that there's a new process at work.

As already said, a shrunk SKL would reproduce the first graph: it would gain close to 40% better perf at iso-power.

For the second graph, a shrunk SKL would use 50% less power; stick two cores together and they'll exhibit 100% more perf at iso-power.

Also, at the risk of repeating myself, you should take note that the benchmark is Spec_rate and not Spec_int.


Intel stated that Gracemont has better IPC than the Skylake microarchitecture; this was one of their design goals. This is the most interesting part of their presentation, performance-wise. It confirms earlier IPC rumors. And I don't think your shrink numbers are in line with the real world. Based on past Intel node shrinks, 50% less power is wishful thinking without bigger architectural improvements; it's not a given that Skylake would use 50% less power. The 2600K (32nm) to 3770K (22nm) transition wasn't even close to your numbers.
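The disagreement above turns on one number: how much power a shrunk Skylake core would save. Under the quoted argument's idealised assumptions (throughput scales linearly with core count, and only whole cores count), the iso-power throughput gain is just how many shrunk cores fit in the original power budget. A minimal Python sketch; the function name and the 70% case are illustrative assumptions:

```python
def iso_power_throughput_gain(power_pct_per_core: int) -> int:
    """Percent throughput gain at the original power budget.

    power_pct_per_core: power of one shrunk core, at unchanged performance,
    as a percentage of the original core's power. Assumes whole cores only
    and perfectly linear multi-core scaling.
    """
    if not 0 < power_pct_per_core <= 100:
        raise ValueError("percentage must be in (0, 100]")
    cores_that_fit = 100 // power_pct_per_core
    return (cores_that_fit - 1) * 100


print(iso_power_throughput_gain(50))  # 100: the "two cores, 100% more" case
print(iso_power_throughput_gain(70))  # 0: a second core no longer fits
```

The "100% more perf at iso-power" result only holds if the shrink's power saving really reaches 50%, which is exactly the point being disputed.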
 

cortexa99

Senior member
Jul 2, 2018
Does anyone know whether any official documents mention an HEDT Alder Lake? If HEDT exists, I wonder whether AVX512 would be implemented or not. If there's no HEDT, AVX512 is about to be dead for consumers, which is a shame. It's an instruction set that still needed to prove itself, but who knew it would end up like this. I guess the programmers who already worked on AVX512 optimization would feel embarrassed right now.
I also remember TSX suddenly dying.
 

jpiniero

Lifer
Oct 1, 2010
cortexa99 said:
Does anyone know whether any official documents mention an HEDT Alder Lake? If HEDT exists, I wonder whether AVX512 would be implemented or not.

There is a two-tile version of Sapphire Rapids coming for HEDT. They just didn't cover it because it's not coming any time soon.
 

AtenRa

Lifer
Feb 2, 2009
14,001
3,357
136
mikk said:
And I don't think your shrink numbers are in line with the real world. Based on past Intel node shrinks, 50% less power is wishful thinking without bigger architectural improvements; it's not a given that Skylake would use 50% less power. The 2600K (32nm) to 3770K (22nm) transition wasn't even close to your numbers.

Well it looks very similar to me......I mean the marketing.


[Image: 22nm vs 32nm]


[Image: e-Core efficiency]