Discussion Intel current and future Lakes & Rapids thread


Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Are there any games which actually use AVX-512? The small cores might have some use.

Not to my knowledge. The only use case I can think of for AVX-512 in gaming would be for physics and particle/destruction effects.

AVX has been and still is used for that, and I'm sure that with the advent of the PS5 and XSX, AVX2 will eventually be utilized heavily by game developers.

I know that Epic developed a new in-house physics engine called "Chaos" that has a lot of optimization for AVX/AVX2 through the Intel ISPC compiler.

Anyway, this kind of stuff annoys me because it slows down the adoption and use of the instruction set among consumers. I can see why so many people have complained about Intel's market-segmentation antics with instruction sets over the years.
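To make concrete what that kind of data-parallel physics work looks like, here is a minimal hand-written AVX2 sketch of a particle position update. This is just my own toy illustration (a made-up integrate_x function), not Chaos or ISPC output; a real engine would lay the data out in SoA form and let ISPC emit per-ISA code paths automatically.

```cpp
// Toy example: advance particle X positions by velocity * dt, 8 floats at a time.
// Build with something like: g++ -O2 -mavx2 -mfma particle.cpp
#include <immintrin.h>
#include <cstddef>

void integrate_x(float* pos_x, const float* vel_x, std::size_t n, float dt) {
    const __m256 vdt = _mm256_set1_ps(dt);          // broadcast dt to all 8 lanes
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 p = _mm256_loadu_ps(pos_x + i);      // load 8 positions
        __m256 v = _mm256_loadu_ps(vel_x + i);      // load 8 velocities
        p = _mm256_fmadd_ps(v, vdt, p);             // p = v * dt + p (fused multiply-add)
        _mm256_storeu_ps(pos_x + i, p);             // store 8 updated positions
    }
    for (; i < n; ++i)                              // scalar tail for leftover elements
        pos_x[i] += vel_x[i] * dt;
}
```

With AVX-512 the same loop would simply widen to 16 lanes per iteration, which is why this kind of streaming math is the usual poster child for wider vectors.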
 

Hulk

Diamond Member
Oct 9, 1999
4,214
2,005
136
Actually not at all; it was the C2D core with HT enabled and the L2 and L3 changed. You don't have to take my word for it; compare C2D and Nehalem in contemporary sources like RWT:

Frontend:
Inside Nehalem: Intel's Future Processor and System (realworldtech.com)
OoO and ports
Inside Nehalem: Intel's Future Processor and System (realworldtech.com)

You can easily find similar stuff about Sandy Bridge and see what a real departure from the C2D core should look like.

While many of the Nehalem changes don't look impressive on a block diagram, they are significant.

Nehalem/Westmere core changes from Conroe/Penryn:
Macro-op fusion enhancement: 64-bit x86 instructions can now be fused
Improved loop stream detection: the LSD can now cache 28 µops
Pipeline lengthened from 14 to 16 stages
Private L1 and L2 per core plus a shared L3
Branch prediction enhancements: a new second-level branch predictor and a renamed return stack buffer
Addition of HT
On-die memory controller
Turbo
128-bit SSE 4.1/4.2 instructions
 
  • Like
Reactions: Vattila and Ajay

Hulk

Diamond Member
Oct 9, 1999
4,214
2,005
136
This question is slightly off the current thread line of discussion but I'm curious.

Why do you think Intel went with the larger and different cache structure of Willow Cove compared to Sunny Cove? It seems like they added a lot of transistors for very little performance benefit. The increase from Ice Lake to Tiger Lake is due to the higher frequency of the 10SF Tiger Lake process; IPC overall actually decreased from Sunny Cove to Willow Cove. It just seems as though simply moving Sunny Cove to 10SF, and perhaps adding two more cores instead of spending all that die space on a larger cache, would have been a better strategy. I know I'm missing something?
 

tamz_msc

Diamond Member
Jan 5, 2017
3,770
3,590
136
This question is slightly off the current thread line of discussion but I'm curious.

Why do you think Intel went with the larger and different cache structure of Willow Cove compared to Sunny Cove? It seems like they added a lot of transistors for very little performance benefit. The increase from Ice Lake to Tiger Lake is due to the higher frequency of the 10SF Tiger Lake process; IPC overall actually decreased from Sunny Cove to Willow Cove. It just seems as though simply moving Sunny Cove to 10SF, and perhaps adding two more cores instead of spending all that die space on a larger cache, would have been a better strategy. I know I'm missing something?
I speculate that the reason for this is that Intel wants to increase the L3, but making a large inclusive L3 is difficult, so to compensate they've increased the L2.
 

Hulk

Diamond Member
Oct 9, 1999
4,214
2,005
136
I speculate that the reason for this is that Intel wants to increase the L3, but making a large inclusive L3 is difficult, so to compensate they've increased the L2.

It just seems like a strange decision, seeing how the performance benefit is basically nothing. I'm sure they modeled the heck out of it before deciding to go ahead. They are already fighting for die space and power efficiency, and yet they go ahead and add a ton of transistors for basically no payback?

Generally, when a really large cache is added, it's because new internal structures require more data and main memory can't keep up, so larger cache structures compensate to "feed the beast." But in this case nothing was changed except the cache structure. Unless they are "testing" things out for ADL, which would require or make use of this new cache arrangement? I'm totally throwing a nutty idea against the wall here.
 

coercitiv

Diamond Member
Jan 24, 2014
6,187
11,853
136
It'd be sort of annoying since it would be a BIOS setting. Are there any games which actually use AVX-512? The small cores might have some use.
I doubt small cores will see use in games, or in any latency-sensitive applications for that matter. The way ADL is built means using the small cores incurs a significant latency penalty unless the application is purposely optimized.

We got a preview of things to come with Lakefield:
[attached chart: Lakefield latency measurements]

Compare this with Comet Lake:
[attached chart: Comet Lake latency measurements]
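For anyone curious how numbers like these get produced, below is a crude sketch of a core-to-core "ping-pong" latency test of my own (it is not the tool behind the attached charts): pin two threads to two chosen cores and time how long it takes a flag to bounce between them. Running it across a big/little pair versus two big cores is the quickest way to see the latency penalty being discussed.

```cpp
// Rough core-to-core latency sketch (Linux-specific: pthread_setaffinity_np).
// Build with: g++ -O2 -pthread c2c.cpp
#include <atomic>
#include <chrono>
#include <cstdio>
#include <pthread.h>
#include <sched.h>
#include <thread>

static std::atomic<int> flag{0};

static void pin_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

int main() {
    constexpr int kIters = 200000;
    const int core_a = 0, core_b = 1;      // pick the pair of cores to test

    std::thread responder([&] {
        pin_to_core(core_b);
        for (int i = 0; i < kIters; ++i) {
            while (flag.load(std::memory_order_acquire) != 1) {}  // wait for ping
            flag.store(0, std::memory_order_release);             // send pong
        }
    });

    pin_to_core(core_a);
    const auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < kIters; ++i) {
        flag.store(1, std::memory_order_release);                 // send ping
        while (flag.load(std::memory_order_acquire) != 0) {}      // wait for pong
    }
    const auto t1 = std::chrono::steady_clock::now();
    responder.join();

    const double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
    // Each iteration is a full round trip, so halve it for the one-way latency.
    std::printf("core %d <-> core %d: ~%.0f ns one-way\n", core_a, core_b, ns / kIters / 2.0);
}
```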
 

AMDK11

Senior member
Jul 15, 2019
219
145
116
Actually not at all; it was the C2D core with HT enabled and the L2 and L3 changed. You don't have to take my word for it; compare C2D and Nehalem in contemporary sources like RWT:

Frontend:
Inside Nehalem: Intel's Future Processor and System (realworldtech.com)
OoO and ports
Inside Nehalem: Intel's Future Processor and System (realworldtech.com)

You can easily find similar stuff about Sandy Bridge and see what a real departure from the C2D core should look like.
You are looking at it through the prism of a high-level block diagram that does not reflect most of the changes. Sometimes, in order to add something to the core, you have to rebuild it and replace old algorithms embedded in the core logic with new, more complex ones. Just because the core looks similar in the overall diagram doesn't mean it's the same.
Believe me, Nehalem is no longer Conroe but a much-improved and rebuilt x86 core. Although the number of decode pipelines and execution units is very similar, the new predictors and prefetchers, among other things, make it a new microarchitecture.
Nehalem is the first x86 core from Intel whose fusion mechanisms (micro- and macro-op fusion) work fully in x86-64 mode, unlike Conroe.

Anyway, a colleague previously listed the main changes in the Nehalem x86 core.
 
Last edited:
  • Like
Reactions: Vattila

jpiniero

Lifer
Oct 1, 2010
14,584
5,206
136
I doubt small cores will see use in games, or in any latency-sensitive applications for that matter. The way ADL is built means using the small cores incurs a significant latency penalty unless the application is purposely optimized.

We got a preview of things to come with Lakefield:

You're comparing low-power mobile to desktop. The uncore speed is likely very low on Lakefield.
 
  • Like
Reactions: pcp7

mikk

Diamond Member
May 15, 2012
4,133
2,136
136
Likely not, but it was still done haphazardly imo. Ice Lake and Tiger Lake support AVX-512/AMX and Rocket Lake supports AVX-512 as well but now their highest performance core in several years will not? WTF!

I just find it odd that they've invested so heavily in AVX-512 over the years, and now they seem to be backing away from it.



Yes, because Gracemont only supports AVX2, ADL hybrid can't support AVX-512, but it's not a big deal. The only real consumer app I can think of right now with a real AVX-512 benefit is x265, and that's mainly at high resolutions; there can be gains of roughly 5%, with a power increase at the same time. These days AVX-512 usage is mostly in stress tests and synthetic benchmarks: Prime95, RealBench, y-cruncher, SiSoftware, AIDA64, Geekbench.
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,025
136
Yes, because Gracemont only supports AVX2, ADL hybrid can't support AVX-512, but it's not a big deal. The only real consumer app I can think of right now with a real AVX-512 benefit is x265, and that's mainly at high resolutions; there can be gains of roughly 5%, with a power increase at the same time. These days AVX-512 usage is mostly in stress tests and synthetic benchmarks: Prime95, RealBench, y-cruncher, SiSoftware, AIDA64, Geekbench.

One of the reasons AMD patented a more intelligent approach. My guess is Intel did this (segmenting off AVX-512) because they have no way of guaranteeing that an AVX-512 workload won't bounce over to a non-capable core. It wouldn't surprise me if they are working with Microsoft on this. Meanwhile, over at the AMD camp, they are cooking up something that works automatically and has significant bearing on the direction of future processors.
 
Last edited:

bumble81

Junior Member
Feb 14, 2021
7
1
16
The arguing about percentages is always interesting.

Let's say we start with Prescott = 1.00, +40% Merom = 1.40, +12% Nehalem = 1.56, +10% Sandy Bridge = 1.73, +4% Ivy Bridge = 1.79, +10% Haswell = 1.97, +4% Broadwell = 2.05, +10% Skylake = 2.26, +17% Sunny Cove = 2.64, +19% Golden Cove = 3.14.

What you can see is that, since the baseline keeps moving, the same percentage increase corresponds to more absolute IPC each time. That means the 19% increase of Golden Cove alone is around 50% of a Prescott, which is actually more than the 40% you got from Merom. Even the measly 10% increase of Skylake is about half of Merom's increase over Prescott, and slightly more than Sandy Bridge's in absolute IPC.
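Just to make the compounding explicit, here is a quick sketch (my own, using the per-generation percentages as quoted above) that walks the index forward from Prescott = 1.00 and prints how big each jump is in absolute "Prescott units":

```cpp
// Compound the claimed per-generation IPC gains from a Prescott = 1.00 baseline.
// Exact compounding, so the printed index drifts slightly from the rounded figures above.
#include <cstdio>
#include <utility>
#include <vector>

int main() {
    const std::vector<std::pair<const char*, double>> gens = {
        {"Merom", 0.40},      {"Nehalem", 0.12},    {"Sandy Bridge", 0.10},
        {"Ivy Bridge", 0.04}, {"Haswell", 0.10},    {"Broadwell", 0.04},
        {"Skylake", 0.10},    {"Sunny Cove", 0.17}, {"Golden Cove", 0.19},
    };

    double index = 1.00;  // Prescott baseline
    for (const auto& [name, gain] : gens) {
        const double next = index * (1.0 + gain);
        std::printf("%-13s +%2.0f%% -> %.2f (absolute jump: %.2f Prescotts)\n",
                    name, gain * 100.0, next, next - index);
        index = next;
    }
}
```

The last line shows Golden Cove's +19% adding roughly half a Prescott of IPC, which is exactly the point being made.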

If percentages without the baseline mattered, you would be far better off with Atom, which consistently gets 20-40% per tock.
 

SAAA

Senior member
May 14, 2014
541
126
116
The arguing about percentages is always interesting.

Let's say we start with Prescott = 1.00, +40% Merom = 1.40, +12% Nehalem = 1.56, +10% Sandy Bridge = 1.73, +4% Ivy Bridge = 1.79, +10% Haswell = 1.97, +4% Broadwell = 2.05, +10% Skylake = 2.26, +17% Sunny Cove = 2.64, +19% Golden Cove = 3.14.

What you can see is that, since the baseline keeps moving, the same percentage increase corresponds to more absolute IPC each time. That means the 19% increase of Golden Cove alone is around 50% of a Prescott, which is actually more than the 40% you got from Merom. Even the measly 10% increase of Skylake is about half of Merom's increase over Prescott, and slightly more than Sandy Bridge's in absolute IPC.

If percentages without the baseline mattered, you would be far better off with Atom, which consistently gets 20-40% per tock.
You missed a little: Merom was more like +60%, which puts it closer to 2.0 against Prescott's 1.0, with 40% being the absolute performance increase (due to lower clocks).

Still, your point about absolute IPC growing even when the percentages look similar is correct; of course, +10% IPC on a baseline of 100 is a larger increase than +20% on a base of 40.
It does make Sunny Cove and Golden Cove shine a bit more in that regard: even with the time it's taking to bring them to market, the gains are substantial.

This used to be a good argument for x86, as ARM was getting larger percentage increases but from a lower starting point. Now that is gone, so we are back to the frequency advantage alone for desktops.
 

bumble81

Junior Member
Feb 14, 2021
7
1
16
You missed a little: Merom was more like +60%, which puts it closer to 2.0 against Prescott's 1.0, with 40% being the absolute performance increase (due to lower clocks).

Good point. So Merom's jump is still bigger than Golden Cove's.

This used to be a good argument for x86, as ARM was getting larger percentage increases but from a lower starting point. Now that is gone, so we are back to the frequency advantage alone for desktops.

That's only true for Apple; the other ARM cores are still behind the big x86 cores. The high-end ones are comparable to Atom.
 

Hulk

Diamond Member
Oct 9, 1999
4,214
2,005
136
The arguing about percentages is always interesting.

Let's say we start with Prescott = 1.00, +40% Merom = 1.40, +12% Nehalem = 1.56, +10% Sandy Bridge = 1.73, +4% Ivy Bridge = 1.79, +10% Haswell = 1.97, +4% Broadwell = 2.05, +10% Skylake = 2.26, +17% Sunny Cove = 2.64, +19% Golden Cove = 3.14.

What you can see is that, since the baseline keeps moving, the same percentage increase corresponds to more absolute IPC each time. That means the 19% increase of Golden Cove alone is around 50% of a Prescott, which is actually more than the 40% you got from Merom. Even the measly 10% increase of Skylake is about half of Merom's increase over Prescott, and slightly more than Sandy Bridge's in absolute IPC.

If percentages without the baseline mattered, you would be far better off with Atom, which consistently gets 20-40% per tock.

As absolute performance numbers grow larger, the difference between them of course has to increase to keep the percentages even. That's the definition of a percentage.

If A does 100 widgets/hour and B does 150 widgets/hour, then B is faster by 50 widgets/hour or 50% faster.

If C does 225 widgets/hour then C is 50% faster than B and has to be faster by 75 widgets/hour to achieve the same percent performance increase as B.

It's not really "harder" for C to achieve the same percentage increase as B; it's just the nature of the arithmetic.

Now if you are saying it's harder to keep increasing efficiency due to the low-hanging-fruit argument, then that's almost always true. Increasing performance through iterative design always gets harder as the iterations progress.

Software/hardware usually starts topping out at certain versions before massive infrastructure changes need to occur.

MS Word and Excel: back in the late '80s and early '90s, new versions were notable events with in-depth reviews. But for years now they just "work."
CorelDRAW had massive improvements and new features for about the first 8 revisions, then just small incremental things.
Video editing software, let's say what was originally Sound Forge, then Sony, now Magix, got pretty solid and complete by version 10 or so.

How about smartphones? Once they were fast enough to smooth-scroll complex web pages and open the camera basically instantly, I was done with performance upgrades. Hence I'm still using a Pixel 2 with no issues or complaints. My daughter "needs" the latest iPhone for the Memojis. I'm like, $1,000 for that one feature and another useless camera? Yeah, not so much. Manufacturers are really searching for features we need in order to sell new phones. That's why Apple got caught slowing down old phones with updates; they realize the yearly phone upgrade cadence is pretty pointless now.

Even with computer processors we've reached a "good enough" point for most people. Luckily the competition between ARM/AMD/Intel is creating superfast designs that honestly probably only 1% of the population (and of course industry) really needs for productivity purposes.

Okay, so I'm really curious how Golden Cove is going to pull another 19% IPC "rabbit out of the hat" over Sunny Cove. Looking at the big front-end/back-end structures, right now we are at 5 decoders (1 complex, 4 simple) and 10 execution ports.

This is where some of the Big Brains here are going to have to come in and slap me straight, but I'm thinking that for Golden Cove Intel is going to have to do something with the decoders. Two more simple decoders? Another complex one? Since the back end was just opened up, I'm thinking they'll look to the front end.

And of course, the algorithms that keep instructions moving efficiently through the core will need to be made smarter and probably a bit larger.

DDR5 should help with feeding the beast right off the bat, so Golden Cove probably has a small IPC increase coming just from the memory subsystem improvement. Beyond that, Intel knows what the next bottleneck is and how to open it up.

Okay, cue the Big Brains here to step in and give us some thoughts on which structures in Golden Cove will probably need to be enhanced to provide another 19% IPC over Sunny Cove.
 
  • Like
Reactions: Vattila

bumble81

Junior Member
Feb 14, 2021
7
1
16
Now if you are saying it's harder to keep increasing efficiency due to the low-hanging-fruit argument, then that's almost always true. Increasing performance through iterative design always gets harder as the iterations progress.
Not saying anything about low-hanging fruit. The point was that it's pointless to use incremental percentages relative to the previous part to compare parts as far apart as Merom and Golden Cove, which have completely different baselines. It's not just pointless, it's actively misleading.

Even with computer processors we've reached a "good enough" point for most people. Luckily the competition between ARM/AMD/Intel is creating superfast designs that honestly probably only 1% of the population (and of course industry) really needs for productivity purposes.

There's always Wirth's law. Software is getting slower faster than hardware is getting faster.

Okay, so I'm really curious how Golden Cove is going to pull another 19% IPC "rabbit out of the hat" over Sunny Cove.
Presumably in the same ways as Apple (and AMD and Intel in the past and lots of others) did it. Lots of incremental improvements that add up. Lots of hard work by smart people. You could probably look at the previous generations and extrapolate patterns from that.
 
  • Like
Reactions: pcp7

Hulk

Diamond Member
Oct 9, 1999
4,214
2,005
136
Not saying anything about low-hanging fruit. The point was that it's pointless to use incremental percentages relative to the previous part to compare parts as far apart as Merom and Golden Cove, which have completely different baselines.


Sure, but the baseline used to compare an IPC increase is always the processor architecture that came before the one you are comparing. When comparing Sandy Bridge to Ivy Bridge, we don't need to consider Prescott or Golden Cove; we only need to compare the two architectures we are looking at. To do otherwise is to deny the very definition of a percentage comparison, to me at least.

If you want to say it's harder to maintain the same IPC increase from generation to generation, I will agree. But where we will have to agree to disagree is the cause. You say it's harder due to the baseline, which honestly I don't understand, and I'm not being snarky. I say it's harder because easier/quicker improvements were available (more low-hanging fruit) when generations were closer to the beginning of the evolution (the baseline).

Also, I think P4 to Conroe was more like an 82.7% IPC increase.

The following data is from Anand's Conroe test results. I have not included Winstone and gaming benchmarks because there CPU performance is heavily limited by the hard drive and GPU.

Benchmark | X6800 (2.93 GHz) | Pentium D 930 (3.0 GHz) | PD 930 scaled to 2.93 GHz | IPC gain

SYSmark 2004
Overall | 371 | 211 | 206 | 80.0%
Internet Content Creation | 482 | 256 | 250 | 92.8%
Office Productivity | 285 | 174 | 170 | 67.7%
3D Content Creation | 447 | 232 | 227 | 97.3%
2D Content Creation | 568 | 302 | 295 | 92.6%
Web Publication | 442 | 240 | 234 | 88.6%
Communication and Networking | 202 | 142 | 139 | 45.7%
Document Creation | 380 | 206 | 201 | 88.9%
Data Analysis | 302 | 180 | 176 | 71.8%
Average | | | | 80.6%

PC WorldBench 5
Overall WorldBench score | 156 | 99 | 97 | 61.3%

3D Rendering
3ds Max 7 | 4.11 | 2.13 | 2 | 97.6%
Cinebench 1 CPU | 486 | 256 | 250 | 94.4%
Cinebench xCPU | 892 | 460 | 449 | 98.5%
Average | | | | 96.8%

Video Encoding
Xmpeg 5.03 with DivX 6.1 | 19.4 | 12.2 | 12 | 62.8%
Windows Media Encoder WMV9 | 61.6 | 32.4 | 32 | 94.7%
QuickTime 7.1 H.264 (sec to encode) | 120 | 223.2 | 229 | 86.0%
Average | | | | 81.2%

Audio Encoding
iTunes 6 MP3 | 26 | 48 | 49 | 84.6%

Average of all individual scores | | | | 82.7%
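For anyone who wants to reproduce the table's arithmetic, here is a minimal sketch assuming the "scaled to 2.93 GHz" column is simply the Pentium D 930's score multiplied by 2.933/3.0 (which matches the numbers above), with the IPC gain being the X6800 score over that normalized figure:

```cpp
// Reproduce one row of the table: SYSmark 2004 overall.
#include <cstdio>

int main() {
    const double x6800_clock = 2.933, pentium_d_clock = 3.0;   // GHz
    const double x6800_score = 371.0, pentium_d_score = 211.0; // SYSmark 2004 overall

    // Scale the Pentium D score down to the X6800's clock, then compare per-clock.
    const double pd_scaled = pentium_d_score * (x6800_clock / pentium_d_clock); // ~206
    const double ipc_gain  = (x6800_score / pd_scaled - 1.0) * 100.0;           // ~80%

    std::printf("PD 930 scaled to 2.93 GHz: %.0f, IPC gain: %.1f%%\n", pd_scaled, ipc_gain);
}
```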
 

Hulk

Diamond Member
Oct 9, 1999
4,214
2,005
136
There is one factor that could be causing confusion or miscommunication between us.

If your processor-to-processor IPC percentages were calculated from a set of scores in which every generation was referenced to the same baseline, then the percentage differences farther from the baseline would come out smaller than those nearer the baseline.

But generally it is hard to find benchmark scores that run all the way from the P4 through Sunny Cove. Do you have those, with clocks normalized and IPC calculated for each generation from the baseline? If you do, then you are correct. But if the numbers you presented were each calculated solely from the previous generation, then I'm pretty sure my initial analysis is correct.

I'm pretty sure your Golden Cove 19% was calculated/estimated relative to Sunny Cove, not Prescott, right? If so, then X% from Prescott to Merom and X% from Sunny Cove to Golden Cove (both one-generation jumps) would "mean" the same thing.
 

bumble81

Junior Member
Feb 14, 2021
7
1
16
There is one factor that could be causing confusion or miscommunication between us.

If your processor-to-processor IPC percentages were calculated from a set of scores in which every generation was referenced to the same baseline, then the percentage differences farther from the baseline would come out smaller than those nearer the baseline.

I used the numbers published by Intel (but as SAAA pointed out I got the Merom number wrong)

It's in many places, but here is one up to Skylake.


I'm pretty sure your Golden Cove 19% was calculated/estimated relative to Sunny Cove, not Prescott, right? If so, then X% from Prescott to Merom and X% from Sunny Cove to Golden Cove (both one-generation jumps) would "mean" the same thing.

Yes it was relative to Sunny Cove. But it means something completely different than the Prescott->Merom number because the baseline is completely different. You just can't compare them.
 

Hulk

Diamond Member
Oct 9, 1999
4,214
2,005
136
Let's not get into this, okay? All I'm saying is that the baseline doesn't have any bearing unless the scores you are comparing directly reference the baseline. Does your Sunny Cove to Golden Cove percentage reference the baseline in any way, or could it be computed just from Sunny Cove and Golden Cove numbers? If it can be computed just from Golden Cove and Sunny Cove benchmarks, then it has nothing to do with the baseline.
 

Hulk

Diamond Member
Oct 9, 1999
4,214
2,005
136
A baseline is only meaningful when all scores are computed with reference to that baseline.
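To put that in symbols (my own notation, reusing the index numbers quoted earlier in the thread, Golden Cove at roughly 3.14 and Sunny Cove at roughly 2.64 on a Prescott = 1.00 scale): whatever baseline B the scores are normalized to cancels out of a generation-to-generation ratio, so the +19% can be computed from the two parts alone.

```latex
\frac{\mathrm{IPC_{GLC}}/\mathrm{IPC_{B}}}{\mathrm{IPC_{SNC}}/\mathrm{IPC_{B}}}
  = \frac{\mathrm{IPC_{GLC}}}{\mathrm{IPC_{SNC}}}
  \approx \frac{3.14}{2.64}
  \approx 1.19
```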
 

bumble81

Junior Member
Feb 14, 2021
7
1
16
Let's not get into this, okay? All I'm saying is that the baseline doesn't have any bearing unless the scores you are comparing directly reference the baseline. Does your Sunny Cove to Golden Cove percentage reference the baseline in any way, or could it be computed just from Sunny Cove and Golden Cove numbers? If it can be computed just from Golden Cove and Sunny Cove benchmarks, then it has nothing to do with the baseline.

I believe the Intel numbers are based on a large number of workloads which together are supposed to be representative of overall usage. So even though the workload mix is likely not identical and changes over time, we assume that with enough samples there is reversion to the mean and the numbers are comparable.

If they are not comparable, then the percentage discussion is even more pointless.