Discussion Intel current and future Lakes & Rapids thread


Saylick

Diamond Member
Sep 10, 2012
3,162
6,385
136
Is it even remotely possible for us to agree on IPC increases from past generations? And for those of you who don't like the word "IPC" we mean rate of work done at equal frequencies, or throughput at equal frequencies, etc...

A couple of years ago I compiled some benchmarks, normalized clocks, and calculated some percentages. Yes, I know normalizing clocks isn't great as memory doesn't scale, and yes I know there are a million benches out there, I tried to mainly stay with Anandtech. This is what I got. Support attached. You can see Intel pretty much works on one end then the other...
Intel Generational Work Rate Comparison Results

Transition                 | Geomean | Average | Notes
P4 to Conroe               | 82.7%   | 83.4%   | Complete redesign
Conroe to Nehalem          | 20.2%   | 22.4%   | Memory increases, add L3
Nehalem to Sandy Bridge    | 11.8%   | 12.2%   | Smarter OoO, larger registers
Sandy Bridge to Ivy Bridge | 6.7%    | 6.9%    | Prefetcher improvements
Ivy Bridge to Haswell      | 8.7%    | 8.9%    | Add 2 execution ports
Haswell to Skylake         | 8.9%    | 9.5%    | Add simple decoder
Skylake to Sunny Cove      | 21.0%   | 21.3%   | Add 2 execution ports
Oooo. Me likey. Do you plan on adding Golden Cove to the mix?
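The per-generation gains in the quoted table compound multiplicatively rather than add up, so a quick sketch can chain them into a cumulative figure. The snippet below uses only the Geomean column from that table; the helper name `cumulative_gain` is made up for illustration, and the result inherits every caveat about clock normalization the original poster already raised.

```python
from math import prod

# Geomean column of the quoted table, as fractional gains per generation.
GEOMEAN_GAINS = {
    "P4 to Conroe": 0.827,
    "Conroe to Nehalem": 0.202,
    "Nehalem to Sandy Bridge": 0.118,
    "Sandy Bridge to Ivy Bridge": 0.067,
    "Ivy Bridge to Haswell": 0.087,
    "Haswell to Skylake": 0.089,
    "Skylake to Sunny Cove": 0.210,
}


def cumulative_gain(gains):
    """Compound per-generation fractional gains into one overall multiplier."""
    return prod(1.0 + g for g in gains)


all_gains = list(GEOMEAN_GAINS.values())
total = cumulative_gain(all_gains)             # P4 through Sunny Cove
since_conroe = cumulative_gain(all_gains[1:])  # skips the P4 to Conroe jump

print(f"P4 to Sunny Cove:     {total:.2f}x")
print(f"Conroe to Sunny Cove: {since_conroe:.2f}x")
```

Chained this way, the Geomean column works out to roughly 3.75x from P4 to Sunny Cove, and a little over 2x from Conroe to Sunny Cove.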
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,559
14,513
136
AMD is introducing Zen 4c to compete on compute density. Whether or not AMD is using similar designs is not really important to customers. They're aiming for the same market.
Even today, a hypothetical Sapphire Rapids-sized chip with Gracemont would pack up to 180 cores (assuming 3 GM cores per 1 GC core, to be conservative). Zen 4c-based chips will need to fight against a battalion of next-gen Mont CPUs, and that will be a tough opponent.
I don't think anyone here gets it. Intel E-cores are less than 1/2 the productivity of the P-cores. Bergamo is only ~10% less (per core), while only being 70% of the die space.

But twist the facts all you want; this is an Intel thread, so that's expected.
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
Also anyone know what Ocean Cove was supposed to be? Just randomly remembered it... cancelled architecture for MTL. Was it supposed to be a new arch vs GLC refresh that MTL is now? Cancelled a while back wasn't it? 2 or 3 years ago IIRC
Canceled more like 6 years back, iirc. And it was to be far more radical than just a new uarch. Probably the biggest change in CPUs since the proliferation of OoO computing.
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
Intel E-cores are less than 1/2 the productivity of the P-cores. Bergamo is only ~10% less (per core), while only being 70% of the die space.
So now you're just making up numbers. I guess that's what you usually wind up doing to defend the absurd. And you wonder why I use the word "denial"...

And I can't believe it needs to be said, but not every use case cares about peak single thread performance, especially in the cloud market. For that matter, have you ever considered how the per thread performance of the 9654 compared to Gracemont? You might be surprised.
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
Is it even remotely possible for us to agree on IPC increases from past generations?
I think gen to gen, it's pretty straightforward to get a reasonably representative number with a few percent error bars, but over longer periods of time, things get tricky. Like all of these new vector and matrix extensions that are primarily targeted to AI workloads that didn't exist when e.g. Sandy Bridge was a thing. How do we weight those? And various companies have commented on this at times, but there are radically different workloads between different market segments (cloud, client PC, HPC, etc.) We can probably come up with some rough approximate, but more qualitative than quantitative.

Good work though, and I hope you keep on updating the chart. Just kinda musing on the question.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,559
14,513
136
So now you're just making up numbers. I guess that's what you usually wind up doing to defend the absurd. And you wonder why I use the word "denial"...

And I can't believe it needs to be said, but not every use case cares about peak single thread performance, especially in the cloud market. For that matter, have you ever considered how the per thread performance of the 9654 compared to Gracemont? You might be surprised.
Why don't you show me a benchmark, instead of just running your mouth? And I am not making up numbers. The core is the same; it just runs a little slower. And 128 cores fit where 96 used to (same socket). The 10% is a little bit of a guess, but based on facts. You just insult me with no facts.
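Much of this disagreement is arithmetic, so it can help to plug the claimed figures into one place. The sketch below takes the numbers exactly as posted in this thread (a guessed ~10% per-core deficit, 128 cores in the socket that held 96, and a core area of either 70% or the "half" figure cited in the reply) and treats them as inputs, not measurements. The function names are hypothetical.

```python
def perf_per_area(relative_perf, relative_area):
    """Per-core performance divided by per-core area, both relative to Zen 4."""
    return relative_perf / relative_area


def socket_throughput_ratio(cores_new, cores_old, relative_perf):
    """Aggregate throughput of the new part relative to the old one,
    assuming perfect scaling with core count (a simplification)."""
    return (cores_new * relative_perf) / cores_old


# The "~10% less (per core)" guess from the post, not a benchmark result.
ZEN4C_RELATIVE_PERF = 0.90

# The two competing area claims made in the thread.
AREA_CLAIMS = {"70% area claim": 0.70, "50% area claim": 0.50}

ppa = {label: perf_per_area(ZEN4C_RELATIVE_PERF, area)
       for label, area in AREA_CLAIMS.items()}
socket_ratio = socket_throughput_ratio(128, 96, ZEN4C_RELATIVE_PERF)

for label, value in ppa.items():
    print(f"{label}: {value:.2f}x perf per area vs Zen 4")
print(f"128c vs 96c socket throughput: {socket_ratio:.2f}x")
```

The point of laying it out this way is that the conclusion swings a lot on the area input alone, which is exactly the number being disputed.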
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
Why don't you show me a benchmark, instead of just running your mouth?
So let's start. Where are you getting that Zen 4c is 90% the performance of Zen 4? And why are you saying it's 70% of the area, when AMD's said half?

And you need benchmarks to show that the server market doesn't only care about single thread performance? Are you joking?

As for the 9654's per-thread performance, let's do some back of the envelope math. Let's just estimate Zen 4 has 40% more IPC than Skylake/Gracemont. Should be close enough. The 9654 has a base clock of 2.4GHz. So its per-thread performance with SMT enabled is equivalent to roughly Gracemont at 2.4GHz * 1.4 / 2 => 1.7GHz. In other words, lower than most Alder Lake N SKUs. Bergamo will obviously be even lower with SMT enabled. Yet you seem to have no issue claiming that's a desirable product, so why the contradiction?
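The back-of-the-envelope estimate above is easy to reproduce. This is a minimal sketch of that same arithmetic, using the post's own inputs (2.4GHz base clock, an estimated 1.4x IPC ratio, two threads per core). The function name and the `smt_uplift` parameter are additions for illustration; with `smt_uplift=1.0` the core's throughput is simply split evenly between the two threads, which is what the post assumes.

```python
def gracemont_equivalent_ghz(base_clock_ghz, ipc_ratio, threads_per_core,
                             smt_uplift=1.0):
    """Per-thread throughput expressed as an equivalent Gracemont clock.

    smt_uplift is a hypothetical knob: total core throughput with SMT on,
    relative to one thread running alone. 1.0 matches the post's math.
    """
    return base_clock_ghz * ipc_ratio * smt_uplift / threads_per_core


# Inputs as given in the post.
as_posted = gracemont_equivalent_ghz(2.4, 1.4, 2)
print(f"As posted: {as_posted:.2f} GHz")

# Sensitivity check with an arbitrary, purely illustrative SMT uplift.
with_uplift = gracemont_equivalent_ghz(2.4, 1.4, 2, smt_uplift=1.2)
print(f"With a 1.2x SMT uplift (illustrative): {with_uplift:.2f} GHz")
```

With the posted inputs this gives 1.68GHz, which the post rounds to 1.7GHz.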
 

DrMrLordX

Lifer
Apr 27, 2000
21,632
10,845
136
They're making an entire new server line with them.

They've already got a server line with Atom cores. They may be expanding it, but for now the only actual products are still tiny CPUs/SoCs for use in comm gear and similar.

It's absurd to deny their usefulness.

Usefulness compared to Golden Cove? Maybe. Usefulness vs. a better, more area-efficient "big core" on a process with a less-tortured history than 10nm? No. They only look good because their big brothers have so many problems.

The only real argument you can make is that it's taken Intel far too long to make such an obvious move.

No, the argument you can make is that Intel's designs are in shambles, but somehow the Atom team is doing okay.

The ARM folk realized this ages ago.

Take a look at Graviton 3 or the Altra lineup. See any "little" cores in there? No? Nobody wants to use A55 or A510 or equivalent in server designs. Not anymore.
 

Geddagod

Golden Member
Dec 28, 2021
1,157
1,019
106
Canceled more like 6 years back, iirc. And it was to be far more radical than just a new uarch. Probably the biggest change in CPUs since the proliferation of OoO computing.
It's a shame that's not showing up in MTL. Royal Cove doesn't look like it shows up until after PTL I think. That's a huge gap between when it was supposed to show up and when it will after cancellation (obv prob not same architecture but the whole 'huge change in CPU architecture' thing).
 

diediealldie

Member
May 9, 2020
77
68
61
Usefulness compared to Golden Cove? Maybe. Usefulness vs. a better, more area-efficient "big core" on a process with a less-tortured history than 10nm? No. They only look good because their big brothers have so many problems.

Oh, not really. It's true that Intel's Cove team has pretty much screwed up, but you can't make a hypothetical 1+0 BF core that wins on both latency and throughput against hordes of small cores. If GPUs work, then why not E-cores? Even if their P-core team finds its way, E-cores will still be there.
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
They've already got a server line with Atom cores. They may be expanding it, but for now the only actual products are still tiny CPUs/SoCs for use in comm gear and similar.
I wouldn't refer to some networking ASICs as server CPUs. And regardless of the semantics, Sierra Forest is what will bring Atom to their server lineup as a first class citizen.
Usefulness vs. a better, more area-efficient "big core" on a process with a less-tortured history than 10nm? No.
Gracemont has approximately twice the PPA vs Golden Cove. Explain what magic you think Intel could pull out to double the PPA of the big core that wouldn't affect Atom? The gap is far too large to ever conceivably close. Again, all things the ARM folk realized ages ago, and AMD's working on today in their own way.
Take a look at Graviton 3 or the Altra lineup. See any "little" cores in there? No? Nobody wants to use A55 or A510 or equivalent in server designs. Not anymore.
Atom is comparable to ARM's N / A7XX series, not E / A5XX series. Which have gotten plenty of adoption.
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
It's a shame that's not showing up in MTL. Royal Cove doesn't look like it shows up until after PTL I think. That's a huge gap between when it was supposed to show up and when it will after cancellation (obv prob not same architecture but the whole 'huge change in CPU architecture' thing).
Yeah, it's quite a shame. One of the many casualties of BK. Royal is monumental in its own right, but in a very different way than Ocean Cove was intended to be. Ah, that will be a crazy day when that launches.
 

naukkis

Senior member
Jun 5, 2002
706
578
136
As for the 9654's per-thread performance, let's do some back of the envelope math. Let's just estimate Zen 4 has 40% more IPC than Skylake/Gracemont. Should be close enough. The 9654 has a base clock of 2.4GHz. So its per-thread performance with SMT enabled is equivalent to roughly Gracemont at 2.4GHz * 1.4 / 2 => 1.7GHz. In other words, lower than most Alder Lake N SKUs. Bergamo will obviously be even lower with SMT enabled. Yet you seem to have no issue claiming that's a desirable product, so why the contradiction?

Zen4c is the same core as regular Zen, only made with libraries targeting density over clock speed. So the Zen4c core itself will be just as powerful as regular Zen; cache differences can make a small difference in overall performance. But being built with high-density libraries, Zen4c will consume less power than regular Zen4 at moderate frequencies, so it can maintain higher all-core frequencies under high core-count MT loads, or hold the same clocks at the same power with more cores. Intel will have a tough time keeping its small cores competitive; they will surely need cutting-edge low-power manufacturing (TSMC) for them.
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
Zen4c is the same core as regular Zen, only made with libraries targeting density over clock speed
They haven't confirmed that detail. It's only speculation so far.
But being built with high-density libraries, Zen4c will consume less power than regular Zen4 at moderate frequencies, so it can maintain higher all-core frequencies under high core-count MT loads
The math there will be complicated, and I don't think we really understand the design enough to judge the frequency vs power tradeoffs. But if Zen 4c could really clock faster than normal Zen 4 at the same power and half the area, there'd be very little reason to sell normal Zen 4 at all, so I'm inclined to be skeptical of that detail.

But that's neither here nor there. Regardless of how AMD arrived at the end state, the value prop for their dense/cloud lineup is the same one Intel's making for their Forest one.
 

uzzi38

Platinum Member
Oct 16, 2019
2,632
5,959
146
They haven't confirmed that detail. It's only speculation so far.

The math there will be complicated, and I don't think we really understand the design enough to judge the frequency vs power tradeoffs. But if Zen 4c could really clock faster than normal Zen 4 at the same power and half the area, there'd be very little reason to sell normal Zen 4 at all, so I'm inclined to be skeptical of that detail.
I think the key part of that sentence there is the "moderate" frequencies bit. Not sure of the bit about maintaining higher all core frequencies in actual practice but the idea that at specific power levels Zen 4C could be capable of hitting higher frequencies doesn't seem outlandish to me.

Also, regular Zen 4 still has 2x the L3 per CCX, so Zen4C isn't without its tradeoffs even outside of that.
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
I think the key part of that sentence there is the "moderate" frequencies bit. Not sure of the bit about maintaining higher all core frequencies in actual practice but the idea that at specific power levels Zen 4C could be capable of hitting higher frequencies doesn't seem outlandish to me.

Also, regular Zen 4 still has 2x the L3 per CCX, so Zen4C isn't without its tradeoffs even outside of that.
Iso-voltage, I'd expect Zen 4c to have a moderate frequency penalty. So the capacitance/leakage reduction from the smaller design would need to dominate. I agree that the idea isn't outlandish, but I'm not quite convinced the numbers work out in this particular instance. But hey, if they can pull off >2x PPA from a relatively simple derivative, that would be awesome.
 

naukkis

Senior member
Jun 5, 2002
706
578
136
The math there will be complicated, and I don't think we really understand the design enough to judge the frequency vs power tradeoffs. But if Zen 4c could really clock faster than normal Zen 4 at the same power and half the area, there'd be very little reason to sell normal Zen 4 at all, so I'm inclined to be skeptical of that detail.

A core needs big transistors to supply the current needed to drive high frequencies. Regular Zen4 can hit 5+GHz at high voltages with high power use. Zen4c doesn't, but with its small, density-optimized design it will consume less power at lower frequencies. We don't know the crossover frequency yet, but I'm guessing it's about 3GHz. So Zen4c probably offers a better power/performance ratio below 3GHz and regular Zen4 above it, and regular Zen4 can also sustain much higher boost clocks for low-thread workloads. 128c CPUs are all about high MT-load efficiency, and 4c is directly targeted at that use scenario. Intel will target that scenario with their small cores.
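The crossover argument can be illustrated with a toy power model: dynamic power of roughly C*V^2*f plus a leakage term, where the dense core gets lower effective capacitance and leakage but needs voltage to rise faster with frequency. Every constant below is invented purely to show the shape of the tradeoff, so where the crossover lands is an artifact of those made-up values and says nothing about real Zen 4 or Zen 4c silicon. All names are hypothetical.

```python
def core_power(freq_ghz, c_eff, v_floor, v_per_ghz, leakage):
    """Toy model: dynamic power C*V^2*f plus constant leakage (arbitrary units)."""
    voltage = v_floor + v_per_ghz * freq_ghz  # assumed linear V-f curve
    return c_eff * voltage ** 2 * freq_ghz + leakage


# Invented constants, NOT measurements of any real core.
REGULAR = dict(c_eff=1.0, v_floor=0.6, v_per_ghz=0.10, leakage=0.3)
DENSE = dict(c_eff=0.7, v_floor=0.6, v_per_ghz=0.16, leakage=0.2)


def crossover_ghz(lo=2.0, hi=4.5, steps=60):
    """Bisect for the frequency where both toy cores draw equal power."""
    def gap(freq):
        return core_power(freq, **DENSE) - core_power(freq, **REGULAR)

    if gap(lo) > 0 or gap(hi) < 0:
        raise ValueError("no crossover inside the given range")
    for _ in range(steps):
        mid = (lo + hi) / 2
        if gap(mid) < 0:
            lo = mid  # dense core still cheaper: crossover is higher
        else:
            hi = mid  # dense core now costlier: crossover is lower
    return (lo + hi) / 2


for freq in (2.0, 3.0, 4.0):
    print(f"{freq:.1f} GHz  regular={core_power(freq, **REGULAR):.2f}  "
          f"dense={core_power(freq, **DENSE):.2f}")
print(f"Toy crossover: {crossover_ghz():.2f} GHz")
```

Below the crossover the dense core is cheaper per clock; above it, the steeper voltage curve dominates and the regular core wins, which is the shape of the argument being made here.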
 

naukkis

Senior member
Jun 5, 2002
706
578
136
Iso-voltage, I'd expect Zen 4c to have a moderate frequency penalty. So the capacitance/leakage reduction from the smaller design would need to dominate. I agree that the idea isn't outlandish, but I'm not quite convinced the numbers work out in this particular instance. But hey, if they can pull off >2x PPA from a relatively simple derivative, that would be awesome.

We already know from leaks that 128c Bergamo has at least the same base clocks as 96c Genoa at the same power. AMD could easily have done a 192c Bergamo too, but it seems that power scaling didn't allow that.
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
Zen4c doesn't, but with its small, density-optimized design it will consume less power at lower frequencies
I think we should move this discussion to a different thread if you want to continue it, but iirc, the power tradeoff between HP and HD transistors isn't quite that clear cut. And anyway, we've no confirmation that they're merely (or even at all) switching library types between the two designs. 2x density probably demands design changes.
 

naukkis

Senior member
Jun 5, 2002
706
578
136
I think we should move this discussion to a different thread if you want to continue it, but iirc, the power tradeoff between HP and HD transistors isn't quite that clear cut. And anyway, we've no confirmation that they're merely (or even at all) switching library types between the two designs. 2x density probably demands design changes.

It's about density. With half the area, the average conductor length is less than half, resulting in less power needed through the conductor to drive the capacitance.

And design changes, sure. It's logically the same core, but being designed with different rules and density, it probably has a totally different layout compared to regular Zen. That would be a problem for Intel, as they manually design their layouts, but not a problem at all for automated TSMC designs.
 

Abwx

Lifer
Apr 2, 2011
10,948
3,458
136
Load power efficiency of RPL mobile is pretty impressive. Not entirely shocking of course: running many cores at a more modest clock will result in better throughput than running a higher frequency on fewer cores, all things equal. Hopefully they work on the idle power. This is a good result given the node disadvantage.
View attachment 76346


To put things in perspective, it does 25,000 pts @ 90W, the same as a 12C/24T 7900X @ 88W, or a 16C/32T 7950X @ 65W (and 30,000 pts @ 88W), which will be the comparison once those products are launched for gaming laptops.
 

DrMrLordX

Lifer
Apr 27, 2000
21,632
10,845
136
I wouldn't refer to some networking ASICs as server CPUs.

Uh, the Tremont "Ridge" SoCs are not ASICs.

And regardless of the semantics, Sierra Forest is what will bring Atom to their server lineup as a first class citizen.

Your confidence in Intel's ability to execute is astounding, but hey maybe they will have it ready as a "first class citizen" by 2024. Personally I think Intel will sell it to a few select customers and that's it. It'll get all the market penetration of Cooper Lake or Cascade Lake-AP.

Gracemont has approximately twice the PPA vs Golden Cove. Explain what magic you think Intel could pull out to double the PPA of the big core that wouldn't affect Atom?

They could have cores as good as Zen4. For example. Take a look at Genoa vs. Sapphire Rapids: Intel isn't even in the same ballpark anymore.

Atom is comparable to ARM's N / A7XX series, not E / A5XX series. Which have gotten plenty of adoption.

Not within the same ecosystem they aren't. Gracemont currently occupies the position held by cores like A510 in phone SoCs. ARM just isn't throwing around as many fat cores (yet) as x86. Give it time, you'll see.
 