Question Zen 6 Speculation Thread

DavidC1

Senior member
Dec 29, 2023
778
1,236
96
But your argument doesn't hold up at all because Zen 5, Lion Cove and Zen 6 all deliver meager ST improvements. It isn't a choice of more MT at the expense of ST. MT is the only thing they can increase more than 10-15% next generation.
I believe they can, if they move away from the clockspeed ideology.

In the golden days of scaling, you were uarch limited in terms of clocks, so more pipeline stages got you a lot more. A 40% increase in pipeline depth might have resulted in, say, a 25-30% increase in clocks. 10 vs 20 stages might be a 60-70% difference in clocks. 3GHz vs 5.xGHz is a lot to overcome.
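To put rough numbers on those ballpark figures, here is a minimal Python sketch. It only multiplies a 3GHz baseline by the 60-70% range guessed at in the post; the function name `scaled_clock` is made up for illustration, and the percentages are the post's estimates, not measured data.

```python
def scaled_clock(base_ghz, gain_low, gain_high):
    """Return the (low, high) clock range after applying a fractional gain range."""
    return (base_ghz * (1 + gain_low), base_ghz * (1 + gain_high))

# Assumed inputs: 3 GHz for the ~10-stage design, 60-70% gain for ~20 stages.
low, high = scaled_clock(3.0, 0.60, 0.70)
print(f"{low:.1f} to {high:.1f} GHz")
```

Under those assumptions the deeper pipeline lands at roughly 5GHz, which is the 3GHz vs 5.xGHz gap the post describes.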

Now you have 9-10 stage pipeline CPUs reaching 4.4GHz, and above 5.x GHz you run into thermal density issues, so you need to do stupid things like widening the space between transistors to reduce it, which makes the chip larger too. And you are doing that even though the 5.x GHz CPU has a near 20-stage pipeline. You have chips like Raptor Lake literally frying themselves with extra voltage to get to 6GHz.

And uop caches are better avoided. The reason? The more the cores are limited by power, die size, lower scaling, the less speculative gains are worth it. Uop cache hit is at best a chance on hit, while avoiding it and increasing it elsewhere is a guarantee. Branch predictors will never hit 100% accuracy, so there's always room for uncertainty, so those extra stages make it worth. Remember that the uop cache itself adds 2 extra stages on a miss, which is why we went from 14 stages on Core to 14-18 on Sandy Bridge.

The OC headroom for modern CPUs is zero for this reason as well. While it has been painfully slowly creeping up, above 5GHz has always been the domain of exotic cooling, regardless of pipeline stages. What happened is that cooling has not only advanced, but become significantly larger too. You should see how small "power hungry" Prescott heatsinks are compared to the modern literal aluminum bricks. Or how water cooling has become common, when it used to be in the exotic cooling domain too.
 
  • Like
Reactions: Tlh97 and FlameTail

LightningZ71

Golden Member
Mar 10, 2017
1,781
2,135
136
With memory bandwidth not slated to increase much for Zen6 (we're assuming one more AM5 generation), it stands to reason that they aren't going to be targeting massive MT throughput improvements. It stands to reason that there will not be a core count increase as it just won't be accommodated in most MT tasks. The only thing that would throw a spanner in the works is a healthy dose of MALL cache. Even then, it would need to be quite large to have broad applicability.
 

gdansk

Platinum Member
Feb 8, 2011
2,836
4,218
136
After we concluded that gratis cores must be provided, does it follow that we are entitled to get host consolidation for free too?
It costs what, $20 additional, for them to swap a low bin Turin-D CCD in place of a Granite Ridge CCD? And they can charge $150 more for the part, increasing their ASP. Choose any numbers you like. It is genuinely in AMD's best financial interest to make such a part. Maybe in time for Arrow Lake refresh.
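As a quick sanity check on that arithmetic, here is a minimal Python sketch using the placeholder figures from the post ($20 extra cost, $150 extra price). The function name `extra_gross_per_unit` is hypothetical, and the sketch ignores channel margins, validation cost, and volume, so treat it as illustration only.

```python
def extra_gross_per_unit(extra_cost_usd, extra_price_usd):
    """Extra gross dollars per unit when a costlier CCD lets the part sell for more."""
    return extra_price_usd - extra_cost_usd

# Placeholder numbers from the post; swap in any figures you like.
uplift = extra_gross_per_unit(20, 150)
print(f"${uplift} more gross per unit")
```

With those inputs the part brings in $130 more per unit before any other costs, and the sign of the result only flips if the extra cost exceeds the extra price.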
 

inquiss

Member
Oct 13, 2010
179
261
136
Not at all. I said that you cannot discount AMD making an 8+16 part. Even if memory bandwidth doesn't increase, they did so on Strix Point already. It isn't a fairy tale or Santa's wishlist; it is literally one AMD exec wanting to increase their client group ASP by 0.1% away from existing. If Intel provides the motivation... so it may be.

And in Zen 6 it is *inevitable* even if some SKUs launch in AM5 with the same memory bandwidth that the core count increases. 10% IPC generation has to deliver something.
Nah, AMD use smaller cores for efficiency only. Why would they bring that to desktop? They won't. They have enough of a scheduling issue on laptop, so why bring that to desktop when you haven't got the upside of the efficiency?
 
  • Like
Reactions: Tlh97 and marees

inquiss

Member
Oct 13, 2010
179
261
136
You have to pay overhead for each additional node. Additional PSU, chassis, motherboard, etc.

Better to have a single node with X cores, than 2 nodes with X/2 cores.

Also, not all workloads even support or are suitable for multiple nodes. So it's DOA for those use cases. Additionally, a lot of people think it's too much of a hassle to bother with multiple nodes. Messier to configure, takes up more space, latency when communicating between nodes etc.

If someone is having use cases where they really want a huge number of cores, then I can understand that multiple nodes could be a good solution. Or going cloud and rent whatever you like. But not if you're looking for a 24/32C type of system (or even 64C).
If you want it all in one system, and you have a use for those cores, you get Threadripper or EPYC. It really is that simple.
 

gdansk

Platinum Member
Feb 8, 2011
2,836
4,218
136
Nah, AMD use smaller cores for efficiency only. Why would they bring that to desktop? They won't. They have enough of a scheduling issue on laptop, so why bring that to desktop when you haven't got the upside of the efficiency?
Intel is on N3B shortly. Big efficiency gains. 2024 Arrow Lake will necessitate the launch of the X3D parts. Intel will have Arrow Lake Refresh in 2025. And they've shown they don't shy away from pushing parts to the limit for a refresh.

If you're AMD do you simply sit idle for 18+ months constantly decreasing your average selling price? Perhaps. Or maybe a 9950XT would be enough. But that might take some of the best binned parts from Turin, wasteful. So why not take a mediocre Turin-D CCD and go back to the 1800X/3950X moar cores no games strategy for a single part in your entire line-up which no one is forced to buy against their will and which you can conceivably sell for more than a 9950X (in small quantities)?

Just don't write it off.
 
  • Like
Reactions: Tlh97 and marees

DavidC1

Senior member
Dec 29, 2023
778
1,236
96
If you're AMD do you simply sit idle for 18+ months constantly decreasing your average selling price?
Isn't that exactly what happened in the previous generations?

The common leakers are saying 8+32 ARL is canned, so an ARL refresh may be at best a 14900K-like update, which AMD will be able to counter easily with current Zen 5 parts and X3D.
 
  • Like
Reactions: yuri69 and inquiss

gdansk

Platinum Member
Feb 8, 2011
2,836
4,218
136
Isn't that exactly what happened in the previous generations?
X3D "saved" them. But go look at their client group revenue. Are they happy with being the lowest margin segment? And this time they need X3D even to compete with base ARL. Are we looking forward to negative margin?
 

DavidC1

Senior member
Dec 29, 2023
778
1,236
96
X3D "saved" them. But go look at their client group revenue. Are they happy with being the lowest margin segment? And this time they need X3D even to compete with base ARL.
This is Intel's only saving grace, which is now being hit by the overvoltage issues.

Weaknesses are always going to exist. AMD is doing much better on the laptop and the server market anyway.
 

gdansk

Platinum Member
Feb 8, 2011
2,836
4,218
136
This is Intel's only saving grace, which is now being hit by the overvoltage issues.

Weaknesses are always going to exist. AMD is doing much better on the laptop and the server market anyway.
Yes. But people continue to write it off categorically rather than entertain the possibility that AMD may make a part almost no one should ever buy (2990WX, 3800XT, 5900XT...).

And I'll add that the 7950X remained competitive in MT even with 14900K. 9950X will be less competitive with ARL in MT than that was and it only gets worse with refresh.
 

StefanR5R

Elite Member
Dec 10, 2016
5,889
8,757
136
[again straying off topic]
– The Turin-dense CCD doesn't look to me as if it would physically fit into the AM5 package (along with a classic CCD and the cIOD).
– Having to keep Turin-dense CCX L3 tags would be something new. Raphael's ( = Granite Ridge's) IOD may or may not be capable of doing that.
 

gdansk

Platinum Member
Feb 8, 2011
2,836
4,218
136
[again straying off topic]
– The Turin-dense CCD doesn't look to me as if it would physically fit into the AM5 package (along with a classic CCD and the cIOD).
– Having to keep Turin-dense CCX L3 tags would be something new. Raphael's ( = Granite Ridge's) IOD may or may not be capable of doing that.
Thank you! It only took 5 pages for someone to say some actual reason(s) it might not happen. Much better arguments than memory bandwidth or no one will buy it.
 

poke01

Golden Member
Mar 8, 2022
1,991
2,527
106
Going back on track, I hope the rumours about AMD separating core designs for each segment are true for Zen6.
 

FlameTail

Diamond Member
Dec 15, 2021
3,757
2,203
106
Going back on track, I hope the rumours about AMD separating core designs for each segment are true for Zen6.
Wasn't that supposed to be with Zen7?

Zen6 will use the same uarch in both client and server iirc. The difference is that client and server use different CCDs.
 
  • Like
Reactions: Tlh97 and marees

inquiss

Member
Oct 13, 2010
179
261
136
Wasn't that supposed to be with Zen7?

Zen6 will use the same uarch in both client and server iirc. The difference is that client and server use different CCDs.
Interesting. Don't know anything about Zen 7. Client (desktop and luggages) is supposedly the same thing in Zen 6. Server gets more specialisation there. Different memory technologies.
 

soresu

Diamond Member
Dec 19, 2014
3,190
2,463
136
Also if SMT is enough to transform x64 from wildly inefficient to competitive performance per watt in some workloads then why hasn't ARM pursued it for use in servers?
They did with E1/A65, which according to their PR seemed to be as good as A510 in ST perf and probably better in MT while being significantly more efficient too.

Exactly why they abandoned that path for A510 is uncertain, but what I definitely do know is the Neoverse cores are closely matched to the Cortex cores.

Even though some Neoverse cores have had extra functionality vs their Cortex counterparts, the core itself seems largely the same.

If SMT is a non trivial addition I can see why they wouldn't put it in Neoverse V or N because the need for it in smartphone or tablet cores is pretty low.

That being said, the shift to WoA/PC SoCs might change some attitudes going forward. Who knows what the future will bring.
 

soresu

Diamond Member
Dec 19, 2014
3,190
2,463
136
We have a pretty good idea from that slide which nailed Zen 5. Bigger core complex. 10%+ IPC.
I meant about the thread in general.

Also I have serious doubts about the accuracy of that slide vis a vis post Zen 5 info.

It was many months before Zen 5 - now while that's a stretch it is possible that they had an accurate idea of Zen 5's perf.

Add a good 16 months on top of that for Zen 6 readiness/release and the slide is way too far out from Zen 6 actually being operational. It would also be extremely weird for such a cagey company as AMD to have divulged that much information about future products, unless it was for semi-custom clients thinking about what IP to put in their future SoC designs. Even then, I doubt it would be disclosed in slide form rather than directly by word of mouth from Lisa Su or another exec.
 
  • Like
Reactions: marees

DavidC1

Senior member
Dec 29, 2023
778
1,236
96
when do we get OoO superscalar architectures from AMD?
What? Surely you meant something else? PC chips have been superscalar since the original Pentium and x86 has been OoOE since Pentium Pro (Pentium II for client). Superscalar merely means being able to execute more than 1 instruction per cycle.
– The Turin-dense CCD doesn't look to me as if it would physically fit into the AM5 package (along with a classic CCD and the cIOD).
– Having to keep Turin-dense CCX L3 tags would be something new. Raphael's ( = Granite Ridge's) IOD may or may not be capable of doing that.
Yes, it assumes the CCDs are perfectly identical. It is very possible that they're not.
If SMT is a non trivial addition I can see why they wouldn't put it in Neoverse V or N because the need for it in smartphone or tablet cores is pretty low.
SMT the way AMD/Intel use it barely takes up any space, and is actually quite efficient perf/W wise.

It has always been a risk in terms of execution, because it complicates validation, which potentially increases the risk of the project slipping, which may be worth way more than other factors.

Intel held off on SMT until Nehalem because the Oregon team had experience with it from building NetBurst. The Haifa team didn't, hence Core 2 skipped it.

If the particular design team sees it as a big risk then they won't use it, plain and simple.
Also I have serious doubts about the accuracy of that slide vis a vis post Zen 5 info.
Those differences won't make Zen 6 go from 10% to 32%. The seeds of hype can be planted in many ways, including the post above.
 

SpudLobby

Senior member
May 18, 2022
976
669
106
I meant about the thread in general.

Also I have serious doubts about the accuracy of that slide vis a vis post Zen 5 info.

It was many months before Zen 5 - now while that's a stretch it is possible that they had an accurate idea of Zen 5's perf.
I mean, so? Sure they could have had a good idea of the performance. Exist50 did and told me all about it even while I was skeptical it’d be that middling (relative to the structure changes and what they need anyways).

Doing “we just don’t know” on some level is pretty much just giving cover to circle jerks here that will evolve. I’d go ahead and bet on an 8-13% integer gain with Zen 6, minimal clock changes, some power gains and a pretty similar ST perf/W gap with Apple and Arm, Qualcomm as usual.
 

poke01

Golden Member
Mar 8, 2022
1,991
2,527
106
What? Surely you meant something else? PC chips have been superscalar since the original Pentium and x86 has been OoOE since Pentium Pro (Pentium II for client). Superscalar merely means being able to execute more than 1 instruction per cycle.
oh I was way off. I thought it meant high IPC with low clocks. Apologies.
 

MadRat

Lifer
Oct 14, 1999
11,938
264
126
AMD already supports more bandwidth. You are simply not maximizing the existing IF subsystem currently because it desynchronizes memory clocks. You have to design an external memory controller to do it, and you can raise your bandwidth to the core system significantly using caches, maximum FCLK, and interleaving memory channel technologies. But it will cost you.
 
  • Like
Reactions: Tlh97 and Gideon