Question Zen 6 Speculation Thread


gdansk

Platinum Member
Feb 8, 2011
2,836
4,218
136
And why wouldn't it be possible to achieve the ST performance levels of Apple? Without resorting to discredited ISA superiority arguments....
Relatively? They stand no chance - Apple is on a 12 month cadence, has a higher R&D budget, monopolizes leading nodes, and ships far more units than AMD.

Absolutely? They are losing to designs that neither have nor need a uop cache, and that save area and power as a result.

But your argument doesn't hold up at all because Zen 5, Lion Cove and Zen 6 all deliver meager ST improvements. It isn't a choice of more MT at the expense of ST. MT is the only thing they can increase more than 10-15% next generation. Unless Intel can deliver another Skymont-like upgrade (and I'm not convinced) it seems we're at the whimpering end of the less competitive x64 market. There's no way Zen 6 is competitive without more cores in 2026.
 

poke01

Golden Member
Mar 8, 2022
1,991
2,527
106
Relatively? They stand no chance - Apple is on a 12 month cadence, has a higher R&D budget, monopolizes leading nodes, and ships far more units than AMD.

Absolutely? They are losing to designs that neither have nor need a uop cache, and that save area and power as a result.

But your argument doesn't hold up at all because Zen 5, Lion Cove and Zen 6 all deliver meager ST improvements. It isn't a choice of more MT at the expense of ST. MT is the only thing they can increase more than 10-15% next generation. Unless Skymont successor solves the x64 front end problem (and I'm not convinced)
Before that, AMD needs to fix the memory bandwidth issue as well. Dual-channel DDR5 isn't enough. We need more bandwidth on desktop. And upgrade the old fabric, AMD!
 
  • Like
Reactions: carancho

gdansk

Platinum Member
Feb 8, 2011
2,836
4,218
136
Before that, AMD needs to fix the memory bandwidth issue as well. Dual-channel DDR5 isn't enough. We need more bandwidth on desktop. And upgrade the old fabric, AMD!
For a useful part. But that doesn't matter for Cinebench. And that's enough to sell it a bit.
 
  • Like
Reactions: marees

poke01

Golden Member
Mar 8, 2022
1,991
2,527
106
The ACHIEVABLE triangle (performance, efficiency, density) of N4P is better than N5P in every way. This does NOT preclude a customer from using a dense N5P configuration that has greater density than a performance, relaxed density product on N4P.

While I don't have hard figures in front of me, it is entirely possible that the density and efficiency focused M2 product could be denser than the performance and efficiency focused Strix Point product, even if N4P is a better node than N5P.
My point is there is nothing stopping AMD from designing a SoC/APU like the M Pro. The die size is comparable to Strix Point so it shouldn’t cost more.


The way AMD and Intel design cores for laptops is not good. Clocking to 5.1GHz and using 20 watts to score lower in Cinebench 2024 ST than an M2 @ 3.6GHz from 2022.

When do we get OoO superscalar architectures from AMD? Is Sound Wave custom ARM IP?
 
  • Like
Reactions: Joe NYC

gdansk

Platinum Member
Feb 8, 2011
2,836
4,218
136
The way AMD and Intel design cores for laptops is not good. Clocking to 5.1GHz and using 20 watts to score lower in Cinebench 2024 ST than an M2 @ 3.6GHz from 2022.
That's one thing that has been bothering me.
Qualcomm, for example, shows as much more efficient than HX370 in Cinebench R24 in ST. At ~4.2GHz.
But in MT, the X1E-84 and HX 370 are very close in total score and score per watt when both are limited to the same TDP (at least according to Notebook Check tests).
How does the HX 370 gain so much relative efficiency when they're both the same core count? SMT? David Huang's tests show that power increases nearly in proportion to increased throughput for SMT in a variety of workloads. Shouldn't even more active inefficient x64 decoders (now 24 of them for Strix instead of apparently only 1 in ST) cause HX 370 to become even less efficient relative to SDXE in MT than in ST? But instead it catches up. Sure they are now throttled to a more efficient clock rate but so too is SDXE.

🤔 It just doesn't make sense to me.
 

FlameTail

Diamond Member
Dec 15, 2021
3,757
2,203
106
That's one thing that has been bothering me.
Qualcomm, for example, shows as much more efficient than HX370 in Cinebench R24 in ST. At ~4.2GHz.
But in MT, the X1E-84 and HX 370 are very close in total score and score per watt when both are limited to the same TDP (at least according to Notebook Check tests).
How does the HX 370 gain so much relative efficiency when they're both the same core count? SMT? David Huang's tests show that power increases nearly in proportion to increased throughput for SMT in a variety of workloads. Shouldn't even more active inefficient x64 decoders (now 24 of them for Strix instead of apparently only 1 in ST) cause HX 370 to become even less efficient relative to SDXE in MT than in ST? But instead it catches up. Sure they are now throttled to a more efficient clock rate but so too is SDXE.

🤔 It just doesn't make sense to me.
This has been on my mind for a while too. I was writing it off as AMD having the benefit of SMT.
 

poke01

Golden Member
Mar 8, 2022
1,991
2,527
106
That's one thing that has been bothering me.
Qualcomm, for example, shows as much more efficient than HX370 in Cinebench R24 in ST. At ~4.2GHz.
But in MT, the X1E-84 and HX 370 are very close in total score and score per watt when both are limited to the same TDP (at least according to Notebook Check tests).
How does the HX 370 gain so much relative efficiency when they're both the same core count? SMT? David Huang's tests show that power increases nearly in proportion to increased throughput for SMT in a variety of workloads. Shouldn't even more active inefficient x64 decoders (now 24 of them for Strix instead of apparently only 1 in ST) cause HX 370 to become even less efficient relative to SDXE in MT than in ST? But instead it catches up. Sure they are now throttled to a more efficient clock rate but so too is SDXE.

🤔 It just doesn't make sense to me.
SMT plays a huge part in AMD's MT score. Around 25-26% is due to SMT.
1723524073715.png
 

gdansk

Platinum Member
Feb 8, 2011
2,836
4,218
136
SMT plays a huge part in AMD's MT score. Around 25-26% is due to SMT.
View attachment 105208
Yes, that explains matching the final score but according to Huang's test it should increase power too? So how does it also catch up in performance per watt? If it was some bandwidth limit SDXE has the advantage there too and it was my understanding CB isn't that memory bandwidth sensitive.
 

poke01

Golden Member
Mar 8, 2022
1,991
2,527
106
Yes, that explains matching the final score but according to Huang's test it should increase power too? So how does it also catch up in performance per watt?
Well, not in all cases. In Blender, SMT adds no extra power consumption. More than Zen 5 itself, it's AMD's version of SMT that impresses me.
1723524489174.png
1723524535722.png
Edit: added full app chart
 


FlameTail

Diamond Member
Dec 15, 2021
3,757
2,203
106
Also another thing to note is that Strix Point has 8 Zen5C cores, and those are more efficient than standard Zen5 cores.
 

gdansk

Platinum Member
Feb 8, 2011
2,836
4,218
136
Well, not in all cases. In Blender, SMT adds no extra power consumption. More than Zen 5 itself, it's AMD's version of SMT that impresses me.


Edit: added full app chart
Hmm, the full chart gives a totally different impression than David Huang's test. It includes many tests, and in none(?) of them does power go up 30-40% like Huang measured. I assume that's because with SMT disabled these parts boost slightly higher to consume the available power? And Huang's tests were at a fixed clock rate.
Also if SMT is enough to transform x64 from wildly inefficient to competitive performance per watt in some workloads then why hasn't ARM pursued it for use in servers?

Secondly (and this may be a joke) why isn't AMD now pursuing SMT4 for Zen 6? Diminishing returns?
 

poke01

Golden Member
Mar 8, 2022
1,991
2,527
106
Secondly (and this may be a joke) why isn't AMD now pursuing SMT4 for Zen 6? Diminishing returns?
Relying on SMT too much is also a bad thing for gaming.

It’s better for AMD to eke out as much IPC out of Zen as possible and remove bottlenecks from the core.

It’s going to be much harder to increase single core performance without thinking outside the box. Node progress slowed down and ingenious solutions must be sought by chip designers.
 
  • Like
Reactions: marees

FlameTail

Diamond Member
Dec 15, 2021
3,757
2,203
106
Also if SMT is enough to transform x64 from wildly inefficient to competitive performance per watt in some workloads then why hasn't ARM pursued it for use in servers?
It's less of an x86 thing and more of an AMD thing. It's been known for a while that AMD's SMT implementation is better than Intel's.
 

moinmoin

Diamond Member
Jun 1, 2017
5,063
8,025
136
People are buying 24 core 14900K/13900K despite all its faults.
Doesn't seem to be too many people though going by sales charts, even before the current issues.

That's one thing that has been bothering me.
Qualcomm, for example, shows as much more efficient than HX370 in Cinebench R24 in ST. At ~4.2GHz.
But in MT, the X1E-84 and HX 370 are very close in total score and score per watt when both are limited to the same TDP (at least according to Notebook Check tests).
How does the HX 370 gain so much relative efficiency when they're both the same core count? SMT? David Huang's tests show that power increases nearly in proportion to increased throughput for SMT in a variety of workloads. Shouldn't even more active inefficient x64 decoders (now 24 of them for Strix instead of apparently only 1 in ST) cause HX 370 to become even less efficient relative to SDXE in MT than in ST? But instead it catches up. Sure they are now throttled to a more efficient clock rate but so too is SDXE.

🤔 It just doesn't make sense to me.
Besides SMT, another factor to keep in mind is the base power use of the uncore. Traditionally AMD has pretty bad idle and base power use, but the excellent efficiency of their cores can make up for it. That's what's happening here as well: ST looks bad due to the worse starting point, but in MT the cores' efficiency can catch up, masking the uncore handicap.
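
To put illustrative numbers on the uncore point, here is a minimal sketch. Everything in it is an assumption made up for the example: the function name, the two chips, and every wattage and score. It also assumes per-core power stays the same in ST and MT, which real parts do not do. It only shows how a fixed base power gets amortized as more cores become active.

```python
def points_per_watt(score_per_core, core_watts, uncore_watts, active_cores):
    """Package-level efficiency when a fixed uncore cost is shared by the active cores."""
    total_score = score_per_core * active_cores
    total_watts = uncore_watts + core_watts * active_cores
    return total_score / total_watts


# Hypothetical chips: A has the more efficient cores but the heavier uncore.
chip_a = dict(score_per_core=100, core_watts=4.0, uncore_watts=6.0)
chip_b = dict(score_per_core=100, core_watts=5.0, uncore_watts=2.0)

for cores in (1, 12):
    a = points_per_watt(active_cores=cores, **chip_a)
    b = points_per_watt(active_cores=cores, **chip_b)
    print(f"{cores:2d} active cores: A = {a:.1f} pts/W, B = {b:.1f} pts/W")
```

With these made-up figures, chip A trails with one active core (the uncore dominates the package power) and leads with twelve (the core efficiency dominates).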
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,474
1,964
136
Yes, that explains matching the final score but according to Huang's test it should increase power too? So how does it also catch up in performance per watt? If it was some bandwidth limit SDXE has the advantage there too and it was my understanding CB isn't that memory bandwidth sensitive.

SMT increases power consumption linearly with increasing performance. Raising clocks raises power consumption to the second or third power with increasing performance. If you get 1.3x perf with 1.3x power from SMT, and then drop clocks (and voltage with them) until your power is back at 1x, your perf will be above 1x.
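
A rough sketch of that arithmetic, under a deliberately simple model: performance is assumed to scale linearly with clock, and power with clock raised to an exponent of 2 or 3. The function name is hypothetical and the 1.3x figures are just the example values from the paragraph above, not measurements.

```python
def perf_at_iso_power(smt_perf_gain: float, smt_power_gain: float, exponent: float) -> float:
    """Relative performance with SMT on, after clocks are lowered to restore baseline power.

    Assumed model (not measured data): perf ~ clock, power ~ clock ** exponent.
    """
    # Clock scale factor that brings power back from smt_power_gain to 1.0.
    clock_scale = (1.0 / smt_power_gain) ** (1.0 / exponent)
    # Performance falls only linearly with the clock reduction.
    return smt_perf_gain * clock_scale


for exponent in (2.0, 3.0):
    perf = perf_at_iso_power(1.3, 1.3, exponent)
    print(f"power ~ clock^{exponent:.0f}: relative perf at 1x power = {perf:.2f}")
```

Under these assumptions the result is roughly 1.14x for a quadratic power curve and 1.19x for a cubic one, so the SMT configuration stays ahead at equal power.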

Well, not in all cases. In Blender, SMT adds no extra power consumption.

This looks to me like both cases are up against a fixed power limit and throttling to maintain it.
 

StefanR5R

Elite Member
Dec 10, 2016
5,889
8,757
136
You know your "just add more nodes" is specious. Adding more nodes doesn't change anything. It just doubles or triples any throughput/$ disadvantage.
Side note: As long as the application scales with node count¹, throughput/$ doesn't increase or decrease with node count, it remains constant.

________
¹) and if it doesn't require increasingly complex cluster interconnect topology
 

gdansk

Platinum Member
Feb 8, 2011
2,836
4,218
136
Side note: As long as the application scales with node count¹, throughput/$ doesn't increase or decrease with node count, it remains constant.

________
¹) and if it doesn't require increasingly complex cluster interconnect topology
Yes, exactly. Adding nodes does nothing to help. Adding more nodes with a worse throughput/$ simply wastes more money in total.

And since core spam products usually have good throughput/$ for these workloads they are, in fact, complementary (to node spam).
 
  • Like
Reactions: Fjodor2001

gdansk

Platinum Member
Feb 8, 2011
2,836
4,218
136
Yes, exactly: It does not double or triple any throughput/$ disadvantage. :-)
How do you want me to say it? Instead of being down $200 you're down $400. The amplitude of the waste simply grows larger with more nodes.

It does nothing to help and only makes things worse in an absolute scale. It makes a small difference in throughput/$ into a larger difference in $. It simply multiplies the consequences of selecting a part with less throughput/$.

You clearly understand that more nodes don't solve any deficit. It isn't a solution, only a multiplier of AM5's fewer-cores problem for these types of workloads.
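
A small sketch with made-up numbers shows how both claims in this exchange can hold at once: throughput/$ stays constant as nodes are added, while the absolute dollar gap grows. The `Node` class, the prices and the throughput figures are all hypothetical, and perfect scaling with no interconnect cost is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    cost: float        # dollars per node (hypothetical)
    throughput: float  # work units per hour per node (hypothetical)


def cluster_totals(node: Node, count: int) -> tuple[float, float]:
    """Total cost and throughput, assuming perfect scaling and no interconnect cost."""
    return node.cost * count, node.throughput * count


better_value = Node(cost=1000.0, throughput=100.0)
worse_value = Node(cost=1200.0, throughput=100.0)

for count in (1, 2, 3):
    cost_a, tput_a = cluster_totals(better_value, count)
    cost_b, tput_b = cluster_totals(worse_value, count)
    print(f"{count} node(s): throughput/$ {tput_a / cost_a:.3f} vs {tput_b / cost_b:.3f}, "
          f"extra spend for the same throughput = ${cost_b - cost_a:.0f}")
```

With these figures the ratio is the same on every row, while the extra spend goes from $200 to $400 to $600.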
 

StefanR5R

Elite Member
Dec 10, 2016
5,889
8,757
136
At this point I am completely lost as to what people really want. Seemingly not just more cores, but gratis cores.

I thoroughly regret having engaged in this discussion and apologize for my part in bringing down the S/N ratio of this thread.
 

inquiss

Member
Oct 13, 2010
179
261
136
At this point I am completely lost as to what people really want. Seemingly not just more cores, but gratis cores.

I thoroughly regret having engaged in this discussion and apologize for my part in bringing down the S/N ratio of this thread.
You've nailed it. People in this thread seem to want the option to buy higher core count CPUs to use in sockets that will be murdered by low memory bandwidth. And they think AMD should either hamper the cost of mainstream by increasing channels on the mainstream platform or they just want them free and hampered because reasons.
 

gdansk

Platinum Member
Feb 8, 2011
2,836
4,218
136
Not at all. I said that you cannot discount AMD making an 8+16 part. Even if memory bandwidth doesn't increase, they did so on Strix Point already. It isn't a fairy tale or Santa's wishlist; it is literally one AMD exec wanting to increase their client group ASP by 0.1% away from existing. If Intel provides the motivation... so it may be.

And in Zen 6 it is *inevitable* that the core count increases, even if some SKUs launch on AM5 with the same memory bandwidth. A 10% IPC generation has to deliver something.
 
  • Like
Reactions: Tlh97 and marees

maddie

Diamond Member
Jul 18, 2010
4,878
4,951
136
How do you want me to say it? Instead of being down $200 you're down $400. The amplitude of the waste simply grows larger with more nodes.

It does nothing to help and only makes things worse in an absolute scale. It makes a small difference in throughput/$ into a larger difference in $. It simply multiplies the consequences of selecting a part with less throughput/$.

You clearly understand that more nodes don't solve any deficit. It isn't a solution, only a multiplier of AM5's fewer-cores problem for these types of workloads.
Waste? Don't you get double the throughput for double the money?
 

Fjodor2001

Diamond Member
Feb 6, 2010
3,989
440
126
Waste? Don't you get double the throughput for double the money?
You have to pay overhead for each additional node. Additional PSU, chassis, motherboard, etc.

Better to have a single node with X cores, than 2 nodes with X/2 cores.
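
As a back-of-the-envelope sketch of the overhead point: the dollar figures below are invented, and the model assumes cost scales linearly per core, which real CPU pricing does not strictly follow. It only shows that the fixed per-node overhead is paid once per node.

```python
def system_cost(nodes: int, cores_per_node: int, cost_per_core: float,
                overhead_per_node: float) -> float:
    """Total cost when every node carries a fixed overhead (PSU, chassis, motherboard, ...)."""
    return nodes * (overhead_per_node + cores_per_node * cost_per_core)


COST_PER_CORE = 25.0       # hypothetical dollars per core
OVERHEAD_PER_NODE = 400.0  # hypothetical dollars per node
TOTAL_CORES = 32

one_node = system_cost(1, TOTAL_CORES, COST_PER_CORE, OVERHEAD_PER_NODE)
two_nodes = system_cost(2, TOTAL_CORES // 2, COST_PER_CORE, OVERHEAD_PER_NODE)
print(f"1 x {TOTAL_CORES} cores: ${one_node:.0f}")
print(f"2 x {TOTAL_CORES // 2} cores: ${two_nodes:.0f}")
```

Under these assumptions the two-node setup costs exactly one extra node overhead for the same total core count.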

Also, not all workloads even support or are suitable for multiple nodes. So it's DOA for those use cases. Additionally, a lot of people think it's too much of a hassle to bother with multiple nodes. Messier to configure, takes up more space, latency when communicating between nodes, etc.

If someone has use cases where they really want a huge number of cores, then I can understand that multiple nodes could be a good solution. Or going cloud and renting whatever you like. But not if you're looking for a 24/32C type of system (or even 64C).
 

StefanR5R

Elite Member
Dec 10, 2016
5,889
8,757
136
After we concluded that gratis cores must be provided, does it follow that we are entitled to get host consolidation for free too?