Question Zen 6 Speculation Thread

Page 333

511

Diamond Member
Jul 12, 2024
5,288
4,708
106
Consider the example of a program's inner loop parallelized by e.g. OpenMP and executed on either a mix of dense and non-dense cores (hint: they all finish their parts of the loop body at about the same time) or on a mix of cores of different Core Microarchitectures (hint: they handle their parts of the loop different from each other because they have different instruction latencies and whatnot).

Which approach works? Both of them. Which works well? One of them.
Dense and non-dense cores have different frequencies, so that would affect the results: a core running at a higher frequency will finish faster than one running at a lower frequency, provided it's not memory bound.
 
  • Like
Reactions: Fjodor2001

Fjodor2001

Diamond Member
Feb 6, 2010
4,504
710
126
Consider the example of a program's inner loop parallelized by e.g. OpenMP and executed on either a mix of dense and non-dense cores (hint: they all finish their parts of the loop body at about the same time) or on a mix of cores of different Core Microarchitectures (hint: they handle their parts of the loop different from each other because they have different instruction latencies and whatnot).

Which approach works? Both of them. Which works well? One of them.
A much better approach is to divide the complete work into smaller chunks. Once a core has completed a chunk it gets assigned a new chunk to work on. Some cores will complete more chunks than others (depending on how fast the core is, and how much work is required in practice per chunk). Problem solved.

Of course there can be poorly designed SW, but you cannot fully protect against that. E.g., there could be SW that is hard-coded to split the work into exactly 4 chunks (even if it could be split into many more), regardless of the number of CPU cores available.

But we can argue back and forth endlessly about specific use cases. All said and done, everyone in the business is heading towards big.LITTLE style CPUs (and most have been there for a long time). So they must have concluded that the pros outweigh the cons.
 
Last edited:
  • Like
Reactions: inquiss

511

Diamond Member
Jul 12, 2024
5,288
4,708
106
They have different maximum frequency.

When will people get it? It is not too hard. Cores running at variable frequencies based on a bunch of factors are nothing new anymore.
Even under the same conditions, dense and non-dense cores have different physical designs, so it's near impossible for them to operate on the same V/F curve.
 
  • Like
Reactions: MoogleW

yuri69

Senior member
Jul 16, 2013
696
1,248
136
AMD users laughed at Intel ADL's big.LITTLE, yet AMD was just a gen or two late with big.LITTLE...

What AMD really needs to step up its game in mobile is to provide top-notch SW/HW support and viable (complete) platforms to OEMs. Having OEMs juggle 7 different SKU lines doesn't seem like it.
 

Fjodor2001

Diamond Member
Feb 6, 2010
4,504
710
126
All problems are exactly like Cinebench's problem.
Therefore, all CPUs should have Cinebench accelerators built in.
Again:
But we can argue back and forth endlessly about specific use cases. All said and done, everyone in the business is heading towards big.LITTLE style CPUs (and most have been there for a long time). So they must have concluded that the pros outweigh the cons.
 

StefanR5R

Elite Member
Dec 10, 2016
6,825
10,922
136
We can argue back and forth but AMD doesn't do big.LITTLE. Apparently their own conclusion was a different one.

Even under the same conditions, dense and non-dense cores have different physical designs, so it's near impossible for them to operate on the same V/F curve.
They have similar enough v/f curves with the major exception that the curve of the non-dense cores has a longer tail.
 

biostud

Lifer
Feb 27, 2003
20,102
7,205
136
AMD users laughed at Intel ADL's big.LITTLE, yet AMD was just a gen or two late with big.LITTLE...
I think it had something to do with the disabling of AVX512 to get consistent performance from their CPUs, which was considered to be quite a step back.
 

511

Diamond Member
Jul 12, 2024
5,288
4,708
106
When we discuss "core Voltage over core frequency" (and core frequency control while executing multithreaded programs) and then get to see a "SIR17r1 perf over Vcore VRM Telemetry (W)" graph...
Power is directly proportional to voltage, and performance is directly proportional to frequency for the same uArch ...
This doesn't quite work since the STX1 dense cluster has halved L3 access.
Yeah but it's the best info I have for power performance characteristics.
 
  • Like
Reactions: MoogleW

fastandfurious6

Senior member
Jun 1, 2024
882
1,029
96
AMD users laughed at Intel ADL's big.LITTLE, yet AMD was just a gen or two late with big.LITTLE...

No.

Strix Point (STX HX370, currently the flagship "AMD big.little") is a failure. I proved it here

In every single scenario, 9955HX is better than STX
(9955HX = literally 9950X on mobile, 16 full cores)




Right now they have one and the same core, but either physically optimized with a focus on peak frequency, or physically optimized with the focus more shifted towards area. (Besides different FP pipeline width in different markets.)

Ok dense c-cores save die area. This makes a lot of sense in server/EPYC where most want to pack max amount of cores in chip.

But... according to rumors here there's gonna be like 7 mobile chip families for zen6 and almost all gonna have the hybrid full+dense 4p+8c

Most of them, just zen6 4p+8c and really not much else (except Halo models w big iGPU)

Literally just 4p+8c in a 2nm generation where they can fit 12 full cores in CCD

The flagship will classically be 12+12 full cores and will have the #1 performance in all areas

The question is, what are they saving die area for?

On server/EPYC it's simple answer, they want to fit as many cores as possible.
But when just 4p+8c .... why not just full 12 cores CCD?
 

511

Diamond Member
Jul 12, 2024
5,288
4,708
106
Strix Point (STX HX370, currently the flagship "AMD big.little") is a failure. I proved it here

In every single scenario, 9955HX is better than STX
(9955HX = literally 9950X on mobile, 16 full cores)
In a laptop form factor the 9955HX is a failure because it can't last for more than a couple of hours.
 
  • Like
Reactions: marees

marees

Platinum Member
Apr 28, 2024
2,166
2,799
96
The flagship will classically be 12+12 full cores and will have the #1 performance in all areas

The question is, what are they saving die area for?

On server/EPYC it's simple answer, they want to fit as many cores as possible.
But when just 4p+8c .... why not just full 12 cores CCD?
i believe it depends on the market segmentation of laptop customers

the kind of customers who buy a MDS1 Hi with extra 12p ccd are also likely to choose a discrete nvidia laptop gpu, basically the whales

plus customers of medusa halo with the optional 12p ccd would likely buy a huge amount of RAM too — basically premium customers

all the others are unlikely to pony up the extra margin for a 12p core ccd. so they should be fine with 4p + 8c + 2lp (MDSP / MDSH base / Magnus) or 4p + 4c + 2lp (MDS1 base)

bumblebee (MDS3 2p + 2c + 2lp) is required to replace krackan. it's on leaked roadmaps as launching late 2027
 

fastandfurious6

Senior member
Jun 1, 2024
882
1,029
96
In a laptop form factor the 9955HX is a failure because it can't last for more than a couple of hours.

Halo (9955HX + GPU) can last 10+ hours.... how? Because they couldn't wait a few months for the new low-power interconnect to be ready for the 9955HX.

it's not the CPU's fault. I bet Halo has better battery life than STX too lol.

STX is a failure 100%; it would only work as a replacement for Phoenix at the same price, but it's 2x.

the kind of customers who buy a MDS1 Hi with extra 12p ccd are also likely to choose a discrete nvidia laptop gpu, basically the whales

all of them, every single one will buy the flagship 10955HX3D or whatever they call it

24 full cores + 3D only. no reason to buy anything else really. no one will bother with "medusa premium" or medusa point or whatever

I'm actually worried there may not be a market for Medusa Halo even? if AMD plans to release 5 types of "medusa point high premium" then they intend to price the real Halo to high heavens, at which point it's not a viable purchase.... despite being literally the #1 chip in world
 

fastandfurious6

Senior member
Jun 1, 2024
882
1,029
96
Cause profits? The die can't be bloated - it's an expensive die using 2/3nm, and think of all those things packed in - GPU, idiotic AI slop generators, IO, etc.

really, AMD sees such a profit opportunity in having 4p+8c rammed literally everywhere across 5 product families instead of the classic 12-full-core CCD?

is profit the answer?
 

marees

Platinum Member
Apr 28, 2024
2,166
2,799
96
I'm actually worried there may not be a market for Medusa Halo even? if AMD plans to release 5 types of "medusa point high premium" then they intend to price the real Halo to high heavens, at which point it's not a viable purchase.... despite being literally the #1 chip in world
there will always be someone who is willing to pay $3000 to $4000

imagine how much RAM a 384-bit LPDDR6 bus can support !!
 

poke01

Diamond Member
Mar 8, 2022
4,715
6,060
106
I'm actually worried there may not be a market for Medusa Halo even? if AMD plans to release 5 types of "medusa point high premium" then they intend to price the real Halo to high heavens, at which point it's not a viable purchase.... despite being literally the #1 chip in world
There will always be buyers for high end SKUs especially in laptop world.

2027 is so far away though.
 
Last edited:
  • Like
Reactions: 511

StefanR5R

Elite Member
Dec 10, 2016
6,825
10,922
136
Literally just 4p+8c in a 2nm generation where they can fit 12 full cores in CCD
The 4 classic + 8 dense configs are in monolithic chips, which are baked in N3. (And even if they were in N2, why waste that much area on something that is never needed in mobile? I.e., provide for an insane f_max in all cores, if only a couple of them are ever going to run at f_max? — Again, that's old, I can't believe it still gets brought up.)
Power is directly proportional to voltage
Nope.
performance is directly proportional to frequency for the same uArch
Not quite.
 
Last edited: