Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

Page 441 - AnandTech Forums

Thunder 57

Platinum Member
Aug 19, 2007
2,734
3,939
136
Weird hypothesis, but frankly Intel and AMD have always used the numbers very poorly.
Out of 9 digits, only 4 are used: 3/5/7/9. It's a generally poor system because you don't want too many tiers (it muddies the lineup), and you don't want numbers lower than 5. Everyone identifies i3/R3 as "bad", to the point that some noobs actually think a 2023 i3 is worse than a 2014 i7.

Frankly I think clothes did it pretty well: S/M/L/XL?
Make 4 tiers:
- Light
- Medium
- Heavy
- Ultra

But it's kind of problematic too, since it's a two-sided coin: you get Bigger Number Better just like you get Lower Number Worse.

Your background in marketing is showing. I don't think there is a good way to label these. Is a 2023 Light worse than a 2014 Ultra? Probably, but you can't tell by the name alone.
 

Mahboi

Senior member
Apr 4, 2024
564
884
91
Everything is AI now.
I really wonder what the heck MS expects to get out of all the pizzazz they're forcing everyone to go along with.

I mean, if you wanna get on the stupid Jensen hype train that's one thing, but you must at least have something to sell.
All I've heard about is an "AI assistant". AI Assistant? Like Clippy but for everything? 99% of what I do with my PC won't be a better experience with AI; it's going to be the same but with an extra step. Like instead of "press Windows key, type software name + Tab + Enter", it's "press AI key, type software name + document you want it to open + Tab + Enter" and ClippAI opens the software directly on the right file? Cool and all, but by the time the AI has gotten the query right, I'd have opened it myself.

I hope it goes a little deeper than that, but until it does, I'm really not sure what the plan is here. For big server LLMs it makes sense, but for what you'll run off 16/32 GB of DDR5 and a 45 TOPS NPU?
 

StefanR5R

Elite Member
Dec 10, 2016
5,620
8,085
136
Could the LP cores just be stripped of all the "go fast" bits and anything else that can't be emulated in microcode, and essentially be maximally efficient microcode engines instead? They could retain full ISA compatibility with the performance cores.
AMD obviously took traces from what they perceived as relevant usages, and from there concluded what to support and how to support it.

Weird hypothesis, but frankly Intel and AMD have always used the numbers very poorly.
Out of 9 digits, only 4 are used: 3/5/7/9. It's a generally poor system because you don't want too many tiers (it muddies the lineup), and you don't want numbers lower than 5. Everyone identifies i3/R3 as "bad", to the point that some noobs actually think a 2023 i3 is worse than a 2014 i7.
Remember that 9 is just a late afterthought by Intel and AMD.
The 3/5/7 naming scheme is much older and long known from BMW for cars, later from Intel for Xeons, then for the "Core" line, and most recently from AMD for Ryzen.

Frankly I think clothes did it pretty well: S/M/L/XL?
Make 4 tiers:
- Light
- Medium
- Heavy
- Ultra
Reminds me of the older USB speed denominators, and Spaceballs' speed levels. :-P

(Also, who fancies a Heavy notebook processor?)
 

Hitman928

Diamond Member
Apr 15, 2012
5,407
8,308
136
It's absolutely the other way around: buffering is there to reduce signaling latencies.

Typically it is expressed in terms of delay and timings in the digital IC world, but I think I get the question being asked now.

Adding the buffer cell adds an additional gate delay, but it reduces the delay of the previous stage by more than the delay it adds, so the overall effect is less delay (latency), which then allows you to increase the frequency without timing violations.
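The effect can be sketched with the logical-effort model (all numbers here are illustrative units, not from any particular process): a single gate driving a large load pays the whole electrical effort in one stage, while inserting a buffer splits that effort across stages.

```python
# Logical-effort sketch of why inserting a buffer can *reduce* total path delay.
# Assumed toy numbers: delays are in units of tau, parasitic delay of an
# inverter is 1 tau, and the path must drive a load 64x its input capacitance.
P_INV = 1.0  # assumed parasitic delay per inverter stage

def path_delay(total_effort: float, stages: int) -> float:
    """Delay of `stages` identical inverters sharing `total_effort` equally."""
    per_stage_effort = total_effort ** (1 / stages)
    return stages * (per_stage_effort + P_INV)

print(path_delay(64, 1))  # one overloaded gate: 65.0 tau
print(path_delay(64, 2))  # buffer inserted, effort split 8x per stage: 18.0 tau
```

Even though the buffer itself adds a stage (and its own parasitic delay), the total delay drops, which is what lets the clock frequency go up without timing violations.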
 

Mahboi

Senior member
Apr 4, 2024
564
884
91
Have their stock price go up, since at any mention of the word AI, "investors" throw money at it.
Still waiting for Nutella, Lisa & Pat to do a joint presentation where all they do is a song & dance bit dressed as mariachis, complete with giant hats, singing AI AI AI AI AI AI AI AI, while Jensen watches them lying on a couch, drinking wine and laughing like he's Jabba the Hutt.
 

Mopetar

Diamond Member
Jan 31, 2011
7,950
6,258
136
Memory capacity is king. Strix Halo + 64GB RAM will beat a 4090 in any AI task if you scale it up sufficiently. And for local LLMs where capacity is by far the biggest bottleneck, it won't even be close.

Strix Halo directly competes for the semi-pro market that is buying 4090s instead of RTX 6000s. Significantly slower for tasks that fit inside 24GB, but can be used for tasks a 4090 can't touch.

Maybe there's a market there for that, but anyone doing anything for an actual job is probably better off spending the money to get an H100 or MI300 that has even more memory, far more resources, and the bandwidth to keep all of the hardware fed.

Running LLMs is going to be done by an NPU, which is basically built for it and can use system memory, which means Strix Halo doesn't have the advantages you think it does from comparing it to GPUs.

Also, what's the specific use case that fits into 64 GB, but not 24 GB, that qualifies as semi-pro? I can see a market for people who use CAD tools that like lots of memory on top of a beefy GPU and CPU. Even people who do 3D animation might want something like this as well, depending on the final product. Also, AMD could easily make a GPU with clamshell memory to tap this market. Then it's a matter of what fits into 64 GB but not 48 GB. The W7900 already offers this, but of course that's a professional card.
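The "fits in 64 GB but not 24 GB" question is mostly back-of-the-envelope arithmetic on model weights. A rough sketch (the overhead factor for KV cache and runtime is an assumption, not a measured figure):

```python
# Rough local-LLM memory footprint: weights dominate, plus an assumed
# allowance for KV cache, activations, and runtime overhead.
def model_footprint_gb(params_billion: float, bytes_per_weight: float,
                       overhead: float = 1.2) -> float:
    """Estimated memory in GB; `overhead` is a hypothetical 20% fudge factor."""
    return params_billion * bytes_per_weight * overhead

# A 70B-parameter model at 4-bit quantization (~0.5 bytes/weight):
print(round(model_footprint_gb(70, 0.5), 1))  # 42.0 GB: over a 24 GB card,
                                              # under a 64 GB unified pool
# A 13B model at 16-bit (~2 bytes/weight):
print(round(model_footprint_gb(13, 2.0), 1))  # 31.2 GB
```

By this arithmetic, the window between 24 GB and 64 GB does cover popular model sizes, which is the capacity argument being debated above.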
 

Mopetar

Diamond Member
Jan 31, 2011
7,950
6,258
136
With a price a tiny bit higher and consumption a tiny bit higher too. Or is this irrelevant?

First of all, I said it's a professional card, so its cost is doubled just because companies will pay it and can write it off as a business expense.

They could make a 48 GB 7900 XTX if they really wanted to, but they'd rather sell you a $4,000 W7900 than a $1,500 (or probably less) 7900 XTXTX XXX (or whatever they'd call it).

Power consumption is of course higher because there's far more hardware. The memory bandwidth is better too. And if it's in a separate workstation or server that you're remoted into, you don't have a laptop that you can't really use because it's busy training a model.

The more I think about it, the sillier trying to use a laptop for this seems. Model training can take hours or days depending on what you're doing. That's not something I'd want to do on a laptop. The battery will only last an hour at full load, and there's no reason you can't just SSH into a workstation that is training your model while you have a long-lasting laptop as just a front end.

Really the only reason to use one is that for whatever reason the performance/price is so good that you just plug it in and treat it like a workstation/server replacement like how miners were buying laptops just for the GPU in them and leaving them to mine all day.
 

Fjodor2001

Diamond Member
Feb 6, 2010
3,847
316
126
Not exactly, the LP island is supposed to be invisible and transition seamlessly depending on the load.

Agreed. As I've been saying for a long time, AMD will be going big.LITTLE like more or less everyone else already has. It's just that AMD is a little late to the party.

The question now is what AMD's roadmap will be. Introduced/tried with Strix Halo? Then rolled out at wider scope with Zen5 Refresh or Zen6? And what will the core count per SKU be for the big vs LITTLE cores?
 

adroc_thurston

Platinum Member
Jul 2, 2023
2,650
3,878
96

Fjodor2001

Diamond Member
Feb 6, 2010
3,847
316
126
this is literally the opposite of a traditional HMP setup.
What is and why? And in any case, what's the problem with it?
Doesn't exist.
Yet
4+8 premium mobile, 4+4 mainstream mobile.
Till the universe dies to the heat death.
What X+Y core combination (Zen5, Zen5C, Zen5LP, ...) are you talking about, and on what SKUs? And since you're maxing out at 4+8, no Zen5LP in Strix Halo, despite it being mentioned previously in this thread as having Zen5LP (and 16C Zen5, which is above 4+8)?

And are you saying these (4+8 and 4+4 mobile) are definitely the only SKUs with b.L, and that b.L / Zen5LP will not be used on any other AMD SKU to follow (e.g. Zen5 Refresh / X3D or Zen6)?
 

adroc_thurston

Platinum Member
Jul 2, 2023
2,650
3,878
96
What is and why?
HMP means all cores are OS-visible and can be active together.
It just doesn't.
on what SKUs?
Premium ultrathin and mainstream.
And since you're maxing out at 4+8, no Zen5LP in Strix Halo despite being mentioned previously in this thread as having Zen5LP (and 16C Zen5, which is above 4+8)?
Halo is 16c, you can't see or feel the LP cluster.
and b.L / Zen5LP will not be used on any other AMD SKU to follow (e.g. Zen5 Refresh / X3D or Zen6)?
you really can't read huh.
low power clusters are invisible.
 

Fjodor2001

Diamond Member
Feb 6, 2010
3,847
316
126
HMP means all cores are OS-visible and can be active together.
I know. But what's your point? And you didn't answer the question: "what's the problem with it?"
It just doesn't.
Yet. As mentioned previously.
Premium ultrathin and mainstream.
What about Strix Halo then? Or are you counting that as "premium ultrathin", despite the 120W TDP? I guess it depends on what battery life can be expected.

Also, are you ruling out b.L and LP cores on DT from AMD forever? Previously you ruled it out on mobile too, but now you're backtracking on that. So think carefully before making absolute statements, or you might have to backtrack on DT too later.
Halo is 16c, you can't see or feel the LP cluster.
So why was the LP cluster added on Halo 16C? Dark silicon, since the SKU is "just for fun"?

Also, what about perf/W with MT, no need to add it for that?
you really can't read huh.
low power clusters are invisible.
That's the point. LP cores have small die area so cheap, thus many LP cores can be added, thus improves MT perf at low cost, and with good perf/W.
 

adroc_thurston

Platinum Member
Jul 2, 2023
2,650
3,878
96
But what's your point? And you didn't answer the question: "what's the problem with it?"
Z5LP doesn't work in HMP configs. Or at least not planned to.
What about Strix Halo then?
That's a segment above premium ultrathin.
You know, the MBP14" isn't a particularly thin or sleek device anymore. It's a brick.
Also, you're ruling b.L and LP cores on DT ever from AMD?
Isn't that obvious?
Previously you ruled it out on mobile too
no?
So why was the LP cluster added on Halo 16C?
to idle.
Also, what about perf/W with MT, no need to add it for that?
It doesn't work in nT workloads. Ceases to exist once app QoS priority goes up.
LP cores have small die area so cheap, thus many LP cores can be added, thus improves MT perf at low cost, and with good perf/W.
Those cores only exist when you idle.
 

Fjodor2001

Diamond Member
Feb 6, 2010
3,847
316
126
Z5LP doesn't work in HMP configs. Or at least not planned to.
Why would it technically not work? Kepler already explained how the OS scheduler would work.
That's a segment above premium ultrathin.
You know, the MBP14" isn't a particularly thin or sleek device anymore. It's a brick.
You're contradicting yourself. You said Zen5LP is only in "premium ultrathin and mainstream". You're counting Halo as above that. Yet Zen5LP will be in Halo.
Isn't that obvious?
No. So can we have a yes/no from you on whether you think there will ever be b.L on AMD DT?

And as mentioned before, you previously ruled out b.L on AMD mobile too, but are now backtracking on that since being proven wrong. So think carefully before making absolute statements about DT.
Yes
To what purpose? Add cost for no benefit?
It doesn't work in nT workloads. Ceases to exist once app QoS priority goes up.
Why would it not work in MT workloads? That's the whole purpose of the little cores: max MT perf, at best perf/W, at lowest die area, at lowest CPU cost.

E.g. highly scalable workloads such as video transcoding or compiling source code. QoS is moot.
Those cores only exist when you idle.
When idle, nothing is needed except wakeup logic. The rest is power gated.

So if you say the LP cores only exist when idle, they are a waste, so why spend die space on them?
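The MT perf/W argument being made here can be put in numbers. A toy calculation (all per-core perf and power figures below are assumed purely for illustration, not real Zen 5 data), for an embarrassingly parallel job like transcoding or compiling:

```python
# Hypothetical per-core numbers, NOT real silicon figures: assume a big core
# gives 1.0 units of throughput at 8 W, and a little core 0.5 units at 2 W.
BIG_PERF, BIG_W = 1.0, 8.0
LP_PERF, LP_W = 0.5, 2.0

def mt_perf_per_watt(big: int, little: int) -> float:
    """Aggregate throughput / aggregate power for a fully parallel workload."""
    perf = big * BIG_PERF + little * LP_PERF
    watts = big * BIG_W + little * LP_W
    return perf / watts

print(mt_perf_per_watt(8, 0))  # 0.125: 8 big cores alone
print(mt_perf_per_watt(8, 8))  # 0.15:  adding 8 little cores lifts perf/W
```

Under these assumed numbers the little cores raise both total throughput and perf/W, which is the case Fjodor2001 is arguing; whether AMD's LP cluster is actually allowed to participate in such workloads is exactly what is disputed in this exchange.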
 

StefanR5R

Elite Member
Dec 10, 2016
5,620
8,085
136
AMD obviously took traces from what they perceived as relevant usages, and from there concluded what to support and how to support it.
PS, it will be interesting to see whether what happened in the lab predicts well enough what happens later on end customers' devices.

---------

On the mistaken likening of the presumed LP island with (today's) big.LITTLE concept:

My memory may deceive me, but I think the first big.LITTLE implementations (in cell phones) had pairs of big and little cores, and hid those pairs from software too. Only in the next iteration of the concept were all cores exposed to software, and along with that the OS kernel's process scheduler was extended to deal with the hybrid config.

I guess the cost–benefit dynamics of semi-big + semi-little configs, especially if used in a more controlled environment like a smartphone, make parallel use of all cores and OS control of the scheduling the better choice. Whereas with the presumed enormously-huge + teensy-tiny config from AMD in Windows laptops, mutually exclusive use of the core types and hardware control of the scheduling /edit/ might work out better.

---------

When idle, nothing is needed, except wakeup logic. Rest is power gated.
See? These LP cores are part of what you call wakeup logic.

And that's why LP cores (that is, distinctly slower and feature reduced cores) aren't needed in devices which run on mains power all the time.

Why would it not work on MT workloads?
Because they are distinctly slower than the main cores, and feature reduced. They wouldn't help; they would only be in the way if trying to take part.
 
Last edited:

Fjodor2001

Diamond Member
Feb 6, 2010
3,847
316
126
See? These LP cores are part of what you call wakeup logic.

And that's why LP cores (that is, distinctly slower and feature reduced cores) aren't needed in devices which run on mains power all the time.
Yes, LP cores can be used for that too. But if you only need wakeup logic, there's no need for even a single full LP core, let alone multiple LP cores. It can be done with much less logic than that.
Because they are distinctly slower than the main cores, and feature reduced. They wouldn't help; they would only be in the way if trying to take part.
That's the whole point of the LP/LITTLE cores.

They are not intended to reach the same max ST perf as the main/big cores. The big and LITTLE cores have different purposes. Main/big cores are for max ST perf. LP/LITTLE cores are for max MT perf, max perf/W, small die area, and low cost.
 
Last edited:

StefanR5R

Elite Member
Dec 10, 2016
5,620
8,085
136
(MT performance)
That's the whole point of the LP/LITTLE cores.

They are not intended to reach the same max ST perf as the main/big cores. The big and LITTLE cores have different purposes. Main/big cores are for max ST perf. LP/LITTLE cores are for max MT perf, max perf/W, small die area, and low cost.
No. The (rumored) "LP island" is not for that. It is for low perf at low Watts. Nothing else.

AMD "LP" core = low power core; a vehicle to substantially save power in low usage situations

Intel "E" core = in reality, an æ core = area efficient core; a vehicle to cram more cores into a given area, i.e. to have more hardware threads available for high usage situations, while compromising on features and per-core performance and not, in fact, gaining much to speak of in performance/power relative to big cores if the latter are operated at similar V-f points.

Edit, PS:
AMD "c" core (once: "cloud native" core) = dense core; saves die space too but not at the cost of features, only at the cost of possible peak clock; as a side effect also has got somewhat higher throughput/W than the standard core of the same microarchitecture
 
Last edited: