Question AMD's chicken and egg Threadripper problem?

Jul 27, 2020
21,224
14,682
146
My understanding is that the V-cache keeps the cores fed with data, reducing their idle time and increasing their throughput. More RAM channels would remove the cap on dataset size, and the CPU wouldn't have to deal with data-feeding delays when the V-cache runs out of space.
 

DrMrLordX

Lifer
Apr 27, 2000
22,220
11,930
136
Shouldn't they just need to substitute the I/O die or refine the one in Arrow Lake?

I know that there might be a lack of pins in the socket for the purpose, but maybe they can do something clever with what they have available?

Going quad channel requires redesigning boards, running more traces, using more PCB layers, etc. OEMs would not like it.
 
Jul 27, 2020
21,224
14,682
146
OEMs would not like it.
But they wouldn't have to make too many of those. Let's say they settle on a price point of $550 to $999. Only people going for 16 cores or more would want those boards, to remove any bottlenecks and get unbridled performance. It could be a better alternative for people who are not really interested in the extra PCIe lanes of Threadripper but really want to get the most out of their investment in a high-end CPU.
 

MS_AT

Senior member
Jul 15, 2024
378
820
96

Thunder 57

Diamond Member
Aug 19, 2007
3,091
4,904
136
But they wouldn't have to make too many of those. Let's say they settle on a price point of $550 to $999. Only people going for 16 cores or more would want those boards, to remove any bottlenecks and get unbridled performance. It could be a better alternative for people who are not really interested in the extra PCIe lanes of Threadripper but really want to get the most out of their investment in a high-end CPU.

Too niche of a market. It also means there would be more SKUs.
 

DrMrLordX

Lifer
Apr 27, 2000
22,220
11,930
136
But they wouldn't have to make too many of those.
Right, but they'll still have to do an entirely new trace layout for a quad-channel flagship board (or boards; companies like Asus have different tiers of halo products). Such a layout might be different enough from the dual-channel designs that they would incur more downtime and expense maintaining fleets of dual- and quad-channel motherboards for consumers.
 

LightningZ71

Golden Member
Mar 10, 2017
1,949
2,329
136
There's no way to do this in a way that satisfies most everyone at a reasonable price. Anything that supports 64-128 lanes of IO is going to be as expensive as any of the Threadripper boards. The processors themselves are pricey as well.

They can create a middle market with modest investment: use a 16-core Zen 5c CCD and a high-bin 8-core Zen 5 X3D CCD, and create a package for AM5 that allows both to use 2 IF links to a bespoke IOD that has a 256MB 3D cache stacked on it, with enough bandwidth to feed both CCDs at full rate simultaneously while also aggressively prefetching from DRAM. Yeah, it's not gonna be cheap to develop, but it would be impressive. As for the motherboard, they just need to start with an existing design but redo the PCIe slots, using a PLX chip to split the 16 PCIe 5.0 lanes into 64 PCIe 4.0 lanes (2x oversubscribed; see the arithmetic below), with an additional x4 and an x1 slot off the chipset.

It would give HEDT IO capabilities without requiring a high-layer-count Threadripper board. They would charge so much for it that no one would buy it, though.
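For reference, the 2x oversubscription in that slot idea works out roughly like this (a back-of-the-envelope sketch; the per-lane figures are nominal PCIe throughput, and the whole switch setup is hypothetical):

[CODE=python]
# Rough PCIe bandwidth arithmetic for the hypothetical PLX-style switch above.
# Per-lane figures are approximate nominal throughput, ignoring protocol overhead.
GEN5_PER_LANE = 3.94  # GB/s per PCIe 5.0 lane (~32 GT/s, 128b/130b encoding)
GEN4_PER_LANE = 1.97  # GB/s per PCIe 4.0 lane (~16 GT/s, 128b/130b encoding)

uplink = 16 * GEN5_PER_LANE      # 16 Gen5 lanes from the CPU -> ~63 GB/s
downstream = 64 * GEN4_PER_LANE  # 64 Gen4 lanes to the slots -> ~126 GB/s

print(f"uplink {uplink:.0f} GB/s, downstream {downstream:.0f} GB/s, "
      f"oversubscription {downstream / uplink:.1f}x")
[/CODE]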
 

QuickyDuck

Junior Member
Nov 6, 2023
21
23
41
I bet AMD is quite satisfied with its current lineup and the clear segmentation between the consumer and pro platforms.
 

StefanR5R

Elite Member
Dec 10, 2016
6,096
9,172
136
Could you name workloads that you find bandwidth-starved? How many of them would you consider typical client workloads?
I think the suggestion of adding more memory channels was made in the context of the suggestion of adding more cores.

BTW, at my day job, of all the workloads which I run locally, there is just about a single one in which computing performance matters a lot. (Except for this one, all the other local workloads are speedy enough that they remain interactive.) It is a data-intensive workload: it involves setting up large systems of linear equations, solving them, and doing a lot of postprocessing on the results. The solver is about the only part of it which the software vendor has performance-optimized very well. On my 6c/12t Intel CPU (in a generic office PC)¹, the solver spawns 6 threads, most likely because HT would do nothing but decrease performance in this part.

The postprocessing is largely single-threaded even though most of it is an embarrassingly parallel problem per se. My guess is that the software vendor never bothered with the (limited) added complexity of parallelizing it, because spending development & support cost on more customer-facing features is far more important to them. But if they ever did parallelize this postprocessing, then the CPU would definitely need every bit of memory R/W bandwidth it could get. (Plus, it would depend on access latency not tanking while bandwidth is used up to the maximum.)
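Just to illustrate why parallelized postprocessing would immediately lean on memory bandwidth, here is a minimal sketch of what such a change might look like, assuming each result chunk really can be processed independently (all names here are hypothetical, not the vendor's API):

[CODE=python]
# Hypothetical sketch: turning single-threaded postprocessing of solver results
# into an embarrassingly parallel job. With several workers streaming through
# large result arrays at once, memory R/W bandwidth becomes the limiter,
# not core count.
from concurrent.futures import ProcessPoolExecutor

def postprocess(result_chunk):
    # Placeholder for the real per-element work (derived quantities, etc.);
    # each chunk is assumed to be independent of the others.
    return [x * x for x in result_chunk]

def postprocess_all(result_chunks, workers=6):
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(postprocess, result_chunks))

if __name__ == "__main__":
    chunks = [list(range(i, i + 1000)) for i in range(0, 10000, 1000)]
    print(len(postprocess_all(chunks)), "chunks postprocessed")
[/CODE]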

<captain_obvious> Client workloads are diverse. If you have a local workload which transforms large datasets, then it would rather likely benefit from increased memory access bandwidth. (Or from turning it into a remote workload.) </captain_obvious>

I can already imagine some folks responding to this: But, but, that's not a "typical client workload". To this I have to say: Oh yes, in my field and related ones, it is a typical client workload. It has been ever since the old days when microcomputers crept out of the confines of the datacenter into the office. And there is no sign yet that this will change.

________
¹) So why did my employer give me a generic PC rather than a workstation that might increase my productivity? Easy: I am expected to organize my workflows such that this non-interactive computing runs in parallel with the rest of my daily work.

Years back, at a different employer, I did replace my generic PC first with an Ivy Bridge-E PC, then with a Broadwell-E PC, on my own initiative and at my own expense. But at that place and time, (a) corporate IT was less strict, and (b) I had less opportunity to turn to other tasks while a computation was running.
 

StefanR5R

Elite Member
Dec 10, 2016
6,096
9,172
136
I bet AMD is quite satisfied with its current lineup and the clear segmentation between the consumer and pro platforms.
HEDT and workstation … are the same from the perspective of what prospective customers are asking for. They are just two different answers by the CPU maker and its partners to this question.¹

Threadripper started as HEDT only. Then AMD switched Threadripper to being almost workstation-only². I understand why they made the change; I only find it a bit strange that they kept the name.

________
¹) And this is, among other things, because the CPU maker and its partners have a counter-question for prospective customers: How much money are you ready to part with in order to receive more or less committed assurances that your application is going to run well on it?
²) They made the differences between the Ryzen Threadripper ?000X and Ryzen Threadripper PRO ?000WX lines rather subtle. As far as Ryzen Threadripper ?000X is concerned, it is arguable whether the segmentation between the consumer and pro platforms is a clear one. Is it attractive to consumers? Hardly, in comparison to Ryzen. Attractive to professionals? Hardly, in comparison to Threadripper PRO WX.


Threadripper could be a LOT more popular if AMD simply released it in quad channel form. But they don't want to because it is supposedly expensive. BUT how do they bring their cost down if they can't sell it in volume, which requires lowering the price of entry?

Chicken and egg. Something's gotta give.

Come on, AMD. Be bold. Invest in the R&D for a quad channel Threadripper with minimum $400 motherboards and then see your Threadripper sales soar in a year or two!
But Ryzen Threadripper 7000X/ "chipset" TRX50 _is_ quad channel. That said, it is indeed not a well cost-optimized platform, and truly not a price-optimized platform.³

________
³) It is a least-effort derivative of the Ryzen Threadripper PRO 7000WX/ "chipset" TRX50|WRX90 platform (WRX90 signifying a channel count not crippled to 4 but merely cut to 8), which in turn is of course a lowish-effort derivative of the EPYC 9004/ socket SP5 12-channel platform. Lowish effort, not least effort,⁴ because they did at least make the socket smaller. Though mechanically it is just TR4/SP3/SP6 reused.
⁴) Edit: AMD even implemented HDA in the sIOD starting with Zen 4 (if not earlier), specifically for Threadripper PRO and Threadripper.
 
Jul 27, 2020
21,224
14,682
146
Can you prove they are BW and not latency limited? :)

If Strix Halo doesn't have V-cache, and the final Strix Halo silicon beats the 9950X in some benchmarks, and assuming it has nothing special outside of being vanilla Zen 5 arch, that would be proof that the cores are bandwidth starved.
 

MS_AT

Senior member
Jul 15, 2024
378
820
96
My understanding is that the V-cache keeps the cores fed with data, reducing their idle time and increasing their throughput.
Don't forget that the V-cache also holds instructions. It will help cut front-end latency, and C&C noted in their reviews that front-end latency is one of the biggest problems of Zen 5.
Can't afford both of them and test, in terms of time and money.
It would be easiest with hw on hand, but should be doable by carefully pooling data from different reviews. Still time consuming.

I think the suggestion of adding more memory channels was made in the context of the suggestion of adding more cores.
The context was that currently Zen 5 has insufficient BW to shine. While I agree that the current BW is insufficient for some workloads (I really could use 2 more memory channels on my setup), I doubt that the general public would see the needle move in things like Geekbench. The general perception of the platform in public opinion wouldn't change, but specific groups of people would appreciate it.

I can already imagine some folks responding to this: But, but, that's not a "typical client workload". To this I have to say: Oh yes, in my field and related ones, it is a typical client workload.
No one is denying that there are workloads that could use the BW, but the question is whether this field is big enough to warrant a new platform.

If Strix Halo doesn't have V-cache, and the final Strix Halo silicon beats the 9950X in some benchmarks, and assuming it has nothing special outside of being vanilla Zen 5 arch, that would be proof that the cores are bandwidth starved.
Yes, comparing results between Strix Halo and the 9950X should reveal BW-bottlenecked workloads; Halo should lead in those. For example, Halo should be at least 2x-3x faster in LLM inference run on the CPU, thanks to its bandwidth advantage.
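Back-of-the-envelope numbers for that claim, assuming (my assumption, not confirmed specs) a 256-bit LPDDR5X-8000 configuration for Strix Halo versus dual-channel DDR5-6000 on a 9950X:

[CODE=python]
# Peak DRAM bandwidth comparison under the assumed memory configurations above.
def peak_bw_gbs(bus_width_bits, mt_per_s):
    # bus width in bytes * transfer rate -> nominal peak GB/s
    return (bus_width_bits / 8) * mt_per_s / 1000

halo = peak_bw_gbs(256, 8000)  # assumed 256-bit LPDDR5X-8000 -> ~256 GB/s
am5  = peak_bw_gbs(128, 6000)  # dual-channel DDR5-6000       -> ~96 GB/s
print(f"Halo {halo:.0f} GB/s vs 9950X {am5:.0f} GB/s -> {halo/am5:.1f}x")

# For a purely BW-bound LLM, tokens/s is roughly bandwidth / model size:
model_gb = 8  # hypothetical 8 GB quantized model
print(f"~{halo/model_gb:.0f} vs ~{am5/model_gb:.0f} tokens/s (upper bound)")
[/CODE]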
 

StefanR5R

Elite Member
Dec 10, 2016
6,096
9,172
136
The context was that currently Zen 5 has insufficient BW to shine.
#1, #9… :-)

(But as I claimed myself, it's not so much about particular core counts but about local computing on large datasets.)

No one is denying that there are workloads that could use the BW, but the question is whether this field is big enough to warrant a new platform.
If the price is right and the rest of the features are OK, it could likely sell in volume. But it wouldn't be bought by a new, as yet undiscovered audience, but by one which currently makes do with other products. And a good price implies it wouldn't be much of an upsell.
 

eek2121

Diamond Member
Aug 2, 2005
3,146
4,504
136
I doubt we will ever see quad channel on consumer. When AMD revamps the IOD for Zen 6, they will probably focus on getting the 1:1 speed higher. Being able to run DDR5-8000 to 10000 at those speeds would help a lot with bandwidth (rough numbers at the end of this post).
I suspect they will lean harder on caching instead.
Going quad channel requires redesigning boards, running more traces, using more PCB layers, etc. OEMs would not like it.
I do not believe this is true. Current boards are already wired for 4 DIMMs except in some cases. 4 DIMM boards don’t clock as high due to the IOD.

The only real added cost should be a new IO die, chipset, and software support.

EDIT: I should add that I have seen quad-DIMM boards clock as high as DDR5-6400 in some cases.
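To put rough numbers on the 1:1-speed-vs-more-channels point above (nominal peak bandwidth only; sustained figures and latency will differ), faster dual channel narrows the gap to quad channel but doesn't close it:

[CODE=python]
# Nominal peak DRAM bandwidth = channels * 8 bytes per channel * MT/s.
configs = [("dual-channel DDR5-6000", 2, 6000),
           ("dual-channel DDR5-8000", 2, 8000),
           ("dual-channel DDR5-10000", 2, 10000),
           ("quad-channel DDR5-6000", 4, 6000)]
for label, channels, mts in configs:
    gbs = channels * 8 * mts / 1000
    print(f"{label:24s} ~{gbs:.0f} GB/s")
[/CODE]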
 

DrMrLordX

Lifer
Apr 27, 2000
22,220
11,930
136
I suspect they will lean harder on caching instead.

I do not believe this is true. Current boards are already wired for 4 DIMMs except in some cases.

That isn't necessarily going to be the same as quad channel, though. The top-end Threadripper boards are 14-layer PCBs!
 