Zen 6 Speculation Thread


StefanR5R

Elite Member
Dec 10, 2016
(moar cores per socket vs. memory capacity)
Whatever happened to CXL-attached RAM? 5th Gen EPYC fully supports the technology, and AMD has even partnered with Micron for compatible modules. There's really no "limit" on total system RAM imposed by the 12-16 channel shoreline limitation, just that you have to pay a smallish price in latency and bandwidth (still massively faster than SSD) for the CXL RAM...
CXL memory extenders are CPUs without cores.

Let's undertake a thought experiment: We are a cloud service provider, i.e. our business is to rent out access to virtual machines. We are about to add 102,400 cores to our datacenter. How do we plan to do that? Will we add 400 sockets with 256 cores each, each of them with local memory attached? Or will we rather add 200 sockets with 512 cores and another 200 sockets with 0 cores (all of them with memory attached)?
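To put rough numbers on that thought experiment, here is a quick sketch (the 16 DDR channels and 1 TiB of DRAM per memory-carrying socket are made-up figures, purely for illustration):

# Rough comparison of the two deployment options above. All figures are
# assumptions for illustration, not vendor specs.
TOTAL_CORES = 102_400
CHANNELS_PER_SOCKET = 16      # assumed DDR channels per memory-carrying socket
DRAM_PER_SOCKET_TIB = 1       # assumed DRAM capacity per memory-carrying socket

def deployment(cores_per_socket, compute_sockets, memory_only_sockets=0):
    mem_sockets = compute_sockets + memory_only_sockets
    assert cores_per_socket * compute_sockets == TOTAL_CORES
    return {
        "compute sockets": compute_sockets,
        "memory-carrying sockets": mem_sockets,
        "DDR channels per core": mem_sockets * CHANNELS_PER_SOCKET / TOTAL_CORES,
        "DRAM per core (GiB)": mem_sockets * DRAM_PER_SOCKET_TIB * 1024 / TOTAL_CORES,
    }

# Option A: 400 sockets x 256 cores, memory local to every socket.
print(deployment(256, 400))
# Option B: 200 sockets x 512 cores plus 200 core-less memory nodes.
print(deployment(512, 200, memory_only_sockets=200))

Both options land on the same DRAM capacity and channel count per core; the difference is purely that in option B, half of that memory sits one expander hop away from every core.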

I imagine that in this business, CXL attached memory will become attractive once it becomes possible at low price and low power consumption to attach memory nodes to more than a single computer. Also, an ability to boot up and shut down these expanders on demand may be desirable.

I am probably not entirely up to date, but the CXL memory expanders which I have seen so far were only for local use within one machine, not to be shared between machines.
 
Jul 27, 2020
I am probably not entirely up to date, but the CXL memory expanders which I have seen so far were only for local use within one machine, not to be shared between machines.
Not an issue if they really want resources shared. We have an old Dell VRTX where blades are inserted into a "mother" chassis, and anything plugged into that chassis can be accessed by any of the blade servers. But it's from around the Nehalem era.
 

LightningZ71

Platinum Member
Mar 10, 2017
(moar cores per socket vs. memory capacity)

CXL memory extenders are CPUs without cores.

Let's undertake a thought experiment: We are a cloud service provider, i.e. our business is to rent out access to virtual machines. We are about to add 102,400 cores to our datacenter. How do we plan to do that? Will we add 400 sockets with 256 cores each, each of them with local memory attached? Or will we rather add 200 sockets with 512 cores and another 200 sockets with 0 cores (all of them with memory attached)?

I imagine that in this business, CXL attached memory will become attractive once it becomes possible at low price and low power consumption to attach memory nodes to more than a single computer. Also, an ability to boot up and shut down these expanders on demand may be desirable.

I am probably not entirely up to date, but the CXL memory expanders which I have seen so far were only for local use within one machine, not to be shared between machines.
The spec doesn't prevent vertical cards or daughter boards with the CXL expanders on them. It doesn't have to be an empty CPU package.
 

Joe NYC

Diamond Member
Jun 26, 2021
Here's a rule I'll follow from now on: I don't watch or read the content of anyone who goes on the MILD YouTube channel.

Those people are just promoting that grifter. Seriously, I miss Anandtech. :(


People like HUB, Kit, level1tech are just promoting him and his way of conducting business. You've seen that guy's influence already on these stupid tech sites that publish anything.

You seem to have an unhealthy obsession with MLID. I think he provides a valuable service.
 

OneEng2

Senior member
Sep 19, 2022
Speculation, especially when the poster explicitly states their reasoning, is particularly interesting from my POV. It gets people thinking about the art of the possible ;).

Lots of good thoughts here IMO. Memory bandwidth per core, core arrangements, L3 pool arrangement, interconnect freedom, etc. All of these make for great speculation.

As for cores, 256 for Venice D. Seems like there isn't much discussion against this.

Venice Classic? 92 vs 128. I am still betting on 128 simply because I can't fathom AMD moving backwards from Turin.

Rumors so far suggest 264 cores, instead of Venice-D's 256, which seems believable to me precisely because the figure is so modest.

In my opinion, that points to a layout change of the dense server CCD, from 2 rows of 16 cores to a 3x11 layout.
Since there are also rumors that the L3 is wandering outside the CCD, that's not so far-fetched.

Architecturally, Zen 7 is allegedly the next proper tock, with a rumored 15-25% IPC uplift, which would require a substantial amount of additional logic transistors, probably eating up most or all of the power and transistor budget freed up by the A14 shrink, so a tiny core count bump makes sense in that regard, too.
I could buy that. I completely agree that A14 isn't going to be giving huge transistor density improvements. Therefore, it stands to reason that we won't be seeing significant core count increases either (at least not within a CCD).
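Just to spell out the arithmetic behind the 3x11 suggestion quoted above (treating the package as 8 dense CCDs is my assumption, carried over from Venice-D's rumored 8 x 32 = 256):

# Per-CCD math behind the 264-core rumor, assuming 8 dense CCDs per package.
DENSE_CCDS = 8
print(264 / DENSE_CCDS)   # 33.0 cores per CCD -> only factors neatly as 3 x 11
print(256 / DENSE_CCDS)   # 32.0 cores per CCD -> the current 2 rows of 16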

Where Zen 7 could significantly raise core counts is by moving to a larger socket with more memory channels and higher-bandwidth memory (not so hard to believe for a 2028-2029 timeframe).

I think it more likely that Zen 7 will increase core counts through more CCDs per socket rather than more cores per CCD. As you pointed out, it is likely that the added 10% transistor budget will be used to improve IPC, not to increase core count.
 

naukkis

Golden Member
Jun 5, 2002
It's quite simple. It's not 3x11 because it's not anything. No logic was broken in this statement.

For a mesh it doesn't make sense, but neither does 2x16. For a tweaked ring bus, which doesn't have an actual routing mesh but only one or two clients per ring stop, three clients could also be a manageable configuration if they change their L3 topology from a pure cache to one that mixes that third core into the L3 region.
 

adroc_thurston

Diamond Member
Jul 2, 2023
For a tweaked ring bus, which doesn't have an actual routing mesh but only one or two clients per ring stop, three clients could also be a manageable configuration if they change their L3 topology from a pure cache to one that mixes that third core into the L3 region.
it is a real(tm) mesh and no, it's one core per one stop for the foreseeable future since any cache perf regression is like, extra haram.
 

naukkis

Golden Member
Jun 5, 2002
it is a real(tm) mesh and no, it's one core per one stop for the foreseeable future since any cache perf regression is like, extra haram.

I consider a mesh node to be a data routing point. A mesh point can route any packet to any stop along a preferred route from routing tables, so it is pretty much freely configurable. A ring bus instead routes data in only one direction, and stops just harvest the data arriving at them. Because of that design difference, a ring bus can operate at much higher frequencies at moderate power consumption. AMD's designs do not have a real mesh configuration, and the reason is simply that they don't want to give away L3 performance. And because of that real-mesh disadvantage, Intel isn't bringing a mesh to their client products either, but will instead increase core count with the same approach: sharing a ring stop between multiple cores.
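As a toy illustration of that topology difference, here is a quick average-hop-count comparison; the 32-stop bidirectional ring and the 4x8 dimension-order-routed mesh (one client per stop) are assumptions taken from the numbers floating around in this thread, not confirmed figures:

# Average hop count between distinct stops for a bidirectional ring vs. a 2D
# mesh with dimension-order (X-then-Y) routing. Sizes are assumptions.
from itertools import product

def ring_hops(n):
    # Shortest direction around a bidirectional ring of n stops.
    pairs = [(a, b) for a, b in product(range(n), repeat=2) if a != b]
    return sum(min(abs(a - b), n - abs(a - b)) for a, b in pairs) / len(pairs)

def mesh_hops(rows, cols):
    # Dimension-order routing walks one axis, then the other, so the hop
    # count is simply the Manhattan distance between stops.
    stops = list(product(range(rows), range(cols)))
    pairs = [(a, b) for a, b in product(stops, repeat=2) if a != b]
    return sum(abs(a[0] - b[0]) + abs(a[1] - b[1]) for a, b in pairs) / len(pairs)

print(ring_hops(32))     # ~8.3 average hops
print(mesh_hops(4, 8))   # 4.0 average hops

The mesh wins on average hop count, but every mesh stop needs a router that chooses between several output ports, which is exactly the frequency/power trade-off being argued here.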
 

adroc_thurston

Diamond Member
Jul 2, 2023
A mesh point can route any packet to any stop along a preferred route from routing tables, so it is pretty much freely configurable
uh, no?
Your average Intel mesh is dimension order routed. With very-very strict rules and I/O caps having double-rows.
AMD's designs do not have a real mesh configuration, and the reason is simply that they don't want to give away L3 performance
yeah they do.
It's a mesh, and the Venice-D one is 4x8.
And because of that real-mesh disadvantage, Intel isn't bringing a mesh to their client products either, but will instead increase core count with the same approach: sharing a ring stop between multiple cores.
well no, it's because Intel interconnects suck.
Like iso CC in a power-sensitive env like mobile, their ring ain't clocking well anymore. Too bad.
 

naukkis

Golden Member
Jun 5, 2002
uh, no?
Your average Intel mesh is dimension order routed. With very-very strict rules and I/O caps having double-rows.

Of course they are hardware-connected and optimized, but every mesh point still has three paths to resolve, since a mesh point can route data along different paths. A ring bus doesn't need to; it just passes data on to the next stop if the current stop isn't the target. A ladder optimization of the ring bus is trivial: in today's implementation of two counter-rotating rings, the upward and downward data paths are separated, so the ring ends don't need to go all the way around; the ring crossing is instead done when the data is sent. We'll see whether AMD goes to a real mesh with Venice, and if they do, how much performance they lose.
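A minimal sketch of the per-stop decision in each fabric, to illustrate the point about paths to resolve; this is pure illustration under assumed coordinates, not any vendor's actual implementation:

# Per-stop decision in each fabric (illustration only).

def mesh_next_port(cur, dst):
    # Dimension-order (X-then-Y) routing: at every stop the router has to pick
    # one of several output ports - the extra work a mesh point does.
    (cr, cc), (dr, dc) = cur, dst
    if cc != dc:
        return "east" if dc > cc else "west"
    if cr != dr:
        return "south" if dr > cr else "north"
    return "eject"   # packet has arrived at its target stop

def ring_next_stop(cur, dst, n_stops):
    # A ring stop has no choice to make: if the packet isn't for this stop,
    # it simply moves on in the ring's direction of travel.
    return "eject" if cur == dst else (cur + 1) % n_stops

print(mesh_next_port((1, 2), (3, 6)))   # 'east' - still correcting the X dimension
print(ring_next_stop(5, 9, 32))         # 6 - just pass it along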
 

StefanR5R

Elite Member
Dec 10, 2016
The spec doesn't prevent vertical cards or daughter boards with the CXL expanders on them. It doesn't have to be an empty CPU package.
Form factor wasn't really what I had in mind, but computer topology.
(E.g. quantified by average distance between cores and memory controllers. <- edited)
 