Zen 6 Speculation Thread


StefanR5R

Elite Member
Dec 10, 2016
(moar cores per socket vs. memory capacity)
Whatever happened to CXL-attached RAM? 5th Gen EPYC fully supports the technology, and AMD has even partnered with Micron for compatible modules. There's really no "limit" on total system RAM imposed by the 12-16 channel shoreline limitation, just that you have to pay a smallish price in latency and bandwidth (still massively faster than SSD) for the CXL RAM...
CXL memory extenders are CPUs without cores.

Let's undertake a thought experiment: We are a cloud service provider, i.e. our business is to rent out access to virtual machines. We are about to add 102,400 cores to our datacenter. How do we plan to do that? Will we add 400 sockets with 256 cores each, each of them with local memory attached? Or will we rather add 200 sockets with 512 cores and another 200 sockets with 0 cores (all of them with memory attached)?
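To put rough numbers on that thought experiment, here is a quick sketch (the 16 DDR channels and 1 TiB of DRAM per memory-carrying socket are made-up figures, purely for illustration):

# Rough comparison of the two deployment options above. All figures are
# assumptions for illustration, not vendor specs.
TOTAL_CORES = 102_400
CHANNELS_PER_SOCKET = 16      # assumed DDR channels per memory-carrying socket
DRAM_PER_SOCKET_TIB = 1       # assumed DRAM capacity per memory-carrying socket

def deployment(cores_per_socket, compute_sockets, memory_only_sockets=0):
    mem_sockets = compute_sockets + memory_only_sockets
    assert cores_per_socket * compute_sockets == TOTAL_CORES
    return {
        "compute sockets": compute_sockets,
        "memory-carrying sockets": mem_sockets,
        "DDR channels per core": mem_sockets * CHANNELS_PER_SOCKET / TOTAL_CORES,
        "DRAM per core (GiB)": mem_sockets * DRAM_PER_SOCKET_TIB * 1024 / TOTAL_CORES,
    }

# Option A: 400 sockets x 256 cores, memory local to every socket.
print(deployment(256, 400))
# Option B: 200 sockets x 512 cores plus 200 core-less memory nodes.
print(deployment(512, 200, memory_only_sockets=200))

Both options land on the same DRAM capacity and channel count per core; the difference is purely that in option B, half of that memory sits one expander hop away from every core.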

I imagine that in this business, CXL attached memory will become attractive once it becomes possible at low price and low power consumption to attach memory nodes to more than a single computer. Also, an ability to boot up and shut down these expanders on demand may be desirable.

I am probably not entirely up to date, but the CXL memory expanders which I have seen so far were only for local use within one machine, not to be shared between machines.
 
Jul 27, 2020
I am probably not entirely up to date, but the CXL memory expanders which I have seen so far were only for local use within one machine, not to be shared between machines.
Not an issue if they really want resources shared. We have an old Dell VRTX where blades are inserted into a "mother" chassis, and anything plugged into that chassis can be accessed by any of the blade servers. But it's from around the Nehalem era.
 

LightningZ71

Platinum Member
Mar 10, 2017
(moar cores per socket vs. memory capacity)

CXL memory extenders are CPUs without cores.

Let's undertake a thought experiment: We are a cloud service provider, i.e. our business is to rent out access to virtual machines. We are about to add 102,400 cores to our datacenter. How do we plan to do that? Will we add 400 sockets with 256 cores each, each of them with local memory attached? Or will we rather add 200 sockets with 512 cores and another 200 sockets with 0 cores (all of them with memory attached)?

I imagine that in this business, CXL attached memory will become attractive once it becomes possible at low price and low power consumption to attach memory nodes to more than a single computer. Also, an ability to boot up and shut down these expanders on demand may be desirable.

I am probably not entirely up to date, but the CXL memory expanders which I have seen so far were only for local use within one machine, not to be shared between machines.
The spec doesn't prevent vertical cards or daughter boards with the CXL expanders on them. It doesn't have to be an empty CPU package.
 

Joe NYC

Diamond Member
Jun 26, 2021
Here's a rule I'll follow from now on: I don't watch or read the content of anyone who goes on the MILD YouTube channel.

Those people are just promoting that grifter. Seriously, I miss Anandtech. :(


People like HUB, Kit, level1tech are just promoting him and his way of conducting business. You've seen that guy's influence already on these stupid tech sites that publish anything.

You seem to have an unhealthy obsession with MLID. I think he provides a valuable service.
 

OneEng2

Senior member
Sep 19, 2022
Speculation, especially when the poster explicitly states their reasoning, is particularly interesting from my POV. It gets people thinking about the art of the possible ;).

Lots of good thoughts here IMO. Memory bandwidth per core, core arrangements, L3 pool arrangement, interconnect freedom, etc. All of these make for great speculation.

As for cores, 256 for Venice D. Seems like there isn't much discussion against this.

Venice Classic? 92 vs 128. I am still betting on 128 simply because I can't fathom AMD moving backwards from Turin.

Rumors so far suggest 264 cores, instead of Venice-D's 256, which seems believable to me precisely because the figure is so modest.

In my opinion, that points to a layout change of the dense server CCD, from 2 rows of 16 cores to a 3x11 layout.
Since there are also rumors that the L3 is wandering outside the CCD, that's not so far-fetched.

Architecturally, Zen 7 is allegedly the next proper tock, with a rumored 15-25% IPC uplift, which would require a substantial amount of additional logic transistors, probably eating up most or all of the power and transistor budget freed up by the A14 shrink, so a tiny core count bump makes sense in that regard, too.
I could buy that. I completely agree that A14 isn't going to be giving huge transistor density improvements. Therefore, it stands to reason that we won't be seeing significant core count increases either (at least not within a CCD).
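Just to spell out the arithmetic behind the 3x11 suggestion quoted above (treating the package as 8 dense CCDs is my assumption, carried over from Venice-D's rumored 8 x 32 = 256):

# Per-CCD math behind the 264-core rumor, assuming 8 dense CCDs per package.
DENSE_CCDS = 8
print(264 / DENSE_CCDS)   # 33.0 cores per CCD -> only factors neatly as 3 x 11
print(256 / DENSE_CCDS)   # 32.0 cores per CCD -> the current 2 rows of 16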

Where Zen 7 could significantly raise core counts is by moving to a larger socket with more memory channels and higher-bandwidth memory (not so hard to believe for a 2028-2029 timeframe).

I think it more likely that Zen 7 will increase core counts through more CCDs per socket rather than more cores per CCD. As you pointed out, it is likely that the added 10% transistor budget will be used to improve IPC, not to increase core count.
 

naukkis

Golden Member
Jun 5, 2002
It's quite simple. It's not 3x11 because it's not anything. No logic was broken in this statement.

For a mesh it doesn't make sense, but neither does 2x16. For a tweaked ring bus, which doesn't have an actual routing mesh but only one or two clients per ring stop, three clients could also be a manageable configuration if they change their L3 topology from a pure cache to one that mixes that third core into the L3 region.
 

adroc_thurston

Diamond Member
Jul 2, 2023
For a tweaked ring bus, which doesn't have an actual routing mesh but only one or two clients per ring stop, three clients could also be a manageable configuration if they change their L3 topology from a pure cache to one that mixes that third core into the L3 region.
it is a real(tm) mesh and no, it's one core per one stop for the foreseeable future since any cache perf regression is like, extra haram.
 

naukkis

Golden Member
Jun 5, 2002
it is a real(tm) mesh and no, it's one core per one stop for the foreseeable future since any cache perf regression is like, extra haram.

I consider a mesh node to be a data routing point. A mesh point can route any packet to any stop along a preferred route from routing tables, so it is pretty much freely configurable. A ring bus instead routes data in only one direction, and stops just harvest the data arriving at them. Because of that design difference, a ring bus can operate at much higher frequencies at moderate power consumption. AMD's designs do not have a real mesh configuration, and the reason is simply that they don't want to give away L3 performance. And because of that real-mesh disadvantage, Intel isn't bringing a mesh to their client products either, but will instead increase core count with the same approach: sharing a ring stop between multiple cores.
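As a toy illustration of that topology difference, here is a quick average-hop-count comparison; the 32-stop bidirectional ring and the 4x8 dimension-order-routed mesh (one client per stop) are assumptions taken from the numbers floating around in this thread, not confirmed figures:

# Average hop count between distinct stops for a bidirectional ring vs. a 2D
# mesh with dimension-order (X-then-Y) routing. Sizes are assumptions.
from itertools import product

def ring_hops(n):
    # Shortest direction around a bidirectional ring of n stops.
    pairs = [(a, b) for a, b in product(range(n), repeat=2) if a != b]
    return sum(min(abs(a - b), n - abs(a - b)) for a, b in pairs) / len(pairs)

def mesh_hops(rows, cols):
    # Dimension-order routing walks one axis, then the other, so the hop
    # count is simply the Manhattan distance between stops.
    stops = list(product(range(rows), range(cols)))
    pairs = [(a, b) for a, b in product(stops, repeat=2) if a != b]
    return sum(abs(a[0] - b[0]) + abs(a[1] - b[1]) for a, b in pairs) / len(pairs)

print(ring_hops(32))     # ~8.3 average hops
print(mesh_hops(4, 8))   # 4.0 average hops

The mesh wins on average hop count, but every mesh stop needs a router that chooses between several output ports, which is exactly the frequency/power trade-off being argued here.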
 

adroc_thurston

Diamond Member
Jul 2, 2023
A mesh point can route any packet to any stop along a preferred route from routing tables, so it is pretty much freely configurable
uh, no?
Your average Intel mesh is dimension order routed. With very-very strict rules and I/O caps having double-rows.
AMD's designs do not have a real mesh configuration, and the reason is simply that they don't want to give away L3 performance
yeah they do.
It's a mesh, and the Venice-D one is 4x8.
And because of that real-mesh disadvantage, Intel isn't bringing a mesh to their client products either, but will instead increase core count with the same approach: sharing a ring stop between multiple cores.
well no, it's because Intel interconnects suck.
Like iso CC in a power-sensitive env like mobile, their ring ain't clocking well anymore. Too bad.
 

naukkis

Golden Member
Jun 5, 2002
uh, no?
Your average Intel mesh is dimension order routed. With very-very strict rules and I/O caps having double-rows.

Of course they are hardware-connected and optimized, but every mesh point still has three paths to resolve, since a mesh point can route data along different paths. A ring bus doesn't need to; it just passes data on to the next stop if the current stop isn't the target. A ladder optimization of the ring bus is trivial: in today's implementation of two counter-rotating rings, the upward and downward data paths are separated, so the ring ends don't need to go all the way around; the ring crossing is instead done when the data is sent. We'll see whether AMD goes to a real mesh with Venice, and if they do, how much performance they lose.
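A minimal sketch of the per-stop decision in each fabric, to illustrate the point about paths to resolve; this is pure illustration under assumed coordinates, not any vendor's actual implementation:

# Per-stop decision in each fabric (illustration only).

def mesh_next_port(cur, dst):
    # Dimension-order (X-then-Y) routing: at every stop the router has to pick
    # one of several output ports - the extra work a mesh point does.
    (cr, cc), (dr, dc) = cur, dst
    if cc != dc:
        return "east" if dc > cc else "west"
    if cr != dr:
        return "south" if dr > cr else "north"
    return "eject"   # packet has arrived at its target stop

def ring_next_stop(cur, dst, n_stops):
    # A ring stop has no choice to make: if the packet isn't for this stop,
    # it simply moves on in the ring's direction of travel.
    return "eject" if cur == dst else (cur + 1) % n_stops

print(mesh_next_port((1, 2), (3, 6)))   # 'east' - still correcting the X dimension
print(ring_next_stop(5, 9, 32))         # 6 - just pass it along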
 

StefanR5R

Elite Member
Dec 10, 2016
The spec doesn't prevent vertical cards or daughter boards with the CXL expanders on them. It doesn't have to be an empty CPU package.
Form factor wasn't really what I had in mind, but computer topology.
(E.g. quantified by average distance between cores and memory controllers. <- edited)
 