Wow!
An AMD slide shows that Zen 3 EPYC will still have 9 dies, probably with 8-core CCXs.
So no 15 chiplets for EPYC 3, contrary to the SemiAccurate rumour. That rumour could have referred to later generations on the roadmap, though.
But more interesting to me is that the CCX size goes to 8 cores. In this thread, I've speculated about topologies based on the directly connected 4-core CCX, and I suggested that the benefits of direct connection meant the 4-core complex would remain the basic building block. My main argument has been that any topology beyond 4 nodes would have to use a suboptimal scheme compared to direct connection. However, there are topologies for 8 nodes that retain the optimal connections within two quad-core groups.
Here are some topologies I proposed for connecting 8 nodes (in the Zen 2 speculation thread here). I drew these way back when we were speculating about the possibility of L4 in the IO chiplet. In this case, the nodes would be the L3 cache slices connected to each core, which I presume is the way they will design it, similar to the 4 slices shared in the current 4-core CCX. Topologies (e-g) retain the optimal connections of cores within each 4-core group.
Figures (a-d) are simple textbook topologies: (a) ring, (b) mesh, (c) cube and (d) twisted cube (see Zapetu's suggestion below). Figures (e-g) are variants of the mesh and cube topologies, enhanced with direct connections between the nodes within the upper and lower quads: (e) mesh with two fully connected quads, (f) cube with upper and lower sides fully connected, and (g) same cube topology as (f), but with the lower quad flipped vertically.
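To put rough numbers on these options, here is a minimal Python sketch that counts links and hop distances for some of the 8-node topologies. The node numbering, the 2x4 layout assumed for the mesh, and the particular pair of links swapped to build the twisted cube are my own assumptions, not taken from the figures. Variant (f) is modelled from its description only (two fully connected quads plus one link between corresponding nodes), and a fully connected 8-node graph is included purely as a baseline. Hop counts say nothing about wire length, latency per hop, or bandwidth.

```python
from collections import deque
from itertools import combinations


def bfs_hops(adj, src):
    """Hop count from src to every reachable node (breadth-first search)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def stats(edges, n=8):
    """Link count, max links per node, diameter and average hops of a graph."""
    adj = {i: set() for i in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    hops = [d for s in range(n)
            for t, d in bfs_hops(adj, s).items() if t != s]
    return {
        "links": len({frozenset(e) for e in edges}),
        "max_degree": max(len(a) for a in adj.values()),
        "diameter": max(hops),
        "avg_hops": round(sum(hops) / len(hops), 3),
    }


# (a) ring: each node linked to its two neighbours.
ring = [(i, (i + 1) % 8) for i in range(8)]

# (b) mesh, assumed 2x4: nodes 0-3 in the top row, 4-7 in the bottom row.
mesh = [(i, i + 1) for i in (0, 1, 2, 4, 5, 6)] + [(i, i + 4) for i in range(4)]

# (c) cube: nodes are 3-bit labels, linked when they differ in one bit.
cube = [(i, i ^ (1 << b)) for i in range(8) for b in range(3)
        if i < (i ^ (1 << b))]

# (d) twisted cube: swap two parallel links on one face for the crossing pair.
twisted = [e for e in cube if e not in {(0, 1), (2, 3)}] + [(0, 3), (1, 2)]

# (f) as described: both quads fully connected, plus links between the quads.
quads_cube = (list(combinations(range(4), 2))
              + list(combinations(range(4, 8), 2))
              + [(i, i + 4) for i in range(4)])

# Baseline only: every node directly connected to every other node.
full = list(combinations(range(8), 2))

TOPOLOGIES = {
    "(a) ring": ring,
    "(b) mesh 2x4": mesh,
    "(c) cube": cube,
    "(d) twisted cube": twisted,
    "(f) cube, full quads": quads_cube,
    "fully connected": full,
}

if __name__ == "__main__":
    for name, edges in TOPOLOGIES.items():
        print(f"{name:22s} {stats(edges)}")
```

Under these assumptions, the cube and the twisted cube both use 12 links with 3 links per node, but the twist cuts the worst case from 3 hops to 2. Variant (f) also reaches 2 hops worst case with 16 links and 4 links per node, while keeping every pair inside a quad 1 hop apart. Full direct connection of all 8 nodes would need 28 links.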
@Zapetu may have been spot on in his speculation (in this post): "Intel has used many different topologies for their multi-socket servers, but in my opinion, this one looks the best for this kind of situation:"
"You can find the above diagram and more information here. Every processor is 2 hops away from any other processor. By connecting a pair of memory controllers where the IOHs sit, you get quite uniform access to them too. If you are using this topology for the IOD (between L4 slices) then why not use the same topology for the 8-core CCX (between L3 slices)."