How many cores per CCX in 7nm Zen 2?

  • 4 cores per CCX (3 or more CCXs per die)

    Votes: 55 45.1%
  • 6 cores per CCX (2 or more CCXs per die)

    Votes: 44 36.1%
  • 8 cores per CCX (1 or more CCXs per die)

    Votes: 23 18.9%

  • Total voters
    122

Vattila

Senior member
Oct 22, 2004
799
1,351
136
Hi all, it has been an exciting few months in the CPU space since the Ryzen launch, with more to come before year end. However, I am already very curious about how AMD will evolve the Zen core with the upcoming 7nm Zen 2 next year — in particular, how the CCX will be configured.

There is a similar topic and poll over at SemiAccurate, in which there are more votes for 6 cores per CCX. However, here I will argue for staying with 4 cores per CCX, on the grounds of interconnect topology.

First, 6 cores per CCX seems so un-zen-like. In the current-generation Zen, the 4 cores in a CCX are directly connected to each other, which takes c = 6 links. Directly connecting 6 cores seems infeasible, as the number of links grows quadratically (c = n*(n-1)/2, i.e. 15 links for 6 cores), thus requiring a suboptimal topology instead (e.g. ring or mesh). Also, 4 cores per CCX is a nice partition size (e.g. for virtualization), with one memory controller per CCX.
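
As a quick sanity check on those numbers, here is a trivial Python sketch (my own illustration, not anything AMD has published) that counts the point-to-point links needed to fully connect n cores:

```python
def full_mesh_links(n: int) -> int:
    """Links needed to directly connect every pair of n cores."""
    return n * (n - 1) // 2

for n in (4, 6, 8):
    print(f"{n} cores per CCX -> {full_mesh_links(n)} links")
# 4 -> 6 links (current Zen CCX), 6 -> 15 links, 8 -> 28 links
```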

Currently, using the new Infinity Fabric, AMD arranges the Zen cores in direct-connected clusters (CCX) on direct-connected dies within the CPU package. The CPUs on a 2-socket PCB are also direct-connected. Notably, the die-to-die connection across CPUs is not done using switching. Instead, a die in CPU 0 is direct-connected to the corresponding die in CPU 1, forming a sparsely connected hyper-cube (see "The Heart Of AMD’s EPYC Comeback Is Infinity Fabric").

Taking this hierarchical direct-connect approach to the extreme leads to the following "quad-tree" topology, with room for up to 4 CCXs per die (drawn here for two sockets — although it naturally scales to 4 direct-connected sockets).
9114301_d9529b53bcf49e738f65058e1657f6b4.png

Note that the links between clusters are fat links, i.e. they consist of a set of links between corresponding cores in each cluster. No switches are involved. E.g. between CCXs, core 0 in CCX 0 is direct-connected to core 0 in CCX 1. Similarly, the link between CPUs consists of the set of links between corresponding cores in each CPU. This arrangement means that there are at most log4(n) hops between any two cores in an n-core system. For example, to go from core 0-0-0-0 (core 0 in CCX 0 in die 0 in CPU 0) to core 1-2-3-1, you first make a hop within the CCX to core 1-0-0-0, then cross-CCX to core 1-2-0-0, then cross-die to core 1-2-3-0, and finally cross-socket to core 1-2-3-1.
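
To make the routing rule concrete, here is a small Python sketch (my own illustration of the scheme described above, not anything AMD has documented). The route simply corrects one coordinate per hop, innermost level first, so the worst case is one hop per hierarchy level:

```python
# Coordinates are (core, CCX, die, socket), innermost level first,
# matching the "core 1-2-3-1" notation above.
def route(src, dst):
    """Walk from src to dst, fixing one hierarchy level per hop."""
    path = [src]
    cur = list(src)
    for level in range(len(src)):      # core, then CCX, then die, then socket
        if cur[level] != dst[level]:
            cur[level] = dst[level]    # one hop over the direct link at this level
            path.append(tuple(cur))
    return path

path = route((0, 0, 0, 0), (1, 2, 3, 1))
for step in path:
    print(step)                # (0,0,0,0) -> (1,0,0,0) -> (1,2,0,0) -> (1,2,3,0) -> (1,2,3,1)
print(len(path) - 1, "hops")   # 4 hops, i.e. one per hierarchy level in the worst case
```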

The following table shows the required number of links (point-to-point connections).
9114301_468a37d7fde7c4331af1387f750def94.png

For this 2-socket configuration, note that 10 ports are needed per core (a 4-socket configuration would require 12 ports per core). Is this feasible? If not, a lower number of ports and wires can be achieved by introducing switches at each hierarchical level (i.e. multiplex the logical point-to-point connections over fewer wires). This then becomes a fat tree topology (e.g. used in supercomputers).
9114301_ffbcdd71646f3c83851f03f78edebb10.png
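
Assuming the same 4-way branching at every level as in the diagram (4 cores per CCX, 4 CCXs per die, 4 dies per package), the per-core port count can be reproduced with a one-liner (again just my own back-of-the-envelope sketch of the switchless fat-link scheme, not a verified figure):

```python
def ports_per_core(cores_per_ccx=4, ccx_per_die=4, dies_per_socket=4, sockets=2):
    """Ports each core needs when it has one direct link to every peer at every level."""
    return ((cores_per_ccx - 1)      # links to the other cores in its CCX
            + (ccx_per_die - 1)      # one link per peer CCX on the die
            + (dies_per_socket - 1)  # one link per peer die in the package
            + (sockets - 1))         # one link per peer socket

print(ports_per_core(sockets=2))  # 10 ports per core (2-socket configuration)
print(ports_per_core(sockets=4))  # 12 ports per core (4-socket configuration)
```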

I am, by no means, an expert on interconnects, so if any of you have any background, I would love to hear your views on this, and how Infinity Fabric is likely to be used in the future to scale up systems.

PS. But three CCXs for a 12-core die makes an awkward die layout, right? Well, GlobalFoundries' 7nm process (7LP) has "more than twice" the density at "30% lower die cost". Why not go for 3+1 — three CCXs and one GCX (GPU Complex), all fully connected? Then, as with the current approach for EPYC, put 4 of these dies together on an MCM, also fully connected, and you have the rumoured 48 CPU cores for EPYC 2, plus 4 times whatever number of graphics compute cores is in a GCX. This configuration also chimes with the "HSA Performance Delivered" point on AMD's old roadmap (under "High-Performance Server APU").
 
  • Like
Reactions: Schmide

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
The most likely option is a 6-core CCX. The primary advantage would be that a Ryzen APU could incorporate a single 6-core CCX and avoid the cross-CCX communication penalty. Raven Ridge APUs have a single 4-core CCX along with 11 Vega NCUs, and I expect that AMD will want a single 6-core CCX for their first-gen 7nm Zen 2 APUs, along with Vega or Navi graphics.
 

formulav8

Diamond Member
Sep 18, 2000
7,004
522
126
I still think it will be 4-core CCXs. Not going higher than 4 cores per CCX, and still 2 of them per die like it is now.
 

Doom2pro

Senior member
Apr 2, 2016
587
619
106
If they use a 4-core CCX like now, what are they going to do with all that extra space? I doubt they will sell the same as what we have now, just with a smaller die area, lower power and a cheaper price... They have to improve somewhere other than the die shrink, so why not up the core count and widen AVX along with other improvements?
 

Schmide

Diamond Member
Mar 7, 2002
5,581
712
126
Since everything is just modular interconnects, they could just rotate some things.

zen2pos.jpg
 
  • Like
Reactions: Vattila

tamz_msc

Diamond Member
Jan 5, 2017
3,708
3,554
136
I think that AMD will keep the 4-cores-per-CCX design, with the die shrink allowing for the beefing up of the IF (like giving it its own clock domain) and additional wide-vector ALUs as required.

Seems like the most straightforward way to go, as a 6-core CCX would not be the best option for inter-core communication, like you said.
 
  • Like
Reactions: Drazick

itsmydamnation

Platinum Member
Feb 6, 2011
2,743
3,069
136
People have to say how they expect AMD to hit 48 cores per socket. A 6-core CCX makes the most sense. L3 cache latency within a CCX already isn't uniform, so moving to 6 cores probably isn't increasing latency for the average case at all.

More than 4 dies makes using an organic interposer almost impossible. The other option is 3 CCXs per die. I think a 6-core CCX has the best overall fit for all markets (console, laptop, HEDT, server).
 
  • Like
Reactions: Ajay

dacostafilipe

Senior member
Oct 10, 2013
771
244
116
We just had a massive jump in core count, I don't think that we should expect another one next year, so IMO there will be no configuration changes for "Zen 2".

Improving frequency/IPC over time is what AMD normally does (Bulldozer -> Piledriver, Bobcat -> Jaguar), and especially with Zen this makes a lot of sense.
 

Vattila

Senior member
Oct 22, 2004
799
1,351
136
People have to say how they expect AMD to hit 48 cores per socket [...] I think a 6-core CCX has the best overall fit for all markets (console, laptop, HEDT, server)

Did you consider my 3+1 configuration (3 CCXs + 1 GCX)? Four of these APU dies would hit 48 cores for Epyc 2, with 4 GCXs for graphics and parallel compute (as an alternative to AVX-512). A two-die APU replaces the current 16-core ThreadRipper with 24 cores and 2 GCXs. A single-die APU replaces the 8-core Ryzen with 12 cores and 1 GCX, hence offering integrated graphics in the mainstream desktop market (especially important in enterprise), and can also be used for high-end mobile (at 4, 6, 9 or 12 cores, by salvaging defective dies). A separate die with 1 CCX + 1 GCX, like Raven Ridge now, can address 4-core low-end, low-power mobile.
 
  • Like
Reactions: Space Tyrant

SpaceBeer

Senior member
Apr 2, 2016
307
100
116
The cores in a single CCX are not connected using IF; IF is used to connect multiple CCXs.
You are right, my mistake. But these links would represent access to the L3 cache, right? So it would still be better to stay at 4 cores?
 

moinmoin

Diamond Member
Jun 1, 2017
4,933
7,618
136
As others already mentioned, this doesn't apply to cores within a CCX. But it works well as an argument for the 48c Epyc containing 4 dies (so either 2x 6c CCX or 3x 4c CCX per die) instead of 6 dies (with the current 2x 4c CCX). The latter is simply unworkable and would also be incompatible with Epyc's dual-socket systems, which currently link the four dies between each chip.
 

SPBHM

Diamond Member
Sep 12, 2012
5,056
409
126
A 6-core CCX would make more sense for the desktop CPUs above R3, and would be better for competing with Coffee Lake;
it's probably not the case for R3 and the APUs.
 

Vattila

Senior member
Oct 22, 2004
799
1,351
136
But these links would represent [core interconnect], right? So it would still be better to stay at 4 cores?

Yup. The 4 cores in the current CCX are fully connected, like in your drawing — as confirmed by this slide. A 6-core CCX would have a sub-optimal interconnect topology. This is the main point of my original post, and the main argument for a 4-core CCX. A 4-core CCX is also well balanced with memory bandwidth (1 memory controller per CCX). In short, zen.

amd-ccx-epyc-cpu-presentation-slides-1.jpg


PS. A 6-core CCX would also require fatter IF links between CCXs, to carry the increased bandwidth from the larger number of cores per CCX.
 
  • Like
Reactions: Drazick

krumme

Diamond Member
Oct 9, 2009
5,952
1,585
136
It seems to me it's very simple.
AMD have said they will keep the existing motherboard platforms. For 48 cores on Epyc, they need to upgrade within the CCX for that to work, right?
 

Gideon

Golden Member
Nov 27, 2007
1,608
3,567
136
I still don't understand why they can't just put 3 CCXs on a die. Mapping them to memory should be way easier than interconnecting 6 cores within a CCX.
 

Ajay

Lifer
Jan 8, 2001
15,332
7,792
136
People have to say how they expect AMD to hit 48 cores per socket. A 6-core CCX makes the most sense. L3 cache latency within a CCX already isn't uniform, so moving to 6 cores probably isn't increasing latency for the average case at all.

More than 4 dies makes using an organic interposer almost impossible. The other option is 3 CCXs per die. I think a 6-core CCX has the best overall fit for all markets (console, laptop, HEDT, server).

I agree, assuming AMD can afford the extra effort (time & manpower). Inserting 2 more cores into the middle of the current floorplan would seem to be the best choice for reorganizing the L3$ interconnects within a CCX, basically leading to 3 horizontal 'slices' in a plan view like below.

HC28.AMD.Mike%20Clark.final-page-014.jpg


Some have posted some images suggesting that the 4 core CCX will continue on in Zen2 - if so, it's probably due to a resource deficit or TTM issue.
 

moinmoin

Diamond Member
Jun 1, 2017
4,933
7,618
136
Some have posted some images suggesting that the 4 core CCX will continue on in Zen2 - if so, it's probably due to a resource deficit or TTM issue.
As of Raven Ridge, AMD will have just two different Zen dies: the known Zeppelin (2x CCX) die used from low-end Ryzen up to Epyc, and the upcoming 1x CCX die in Ryzen Mobile and APUs. If a 6c CCX is indeed in the cards, AMD may well want to further optimize the mobile APUs for power consumption on 7LP and keep them at a 4c CCX.
 
  • Like
Reactions: Ajay

Tuna-Fish

Golden Member
Mar 4, 2011
1,324
1,462
136
We just had a massive jump in core count, I don't think that we should expect another one next year, so IMO there will be no configuration changes for "Zen 2".

AMD has advertised to server customers that the Zen 2 Epycs, which come on GloFo's 7nm process, will be compatible with the current motherboards and will have 48 cores per package.

We know that they are going up to 48 cores per CPU, we just don't know the topology. We also know that they intend to still have 8 memory channels, which implies keeping 4 dies per package (2 channels per die) and thus 12 cores per die.

You are right, my mistake. But these links would represent access to the L3 cache, right? So it would still be better to stay at 4 cores?

It's not as clear cut. The cores are not attached to each other with direct links; instead, each of the 4 cores has its own link to each of the 4 L3 slices, and all inter-core communication goes through the L3. This means that there are currently 16 links. If they keep the system of having one L3 slice for each core, going up to 6 cores would mean having 36 links.

However, nothing in the system requires that the number of L3 slices equal the number of cores, and indeed there are technical reasons why you'd want a power-of-two number of L3 slices: they are accessed by low-order address-bit interleave, and a power-of-two count can be handled by a very simple selection circuit, whereas a non-power-of-two count would require a divider for the address calculation.

If instead there are 6 cores but only 4 L3 slices, then only 6 × 4 = 24 links are required.
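
Just to illustrate that last point (a rough Python sketch of the idea, not how Zen's L3 actually hashes addresses): with a power-of-two number of slices the slice index is just a few low-order bits of the cache-line address, while a non-power-of-two count needs a modulo, i.e. a divider in hardware. The 64-byte line size is an assumption, and the link counts below simply restate the arithmetic above.

```python
LINE_BITS = 6  # assume interleave at 64-byte cache-line granularity

def slice_pow2(addr: int, n_slices: int) -> int:
    """Power-of-two slice count: mask a few low-order line-address bits."""
    assert n_slices & (n_slices - 1) == 0
    return (addr >> LINE_BITS) & (n_slices - 1)

def slice_any(addr: int, n_slices: int) -> int:
    """Arbitrary slice count: needs a modulo, i.e. a divider in hardware."""
    return (addr >> LINE_BITS) % n_slices

# Core-to-slice link counts for the three layouts discussed above:
for cores, slices in [(4, 4), (6, 6), (6, 4)]:
    print(f"{cores} cores x {slices} L3 slices -> {cores * slices} links")
# 4x4 -> 16 links, 6x6 -> 36 links, 6x4 -> 24 links
```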
 

nix_zero

Junior Member
Mar 19, 2017
12
5
81
We just had a massive jump in core count, I don't think that we should expect another one next year, so IMO there will be no configuration changes for "Zen 2".

Improving frequency/IPC over time is what AMD normally does (Bulldozer -> Piledriver, Bobcat -> Jaguar), and especially with Zen this makes a lot of sense.
If so, how would Starship happen then?
 
May 11, 2008
19,299
1,129
126
I think the consumer and possibly console configuration will remain a 4-core CCX for some time to come, until 8 cores is the bare minimum in the desktop PC world.
Going from Jaguar to a Zen CCX makes sense for consumer and console CPUs. It is small, cheap and powerful enough for such tasks.
I predict that the professional high-end server versions (and some HEDT) will move to an 8-core CCX in the upcoming years.
 

jpiniero

Lifer
Oct 1, 2010
14,509
5,159
136
If they use a 4-core CCX like now, what are they going to do with all that extra space? I doubt they will sell the same as what we have now, just with a smaller die area, lower power and a cheaper price... They have to improve somewhere other than the die shrink, so why not up the core count and widen AVX along with other improvements?

If AMD is serious about releasing this in the first half of 2019, yields are going to suck. It pretty much has to be a smaller die than Summit Ridge.

Edit: I'm actually warming up to 4x3. I think it would make it easier to make a smaller 4x2 and 4x1 model if they want. What would be even better is if (as mentioned) they could have a Navi die connected by IF.