How many cores per CCX in 7nm Zen 2?

  • 4 cores per CCX (3 or more CCXs per die)

    Votes: 55 45.1%
  • 6 cores per CCX (2 or more CCXs per die)

    Votes: 44 36.1%
  • 8 cores per CCX (1 or more CCXs per die)

    Votes: 23 18.9%

  • Total voters
    122

Vattila

Senior member
Oct 22, 2004
799
1,351
136
Hi all, it has been an exciting few months in the CPU space since the Ryzen launch, with more to come before year end. However, I am already very curious about how AMD will evolve the Zen core with the upcoming 7nm Zen 2 next year — in particular, how the CCX will be configured.

There is a similar topic and poll over at SemiAccurate, in which there are more votes for 6 cores per CCX. However, here I will argue for staying with 4 cores per CCX, on the grounds of interconnect topology.

First, 6 cores per CCX seems so un-zen-like. In the current-generation Zen, the 4 cores in a CCX are directly connected to each other, which takes c = 6 links. Directly connecting 6 cores seems infeasible, as the number of links grows quadratically (c = n*(n-1)/2, i.e. 15 links for 6 cores), thus requiring a suboptimal topology instead (e.g. ring or mesh). Also, 4 cores per CCX is a nice partition size (e.g. for virtualization), with one memory controller per CCX.
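
As a quick sanity check on those numbers, here is a trivial Python sketch (my own illustration, not anything AMD has published) that counts the point-to-point links needed to fully connect n cores:

```python
def full_mesh_links(n: int) -> int:
    """Links needed to directly connect every pair of n cores."""
    return n * (n - 1) // 2

for n in (4, 6, 8):
    print(f"{n} cores per CCX -> {full_mesh_links(n)} links")
# 4 -> 6 links (current Zen CCX), 6 -> 15 links, 8 -> 28 links
```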

Currently, using the new Infinity Fabric, AMD arranges the Zen cores in direct-connected clusters (CCX) on direct-connected dies within the CPU package. The CPUs on a 2-socket PCB are also direct-connected. Notably, the die-to-die connection across CPUs is not done using switching. Instead, a die in CPU 0 is direct-connected to the corresponding die in CPU 1, forming a sparsely connected hyper-cube (see "The Heart Of AMD’s EPYC Comeback Is Infinity Fabric").

Taking this hierarchical direct-connect approach to the extreme leads to the following "quad-tree" topology, with room for up to 4 CCXs per die (drawn here for two sockets — although it naturally scales to 4 direct-connected sockets).
9114301_d9529b53bcf49e738f65058e1657f6b4.png

Note that the links between clusters are fat links, i.e. they consist of a set of links between corresponding cores in each cluster. No switches are involved. E.g. between CCXs, core 0 in CCX 0 is direct-connected to core 0 in CCX 1. Similarly, the link between CPUs consists of the set of links between corresponding cores in each CPU. This arrangement means that there are at most log4(n) hops between any two cores in an n-core system. For example, to go from core 0-0-0-0 (core 0 in CCX 0 in die 0 in CPU 0) to core 1-2-3-1, you first make a hop within the CCX to core 1-0-0-0, then cross-CCX to core 1-2-0-0, then cross-die to core 1-2-3-0, and finally cross-socket to core 1-2-3-1.
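
To make the routing rule concrete, here is a small Python sketch (my own illustration of the scheme described above, not anything AMD has documented). The route simply corrects one coordinate per hop, innermost level first, so the worst case is one hop per hierarchy level:

```python
# Coordinates are (core, CCX, die, socket), innermost level first,
# matching the "core 1-2-3-1" notation above.
def route(src, dst):
    """Walk from src to dst, fixing one hierarchy level per hop."""
    path = [src]
    cur = list(src)
    for level in range(len(src)):      # core, then CCX, then die, then socket
        if cur[level] != dst[level]:
            cur[level] = dst[level]    # one hop over the direct link at this level
            path.append(tuple(cur))
    return path

path = route((0, 0, 0, 0), (1, 2, 3, 1))
for step in path:
    print(step)                # (0,0,0,0) -> (1,0,0,0) -> (1,2,0,0) -> (1,2,3,0) -> (1,2,3,1)
print(len(path) - 1, "hops")   # 4 hops, i.e. one per hierarchy level in the worst case
```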

The following table shows the required number of links (point-to-point connections).
9114301_468a37d7fde7c4331af1387f750def94.png

For this 2-socket configuration, note that 10 ports are needed per core (a 4-socket configuration would require 12 ports per core). Is this feasible? If not, a lower number of ports and wires can be achieved by introducing switches at each hierarchical level (i.e. multiplex the logical point-to-point connections over fewer wires). This then becomes a fat tree topology (e.g. used in supercomputers).
9114301_ffbcdd71646f3c83851f03f78edebb10.png
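
Assuming the same 4-way branching at every level as in the diagram (4 cores per CCX, 4 CCXs per die, 4 dies per package), the per-core port count can be reproduced with a one-liner (again just my own back-of-the-envelope sketch of the switchless fat-link scheme, not a verified figure):

```python
def ports_per_core(cores_per_ccx=4, ccx_per_die=4, dies_per_socket=4, sockets=2):
    """Ports each core needs when it has one direct link to every peer at every level."""
    return ((cores_per_ccx - 1)      # links to the other cores in its CCX
            + (ccx_per_die - 1)      # one link per peer CCX on the die
            + (dies_per_socket - 1)  # one link per peer die in the package
            + (sockets - 1))         # one link per peer socket

print(ports_per_core(sockets=2))  # 10 ports per core (2-socket configuration)
print(ports_per_core(sockets=4))  # 12 ports per core (4-socket configuration)
```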

I am, by no means, an expert on interconnects, so if any of you have any background, I would love to hear your views on this, and how Infinity Fabric is likely to be used in the future to scale up systems.

PS. But three CCXs for a 12-core die makes an awkward die layout, right? Well, GlobalFoundries' 7nm process (7LP) has "more than twice" the density at "30% lower die cost". Why not go for 3+1 — three CCXs and one GCX (GPU Complex), all fully connected? Then, as with the current approach for EPYC, put 4 of these dies together on an MCM, also fully connected, and you have the rumoured 48 CPU cores for EPYC 2, plus 4 times whatever number of graphics compute cores is in a GCX. This configuration also chimes with the "HSA Performance Delivered" point on AMD's old roadmap (under "High-Performance Server APU").
 
  • Like
Reactions: Schmide

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
The most likely option is a 6-core CCX. The primary advantage would be that a Ryzen APU could incorporate a single 6-core CCX and avoid the cross-CCX communication penalty. Raven Ridge APUs have a single 4-core CCX along with 11 Vega NCUs, and I expect that AMD will want a single 6-core CCX for their first-gen 7nm Zen 2 APUs, along with Vega or Navi graphics.
 

formulav8

Diamond Member
Sep 18, 2000
7,004
522
126
I still think it will be 4-core CCXs. Not going higher than 4 cores per CCX, and still 2 of them per die like it is now.
 

Doom2pro

Senior member
Apr 2, 2016
587
619
106
If they use a 4-core CCX like now, what are they going to do with all that extra space? I doubt they will sell the same as what we have now, just with a smaller die area, lower power and a cheaper price... They have to improve somewhere other than the die shrink, so why not up the core count and widen AVX along with other improvements?
 

Schmide

Diamond Member
Mar 7, 2002
5,581
712
126
Since everything is just modular interconnects, they could just rotate some things.

zen2pos.jpg
 
  • Like
Reactions: Vattila

tamz_msc

Diamond Member
Jan 5, 2017
3,708
3,554
136
I think that AMD will keep the 4-cores-per-CCX design, with the die shrink allowing for the beefing up of the IF (like giving it its own clock domain) and additional wide-vector ALUs as required.

Seems like the most straightforward way to go, as a 6-core CCX would not be the best option for inter-core communication, like you said.
 
  • Like
Reactions: Drazick

itsmydamnation

Platinum Member
Feb 6, 2011
2,743
3,069
136
People have to say how they expect AMD to hit 48 cores per socket. A 6-core CCX makes the most sense. L3 cache latency within a CCX already isn't uniform, so moving to 6 cores probably isn't increasing latency for the average case at all.

More than 4 dies makes using an organic interposer almost impossible. The other option is 3 CCXs per die. I think a 6-core CCX has the best overall fit for all markets (console, laptop, HEDT, server).
 
  • Like
Reactions: Ajay

dacostafilipe

Senior member
Oct 10, 2013
771
244
116
We just had a massive jump in core count, I don't think that we should expect another one next year, so IMO there will be no configuration changes for "Zen 2".

Improving frequency/IPC over time is what AMD normally does (Bulldozer -> Piledriver, Bobcat -> Jaguar), and especially with Zen this makes a lot of sense.
 

Vattila

Senior member
Oct 22, 2004
799
1,351
136
People have to say how they expect AMD to hit 48 cores per socket [...] I think a 6-core CCX has the best overall fit for all markets (console, laptop, HEDT, server)

Did you consider my 3+1 configuration (3 CCXs + 1 GCX)? Four of these APU dies would hit 48 cores for Epyc 2, with 4 GCXs for graphics and parallel compute (as an alternative to AVX-512). A two-die APU replaces the current 16-core ThreadRipper with 24 cores and 2 GCXs. A single-die APU replaces the 8-core Ryzen with 12 cores and 1 GCX, hence offering integrated graphics in the mainstream desktop market (especially important in enterprise), and can also be used for high-end mobile (at 4, 6, 9 or 12 cores, by salvaging defective dies). A separate die with 1 CCX + 1 GCX, like Raven Ridge now, can address 4-core low-end, low-power mobile.
 
  • Like
Reactions: Space Tyrant

SpaceBeer

Senior member
Apr 2, 2016
307
100
116
The cores in a single CCX are not connected using IF; IF is used to connect multiple CCXs.
You are right, my mistake. But these links would represent access to the L3 cache, right? So it would still be better to stay at 4 cores?
 

moinmoin

Diamond Member
Jun 1, 2017
4,933
7,618
136
As others already mentioned, this doesn't apply to cores within a CCX. But it works well as an argument for the 48c Epyc containing 4 dies (so either 2x 6c CCX or 3x 4c CCX per die) instead of 6 dies (with the current 2x 4c CCX). The latter is simply unworkable and would also be incompatible with Epyc's dual-socket systems, which currently link the four dies between each chip.
 

SPBHM

Diamond Member
Sep 12, 2012
5,056
409
126
A 6-core CCX would make more sense for the desktop CPUs above R3, and would be better for competing with Coffee Lake;
it's probably not the case for R3 and the APUs.
 

Vattila

Senior member
Oct 22, 2004
799
1,351
136
But these links would represent [core interconnect], right? So it would still be better to stay at 4 cores?

Yup. The 4 cores in the current CCX are fully connected, like in your drawing — as confirmed by this slide. A 6-core CCX would have a sub-optimal interconnect topology. This is the main point of my original post, and the main argument for a 4-core CCX. A 4-core CCX is also well balanced with memory bandwidth (1 memory controller per CCX). In short, zen.

amd-ccx-epyc-cpu-presentation-slides-1.jpg


PS. A 6-core CCX would also require fatter IF links between CCXs, to carry the increased bandwidth from the larger number of cores per CCX.
 
  • Like
Reactions: Drazick

krumme

Diamond Member
Oct 9, 2009
5,952
1,585
136
It seems to me it's very simple.
AMD have said they will keep the existing motherboard platforms. For 48 cores on Epyc, they need to upgrade within the CCX for that to work, right?
 

Gideon

Golden Member
Nov 27, 2007
1,608
3,567
136
I still don't understand why they can't just put 3 CCXs on a die. Mapping them to memory should be way easier than interconnecting 6 cores within a CCX.
 

Ajay

Lifer
Jan 8, 2001
15,332
7,792
136
People have to say how they expect AMD to hit 48 cores per socket. A 6-core CCX makes the most sense. L3 cache latency within a CCX already isn't uniform, so moving to 6 cores probably isn't increasing latency for the average case at all.

More than 4 dies makes using an organic interposer almost impossible. The other option is 3 CCXs per die. I think a 6-core CCX has the best overall fit for all markets (console, laptop, HEDT, server).

I agree, assuming AMD can afford the extra effort (time & manpower). Inserting 2 more cores into the middle of the current floorplan would seem to be the best choice for reorganizing the L3$ interconnects within a CCX, basically leading to 3 horizontal 'slices' in a plan view like below.

HC28.AMD.Mike%20Clark.final-page-014.jpg


Some have posted some images suggesting that the 4 core CCX will continue on in Zen2 - if so, it's probably due to a resource deficit or TTM issue.
 

moinmoin

Diamond Member
Jun 1, 2017
4,933
7,618
136
Some have posted some images suggesting that the 4 core CCX will continue on in Zen2 - if so, it's probably due to a resource deficit or TTM issue.
As of Raven Ridge, AMD will have just two different Zen dies: the known Zeppelin (2x CCX) die used from low-end Ryzen up to Epyc, and the upcoming 1x CCX die in Ryzen Mobile and APUs. If a 6c CCX is indeed in the cards, AMD may well want to further optimize the mobile APUs for power consumption on 7LP and keep them at a 4c CCX.
 
  • Like
Reactions: Ajay

Tuna-Fish

Golden Member
Mar 4, 2011
1,324
1,462
136
We just had a massive jump in core count, I don't think that we should expect another one next year, so IMO there will be no configuration changes for "Zen 2".

AMD has advertised to server customers that the Zen 2 Epycs, which come on GloFo's 7nm process, will be compatible with the current motherboards and will have 48 cores per package.

We know that they are going up to 48 cores per CPU, we just don't know the topology. We also know that they intend to still have 8 memory channels, which implies keeping 4 dies per package (2 channels per die) and thus 12 cores per die.

You are right, my mistake. But these links would represent access to the L3 cache, right? So it would still be better to stay at 4 cores?

It's not as clear cut. The cores are not attached to each other with direct links; instead, each of the 4 cores has its own link to each of the 4 L3 slices, and all inter-core communication goes through the L3. This means that there are currently 16 links. If they keep the system of having one L3 slice for each core, going up to 6 cores would mean having 36 links.

However, nothing in the system requires that the number of L3 slices equal the number of cores, and indeed there are technical reasons why you'd want a power-of-two number of L3 slices: they are accessed by low-order address-bit interleave, and a power-of-two count can be handled by a very simple selection circuit, whereas a non-power-of-two count would require a divider for the address calculation.

If instead there are 6 cores but only 4 L3 slices, then only 6 × 4 = 24 links are required.
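
Just to illustrate that last point (a rough Python sketch of the idea, not how Zen's L3 actually hashes addresses): with a power-of-two number of slices the slice index is just a few low-order bits of the cache-line address, while a non-power-of-two count needs a modulo, i.e. a divider in hardware. The 64-byte line size is an assumption, and the link counts below simply restate the arithmetic above.

```python
LINE_BITS = 6  # assume interleave at 64-byte cache-line granularity

def slice_pow2(addr: int, n_slices: int) -> int:
    """Power-of-two slice count: mask a few low-order line-address bits."""
    assert n_slices & (n_slices - 1) == 0
    return (addr >> LINE_BITS) & (n_slices - 1)

def slice_any(addr: int, n_slices: int) -> int:
    """Arbitrary slice count: needs a modulo, i.e. a divider in hardware."""
    return (addr >> LINE_BITS) % n_slices

# Core-to-slice link counts for the three layouts discussed above:
for cores, slices in [(4, 4), (6, 6), (6, 4)]:
    print(f"{cores} cores x {slices} L3 slices -> {cores * slices} links")
# 4x4 -> 16 links, 6x6 -> 36 links, 6x4 -> 24 links
```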
 

nix_zero

Junior Member
Mar 19, 2017
12
5
81
We just had a massive jump in core count, I don't think that we should expect another one next year, so IMO there will be no configuration changes for "Zen 2".

Improving frequency/IPC over time is what AMD normally does (Bulldozer -> Piledriver, Bobcat -> Jaguar), and especially with Zen this makes a lot of sense.
If so, how would Starship happen then?
 
May 11, 2008
19,299
1,129
126
I think the consumer and possibly console configuration will remain a 4-core CCX for some time to come, until 8 cores is the bare minimum in the desktop PC world.
Going from Jaguar to a Zen CCX makes sense for consumer and console CPUs. It is small, cheap and powerful enough for such tasks.
I predict that the professional high-end server versions (and some HEDT) will move to an 8-core CCX in the upcoming years.
 

jpiniero

Lifer
Oct 1, 2010
14,509
5,159
136
If they use a 4-core CCX like now, what are they going to do with all that extra space? I doubt they will sell the same as what we have now, just with a smaller die area, lower power and a cheaper price... They have to improve somewhere other than the die shrink, so why not up the core count and widen AVX along with other improvements?

If AMD is serious about releasing this in the first half of 2019, yields are going to suck. It pretty much has to be a smaller die than Summit Ridge.

Edit: I'm actually warming up to 4x3. I think it would make it easier to make a smaller 4x2 and 4x1 model if they want. What would be even better is if (as mentioned) they could have a Navi die connected by IF.