Speculation: The CCX in Zen 2

How many cores per CCX in 7nm Zen 2?

  • 4 cores per CCX (3 or more CCXs per die)

    Votes: 55 45.1%
  • 6 cores per CCX (2 or more CCXs per die)

    Votes: 44 36.1%
  • 8 cores per CCX (1 or more CCXs per die)

    Votes: 23 18.9%

  • Total voters
    122

maddie

Diamond Member
Jul 18, 2010
4,723
4,628
136
Increasing the number of basic units for EPYC would require additional IFOP connections on the chips and more reservation spots/switching targets in the internal IF uncore of the chips themselves. If you're going through all of that, it's likely going to be no more complex to add two more CCXs to the existing floorplan. While that will definitely up the transistor count, it shouldn't increase the die size beyond the existing 14/12nm die area.

Interestingly, though, if they wanted to, they could keep roughly the same basic layout of the individual chips at 7nm, but move the DRAM and IO controllers off the chip and onto a specific I/O chip, leaving the rest to be essentially CCX and IF chips on the same EMIB/MCM package. So, have 5 chips on one package, four with IF links to the 5th, and the 5th handling all the I/O between the package and the rest of the system. This way, they can change out DRAM controllers, PCI controllers, etc. without having to redo the whole chip, or update the package for different applications in isolation from the cores. Having an EMIB/MCM package can allow them to run the IF links between the chips at similar speeds to what they do internally in the chips today. Consumer chips could be a mix of 2 to 4 7nm chiplets, and an I/O chiplet, and maybe contain an iGPU chiplet as well on dual CPU chiplet packages. At 7nm, but maintaining the existing AM4 socket, they'll have plenty of package size to play with for things like that. It would even be possible to integrate an HBM package in there as well. On a desktop product, cooling a package with two CPU chiplets, an iGPU chiplet, an HBM stack and an I/O chiplet would not be unreasonable. With low enough voltage and frequency targets, it could even work on mobile. Intel is already there with KL-G. Their pricing on the product is indicative of its uniqueness in the market and not entirely a product of cost of production.
Would AMD, when they started designing Zen2 several years ago, be willing to take such a radical departure? Remember, every $ was scarce. I find it difficult to see the basic unit not being a complete standalone CPU.

In their papers on interposer connected, composite CPUs, one of the points stressed was the ability to migrate early to an advanced node even if yields were comparatively poor and use innovative topologies to connect into a high core count CPU. That was the focus of the research. How to overcome the problems of early node fabrication. Other benefits were better binning options, etc.

AdoredTV did a recent video but this topic has been discussed here a long time ago using the same PDFs mentioned in his video.

They will have the greatest advantage now as they migrate to 7nm. Seeing as this move was planned years ago, I'm pushed into the expectations I have.

Can an organic package accommodate the required connections for an 8 chiplet CPU? Nope. Seems as if an SI is needed for all the chiplets.
 
  • Like
Reactions: Vattila

maddie

Diamond Member
Jul 18, 2010
4,723
4,628
136
Yield is going to suck, yes. But that's where Ryzen and Threadripper come in. I mean you won't see 16 core Ryzen next year, and they could even do what Intel did and introduce a 12 core r9 and keep the core counts of r7 and r5 similar.
They have 2 consumer lines now, Ryzen and TR. With TR2 it seems AMD has decided to market it into lower core count models that can be used for gaming also and higher core count ones for pure work related problems. Namely, X and WX models.

Keep Ryzen on 8 core maximum and push for CPU speed on the GloFlo 7nm process. Between IPC improvements, increased clocks, and a lower cost 100 mm^2 die, they can squeeze i9 from both directions with R7 3xxx and TR 3xxx. This so reminds me of a chess game. The available moves prevent any fantasy scenario from occurring and, barring an act of utter stupidity, we see the inevitable conclusion. This 10 nm Intel fiasco is so much worse than many realize.
 

french toast

Senior member
Feb 22, 2017
988
825
136
I think tuna has it spot on: 6 core CCX.
12 core die, then the Picasso successor has one 6 core CCX.
Assuming 50% more cores/cache and slightly wider cores with 2 x 256-bit SIMD units, what is the die size consensus?
They will want the die to be smaller than Summit Ridge, no?
I don't think consoles will affect this.

They could have two 10 core SKUs...<60w 329$ part and a <80w 399$ part...save the full fat 12 core part for the 499$ price bracket.
 
May 11, 2008
19,306
1,131
126
For EPYC, so many cores make sense. But for the desktop, I think that the jump in cores is a bit too fast, too often.
I agree that, for now, a 4 core CCX is a much better fit for Zen 2.
I expect improvements in IPC, improvements in the communication between the memory controllers and the cache/cores, and wider paths and SIMD units.
Zen 3 will be 8 core CCX.
 

fibonacc

Junior Member
Aug 8, 2018
3
5
16
A 6 core CCX is unlikely, 64 is not a multiple of 6 :) 64 core is happening with Rome, sadly can't tell more.
AMD received feedback from multiple sources for the first Epyc that the number of cores in a CCX should be increased to allow certain kinds of server apps to run better. So my bet is on 8 cores.
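
The divisibility point can be checked with a minimal sketch. It assumes every CCX in the package has the same number of cores and none are disabled; the function name `ccx_layouts` is made up for illustration.

```python
def ccx_layouts(total_cores, ccx_sizes=(4, 6, 8)):
    """Map each CCX size that divides total_cores evenly to the CCX count needed.

    Assumption: all CCXs are identical and fully enabled.
    """
    return {size: total_cores // size
            for size in ccx_sizes
            if total_cores % size == 0}


# 64 cores: only 4-core and 8-core CCXs divide evenly; 6 drops out.
print(ccx_layouts(64))
# 48 cores: all three sizes divide evenly.
print(ccx_layouts(48))
```

A 6-core CCX only reaches 64 cores if some cores are fused off, which is outside what this sketch models.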
 

Vattila

Senior member
Oct 22, 2004
799
1,351
136
So my bet is on 8 cores.

My bet is on 4 x 4-core CCXs with a more sophisticated topology, interconnect and coherency protocol between the CCXs to bring down average latency between any two cores in the 16-core CCX cluster.

In my hypothetical design, I speculate that it will be implemented on a 28nm active interposer that houses all the uncore-logic with 4 tiny 7nm CCXs mounted on top. 4 of these interposers in the package gives you 64-core EPYC.

Sounds sweet to me.
 

french toast

Senior member
Feb 22, 2017
988
825
136
A 6 core CCX is unlikely, 64 is not a multiple of 6 :) 64 core is happening with Rome, sadly can't tell more.
AMD received feedback from multiple sources for the first Epyc that the number of cores in a CCX should be increased to allow certain kinds of server apps to run better. So my bet is on 8 cores.
Are you some kind of insider (or faker ;) )...or is this second-hand knowledge from a source of yours?
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
AMD received feedback from multiple sources for the first Epyc that the number of cores in a CCX should be increased to allow certain kinds of server apps to run better. So my bet is on 8 cores.

It would make a lot of sense to go with 8 cores; that opens quite a few avenues in server computing. And let's not forget another weakness of the current CCX: the 8MB L3 "domain", which is hard on memory-heavy workloads. Rumour has it that AMD is increasing L3 to 4MB per core.

8C CCX with 32MB of L3 -> that is a dream setup both for desktop and servers! We can now fit 4 instances of our app on 2x20C Intel; I think at least 6-7 per 64C Epyc would be possible, some epic progress for sure :)
 
  • Like
Reactions: french toast

Vattila

Senior member
Oct 22, 2004
799
1,351
136
It would make a lot of sense to go with 8 cores

If they go with a chiplet design as rumoured, then an 8-core CCX chiplet will be larger, yield worse and be more costly than a 4-core CCX chiplet. And it would be less reusable in the consumer space.

My bet is on 4 x 4-core CCXs with a more sophisticated topology, interconnect and coherency protocol between the CCXs to bring down average latency between any two cores in the 16-core CCX cluster.
  • A 4-core CCX chiplet would be tiny on 7nm (25-50 mm²), and hence reduce cost and increase yield and volume on the new and expensive 7nm processes.
  • A relatively small 200 mm² active interposer on the perfected 28nm process would be dirt cheap.
  • A 200 mm² die (the interposer with 4 chiplets on top) would fit into the current packaging scheme with few changes: 4 interposers for EPYC and high-core-count Threadripper WX, 2 interposers for low-core-count Threadripper X, and 1 interposer for mainstream Ryzen.
What's not to like?
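
To put rough numbers on the yield argument, here is a minimal sketch using the simple Poisson yield model, Y = exp(-A * D0). The defect density `D0` is a made-up placeholder, not a figure for any real 7nm process, and treating an 8-core chiplet as twice the area of a 4-core one is also an assumption.

```python
import math


def poisson_yield(area_mm2, defects_per_cm2):
    """Fraction of defect-free dies under the Poisson model Y = exp(-A * D0)."""
    area_cm2 = area_mm2 / 100.0  # 1 cm^2 = 100 mm^2
    return math.exp(-area_cm2 * defects_per_cm2)


D0 = 0.5  # hypothetical defects per cm^2, chosen only for illustration

# Areas: 50 mm^2 is the upper bound quoted above for a 4-core chiplet,
# 100 mm^2 assumes an 8-core chiplet is simply twice that,
# 213 mm^2 is the "Zeppelin" die size mentioned in this thread.
for label, area in [("4-core chiplet", 50),
                    ("8-core chiplet", 100),
                    ("Zeppelin-sized die", 213)]:
    print(f"{label:>20}: {area:3d} mm^2 -> yield {poisson_yield(area, D0):.1%}")
```

Whatever value is plugged in for `D0`, the smaller die always yields better under this model; how large the gap is depends entirely on the real defect density.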

https://forums.anandtech.com/threads/speculation-the-ccx-in-zen-2.2513648/page-7#post-39528340
 
  • Like
Reactions: Schmide and Gideon

french toast

Senior member
Feb 22, 2017
988
825
136
I can't believe that we would see 32MB of L3 on desktop with one 8 core CCX; would be good though!
No one seems to be accounting for the increased transistors required for wider cores.
 

Vattila

Senior member
Oct 22, 2004
799
1,351
136
No one seems to be accounting for the increased transistors required for wider cores.

Not true, I am. :)

That's why I quote 25-50 mm² for the 7nm 4-core CCX chiplet: 25 is a little bit bigger than a straight shrink, and 50 is the upper limit for four of them to fit on a ~200 mm² interposer, which is my estimate based on the current size of "Zeppelin" (213 mm²).
 
  • Like
Reactions: french toast

Abwx

Lifer
Apr 2, 2011
10,854
3,298
136
8C CCX is to be expected, as 6C CCX does not make any sense: it would mean that they'll go from 32C to 48C per MCM while Intel goes from 28C to 56C in the same time. It's unlikely that AMD will abandon the core count advantage they currently hold.
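
A quick sketch of the arithmetic behind the 32C to 48C figure. It assumes the first-generation EPYC layout of 4 dies with 2 CCXs each carries over unchanged; the function name `package_cores` is made up for illustration.

```python
def package_cores(dies, ccx_per_die, cores_per_ccx):
    """Total cores in a multi-die package."""
    return dies * ccx_per_die * cores_per_ccx


# Assumed layout: 4 dies per package, 2 CCXs per die.
for cores_per_ccx in (4, 6, 8):
    total = package_cores(4, 2, cores_per_ccx)
    print(f"{cores_per_ccx}C CCX -> {total}C package")
```

Under that fixed layout, only the 8-core CCX doubles the package core count; more dies or more CCXs per die would change the picture.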
 
  • Like
Reactions: french toast

french toast

Senior member
Feb 22, 2017
988
825
136
So the consensus on here is we are probably getting 16 cores, just differences in topology.

I am sticking with 6 core ccx.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
No one seems to be accounting for the increased transistors required for wider cores.

If we agree that Intel has "wide" cores, to add 4MB of L3 and 2 cores Intel used 25mm^2 on 14nm++ ( so probably actual increase is even less, as other cores also have grown due to relaxed process? ). And the resulting chip already has 6 cores and 12MB on 150mm^2 PLUS GPU.

So AMD is using 7nm that is touted as having big advance in density, building a chip without GPU and still has trouble with sizing it?


That's why I quote 25-50 mm² for the 7nm 4-core CCX — 25 is a little bit bigger than a straight shrink, and 50 is the upper limit to fit it on a ~200 mm² interposer, which is my estimate based on the current size of "Zeppelin" (213 mm²).

In my opinion this is a case of hammer and nail syndrome: convince yourself that AMD is using chiplets and vegan tear sauce, and then you have to fit those "chiplets" on 200mm^2 sized interposers and have to invent 4x4.

All that when rumours are talking about 64C with 256MB L3.
 
  • Like
Reactions: ryan20fun

french toast

Senior member
Feb 22, 2017
988
825
136
If we agree that Intel has "wide" cores, to add 4MB of L3 and 2 cores Intel used 25mm^2 on 14nm++ ( so probably actual increase is even less, as other cores also have grown due to relaxed process? ). And the resulting chip already has 6 cores and 12MB on 150mm^2 PLUS GPU.

So AMD is using 7nm that is touted as having big advance in density, building a chip without GPU and still has trouble with sizing it?




In my opinion this is a case of hammer and nail syndrome: convince yourself that AMD is using chiplets and vegan tear sauce, and then you have to fit those "chiplets" on 200mm^2 sized interposers and have to invent 4x4.

All that when rumours are talking about 64C with 256MB L3.
I'm talking specifically about a 16 core die, whether that be 2x8 or 4x4 (unlikely imo). I would be surprised if transistors increased by 50% per core (incl. cache); double the cores and that seems too big for early 7nm imo.
I think they would want a smaller die than 213mm2 for 7nm.
I'm going for 12 core Ryzen 3xx, 36/48 core TR3, 64 core Epyc 2 Rome...with Rome using a different die with 8 core CCX, larger caches, SMT3/4.
 

moinmoin

Diamond Member
Jun 1, 2017
4,934
7,620
136
I can't see how increasing the number of cores in a CCX will be any progress for AMD. We already had the discussion about how a 4 core CCX is the most Zen-like and how adding any more cores increases routing complexity tenfold. At this point this connection complexity is something for IF to handle (e.g. through more CCXs on one die, potentially adding an L4$ etc.), not for a new CCX design.

Instead, for the Zen 2 CCX design I'd expect AMD to make it wider, ideally without actually making every individual core wider. Taking reverse notes from the Bulldozer school of designs (I know, I know), the new CCX design could partly combine the frontends of each pair of cores to effectively allow for an SP and a DP mode. As a result the latter mode could implement SMT4 and AVX512 (combining Octa-issue 128-bit FPU) without losing the current efficiency of the former, with the advantage that the power and resource use of wider features is more predictable than with Intel's current approach.

(One could spin this further and make such a new 4 core DP CCX the new default, effectively an 8 core SP CCX, and simplify the resulting layout to not increase routing complexity over the current 4 core SP CCX. But this likely would decrease the efficiency of the SP cores.)
 

Gideon

Golden Member
Nov 27, 2007
1,608
3,573
136
In my opinion this is a case of hammer and nail syndrome: convince yourself that AMD is using chiplets and vegan tear sauce, and then you have to fit those "chiplets" on 200mm^2 sized interposers and have to invent 4x4.

All that when rumours are talking about 64C with 256MB L3.

I also find it very hard to believe AMD is jumping to Active interposer already with their first iteration of Zen2. The complexity and risk is IMO way too much to be worth it. What if EPYC2 would be totally ready by Q1, but because of some issues and complexity with their very first Active Interposer would need multiple respins and be postponed 6+ months?

That said, repeating the 8-core CCX mantra over and over also seems like hammer and nail syndrome. The extra connections needed between each and every core within a CCX mean that they would have to opt for some exotic topology within the CCX. It would make much more sense to improve communication between CCXs.
If we agree that Intel has "wide" cores, to add 4MB of L3 and 2 cores Intel used 25mm^2 on 14nm++ ( so probably actual increase is even less, as other cores also have grown due to relaxed process? ). And the resulting chip already has 6 cores and 12MB on 150mm^2 PLUS GPU.

So AMD is using 7nm that is touted as having big advance in density, building a chip without GPU and still has trouble with sizing it?

And you are vastly underestimating the benefits of having a smaller die. From this paper:
[Image: meFXWwQ.png]


Not only would yield improve, but average clock-speeds would also noticeably improve on the same node (the smaller the chiplet, the higher the clocks). It's true that you would hit diminishing returns somewhere around 8-4 cores and face communication overhead, but that still doesn't rule out 2x 4-core CCX vs 1x 8-core CCX. Personally I just find an 8 core CCX really hard to believe.
 

Vattila

Senior member
Oct 22, 2004
799
1,351
136
In my opinion this is a case of hammer and nail syndrome: convince yourself that AMD is using chiplets and vegan tear sauce, and then you have to fit those "chiplets" on 200mm^2 sized interposers and have to invent 4x4.

Ah. So you are dismissing the latest buzz around the rumour of a chiplet and interposer design for 64-core EPYC. Fair enough. I would like to as well, as it confused me until I came up with this latest 4 x 4-core CCX chiplet hypothesis. I used to have a much simpler vision about Zen 2 and Zen 3 beforehand ("Zeppelin" replacement with 3 CCXs for 48-core EPYC 2, 4 CCXs for 64-core EPYC 3).

What interconnect topology do you think your preferred 8-core CCX would use? Would it build on a 4-core optimal direct-connect (in which case it would be some kind of super-CCX)? Or a flattened topology, such as ringbus or mesh, with a more uniform latency, albeit worse than direct-connect? Or a more sophisticated topology ("butter doughnut", etc.)?

My hunch is that AMD will build on direct-connect, and unfortunately that does not scale beyond 4 cores. So the way I see it, the topology will be a complex hybrid building on optimally connected quad-cores (CCXs). I discuss this in the OP of this thread.

An interesting note is that, if AMD goes with interposer and chiplets, they will have a lot of metal layers to play with for the interconnect — layers in the interposer die, plus layers in the chiplet mounted on top.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
And you are vastly underestimating the benefits of having a smaller die. From this paper:

The benefits have been known since the dawn of silicon processing. But why not extend your train of thought and build even smaller dies: 2 cores? 1 core? 1 "atom like" core? Where do we stop?

8C CCX has all the benefits, while sticking to decent manufacturability ( compared to 28C monsters ).
 

Trumpstyle

Member
Jul 18, 2015
76
27
91
If they go with a chiplet design as rumoured, then an 8-core CCX chiplet will be larger, yield worse and be more costly than a 4-core CCX chiplet. And it would be less reusable in the consumer space.

My bet is on 4 x 4-core CCXs with a more sophisticated topology, interconnect and coherency protocol between the CCXs to bring down average latency between any two cores in the 16-core CCX cluster.
  • A 4-core CCX chiplet would be tiny on 7nm (25-50 mm²), and hence reduce cost and increase yield and volume on the new and expensive 7nm processes.
  • A relatively small 200 mm² active interposer on the perfected 28nm process would be dirt cheap.
  • A 200 mm² die (the interposer with 4 chiplets on top) would fit into the current packaging scheme with few changes: 4 interposers for EPYC and high-core-count Threadripper WX, 2 interposers for low-core-count Threadripper X, and 1 interposer for mainstream Ryzen.
What's not to like?

https://forums.anandtech.com/threads/speculation-the-ccx-in-zen-2.2513648/page-7#post-39528340

Stuff just doesn't scale perfectly: a 4-core CCX would not be 50mm2 on 7nm but more likely 125mm2, while a 6-core CCX would be 150mm2. This is because the CPU cores scale well but there is random stuff in the chip that doesn't scale well at all. So I put the odds of us seeing some kind of 4-core CCX at 0%.

But let's see.
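
The imperfect-scaling argument can be sketched as a two-part area model, where only part of a block shrinks with the node. All numbers below are hypothetical placeholders chosen to show the shape of the effect, not measurements of any real CCX or process.

```python
def shrunk_area(scaling_mm2, non_scaling_mm2, scale_factor):
    """Area after a node shrink when only part of the block scales.

    scaling_mm2     : area that shrinks with the node (e.g. core logic)
    non_scaling_mm2 : area assumed not to shrink at all (e.g. analog, I/O)
    scale_factor    : area multiplier for the scaling part (0.5 = halves)
    """
    return scaling_mm2 * scale_factor + non_scaling_mm2


# Hypothetical 50 mm^2 block with an assumed 0.5x shrink on the scaling part.
ideal = shrunk_area(50, 0, 0.5)      # everything scales
partial = shrunk_area(40, 10, 0.5)   # 10 mm^2 does not scale
print(f"ideal shrink:   {ideal} mm^2")
print(f"partial shrink: {partial} mm^2")
```

How far the result lands from the ideal shrink depends on how much of the block really is non-scaling, which is exactly the number being argued about here.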
 

Glo.

Diamond Member
Apr 25, 2015
5,662
4,421
136
Predictions

Stays with 2 4Core CCX = 8 core basic unit as exists today.
No separate uncore but more L3 cache
Fabric speed increases to accommodate AM4 memory limitations
Improved layout and de-bottlenecking = greater IPC + increased Clocks
More than 4 basic units for higher count EPYC CPUs = 8 x 8C die [64 cores]
EPYC on passive interposer of ~ 900mm^2 [minimal cost increase]
EPYC die fabbed at TSMC process for absolute efficiency.
Ryzen 3xxx fabbed at GloFlo process for higher clock speeds.
Almost all correct, apart from the L3 cache. 64 core EPYC will have 256 MB of L3 cache, which divided across 8 dies gives 32 MB per die, or 16 MB per 4-core CCX.

:)
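
The L3 split can be sketched as below, assuming the 8 x 8C die layout with 2 four-core CCXs per die from the predictions quoted above and an even split of the cache. The function name `l3_breakdown` is made up for illustration.

```python
def l3_breakdown(total_l3_mb, dies, ccx_per_die, cores_per_ccx):
    """Split a package-wide L3 total evenly into per-die, per-CCX and per-core shares."""
    per_die = total_l3_mb / dies
    per_ccx = per_die / ccx_per_die
    per_core = per_ccx / cores_per_ccx
    return per_die, per_ccx, per_core


# Rumoured 256 MB total, assumed layout: 8 dies, 2 CCXs per die, 4 cores per CCX.
per_die, per_ccx, per_core = l3_breakdown(256, 8, 2, 4)
print(f"{per_die} MB per die, {per_ccx} MB per CCX, {per_core} MB per core")
```

Under those assumptions the per-core share matches the 4MB-per-core rumour mentioned earlier in the thread.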