Speculation: The CCX in Zen 2

Page 9 - AnandTech Forums

How many cores per CCX in 7nm Zen 2?

  • 4 cores per CCX (3 or more CCXs per die)

    Votes: 55 45.1%
  • 6 cores per CCX (2 or more CCXs per die)

    Votes: 44 36.1%
  • 8 cores per CCX (1 or more CCXs per die)

    Votes: 23 18.9%

  • Total voters
    122

Gideon

Golden Member
Nov 27, 2007
1,608
3,570
136
I'm sticking with 6 core ccx for desktop and 8 core ccx for servers. We got very strong rumors pointing towards this.
That would mean doubling the engineering effort (unnecessarily, IMO). Considering how much AMD has recycled Zeppelin (essentially changing nothing for 12 nm Ryzen and Threadripper), I don't see them suddenly doing two totally unrelated designs for server and desktop.
 
  • Like
Reactions: Vattila

Vattila

Senior member
Oct 22, 2004
799
1,351
136
a 4-core ccx would not be 50mm2 on 7nm but more likely 125mm2

Typo?

The size of a CCX is 45.5 mm² on 14LPP, and with over 2x density on the 7LP process, a straight shrink should come down to less than half that size. The 25-50 mm² range allows for some additional transistor budget for core improvements and larger caches.
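For what it's worth, the shrink arithmetic sanity-checks in a few lines (the flat 2x density factor is an assumption for illustration; real scaling differs between logic and SRAM):

```python
# Rough die-size estimate for a 7nm CCX, using the figures from the post:
# 45.5 mm^2 per CCX on 14LPP, and an assumed ~2x density gain on 7LP.

ccx_14lpp_mm2 = 45.5   # Zeppelin CCX area on GlobalFoundries 14LPP
density_gain = 2.0     # assumed density improvement on TSMC 7LP

straight_shrink = ccx_14lpp_mm2 / density_gain
print(f"Straight shrink: ~{straight_shrink:.1f} mm^2")  # ~22.8 mm^2

# Adding transistor budget for wider cores and larger caches lands in the
# 25-50 mm^2 range mentioned above -- nowhere near 125 mm^2.
```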
 

Glo.

Diamond Member
Apr 25, 2015
5,657
4,409
136
Perfectly straightforward:
Dual 4-core CCX in the Matisse design.
8-core CPU.
16 MB of L3 cache.
Around 120 mm² die size.

48 core EPYC2 is made from 6 dies.
64 Core is made from 8 dies.

AMD decided to go the 8-core design route for perfectly simple scaling of the CPU.
 

Vattila

Senior member
Oct 22, 2004
799
1,351
136
48 core EPYC2 is made from 6 dies.
64 Core is made from 8 dies.

What is the topology between dies? Again, once we go beyond 4, direct-connect is out of the question, which is why I doubt this.

And what about memory controllers?
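One way to see why direct-connect stops being practical beyond 4 dies: a full mesh needs one link per pair of dies, so the link count grows quadratically (and the per-die link count grows linearly). A quick sketch:

```python
from math import comb

# A direct-connect (full-mesh) topology between n dies needs one link per
# pair of dies, i.e. n*(n-1)/2 links total, and n-1 links per die.
for n in (2, 4, 6, 8):
    print(f"{n} dies -> {comb(n, 2)} links total, {n - 1} per die")
# 4 dies fit Zeppelin's 3 IFOP links per die; 6 or 8 dies would need 5 or 7.
```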
 

Glo.

Diamond Member
Apr 25, 2015
5,657
4,409
136
What is the topology between dies? Again, once we go beyond 4, direct-connect is out of the question, which is why I doubt this.
Ask the IO die on the package, with an interposer connecting all of the dies ;).
 

Glo.

Diamond Member
Apr 25, 2015
5,657
4,409
136
That's too much of a change with Zen 2.
Why would it have to be the same? ;)

Maddie already posted some of the reasons: TSMC makes the server CPUs because of the efficiency of their process and their experience with large interposers (GV100), while GF's process clocks higher than TSMC's, which is why Matisse, the AM4 CPUs, will be on that process.
 

french toast

Senior member
Feb 22, 2017
988
825
136
Perfectly straightforward:
Dual 4-core CCX in the Matisse design.
8-core CPU.
16 MB of L3 cache.
Around 120 mm² die size.

48 core EPYC2 is made from 6 dies.
64 Core is made from 8 dies.

AMD decided to go the 8-core design route for perfectly simple scaling of the CPU.
I don't agree with the core numbers, but this seems reasonable IMO. I agree we will be looking at a smaller ~150 mm² die for desktop, but that does not play into the core-wars strategy... it is possible we get 12 cores, but in 3x 4-core CCXs... that offers greater flexibility with server.
 

Glo.

Diamond Member
Apr 25, 2015
5,657
4,409
136
I don't agree with the core numbers, but this seems reasonable IMO. I agree we will be looking at a smaller ~150 mm² die for desktop, but that does not play into the core-wars strategy... it is possible we get 12 cores, but in 3x 4-core CCXs... that offers greater flexibility with server.
You do understand that a dual-Matisse package will be perfectly capable of fitting in the AM4 socket? ;)

If AMD needs more cores to offer a better product than Intel, they will make a dual-CPU package for AM4. If they do not, we will simply get an 8-core design.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
while a 6-core ccx would be 150mm2

That does not pass any common-sense check. Intel's Coffee Lake is ~150 mm² with a GPU on board. AMD's 4 cores plus 8 MB of L3 are estimated at 44 mm². How can 6 cores be that large on a denser process?
 

jpiniero

Lifer
Oct 1, 2010
14,509
5,159
136
Why would it have to be the same? ;)

R&D costs, ensuring they meet project schedules, you know, that sort of thing. They already made a rather large change in essentially switching from GloFo to TSMC when they were well into development.
 
  • Like
Reactions: Vattila

Glo.

Diamond Member
Apr 25, 2015
5,657
4,409
136
R&D costs, ensuring they meet project schedules, you know, that sort of thing. They already made a rather large change in essentially switching from GloFo to TSMC when they were well into development.
Do you think the whole design of Rome is out of budget for AMD, especially when that R&D cost gives AMD the opportunity for a much higher asking price and margin on EPYC2 CPUs?

Imagine that the manufacturing cost goes up from the current $100 per EPYC CPU to $200, but allows them to charge not $4,999 but $9,999 for the highest SKU.
 

jpiniero

Lifer
Oct 1, 2010
14,509
5,159
136
Do you think all of the design of Rome is out of budget for AMD, especially when that R&D cost will give AMD opportunity for much higher asking price and margin, from EPYC2 CPUs?

If anything, that's exactly why they also changed from 12-core dies to 16 when they switched to TSMC. My guess is that the 16-core die products will be off-label, so you won't see them unless you are Amazon or Google.
 

Glo.

Diamond Member
Apr 25, 2015
5,657
4,409
136
If anything, that's exactly why they also changed from 12-core dies to 16 when they switched to TSMC. My guess is that the 16-core die products will be off-label, so you won't see them unless you are Amazon or Google.
What 12 and 16 core dies?

There are only 8 core CPUs, and 8 core dies.
 

jpiniero

Lifer
Oct 1, 2010
14,509
5,159
136
What 12 and 16 core dies?

There are only 8 core CPUs, and 8 core dies.

That was what I was getting at: what you are suggesting is way too big a change for them at this point. An IO die, etc., is something more for Milan, or maybe even later when they switch sockets.
 

Abwx

Lifer
Apr 2, 2011
10,847
3,297
136
Why not 4 x 4-core CCX chiplets on an active interposer? See my earlier posts.

This would require yet another interconnect between CCXs. If they did things well, IF scales easily as well; doubling its path widths and the cache sizes is the most logical approach perf/watt-wise.
 

eek2121

Platinum Member
Aug 2, 2005
2,904
3,906
136
A lot of you guys aren't thinking this through. Zen2 has to be compatible with AM4. AM4 has dual channel RAM. Adding more cores will add bandwidth and latency constraints as cores become starved of RAM. Zen2 will be 2x4 core CCXes, just like previous designs. You won't see a core increase until a new socket.
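The bandwidth-starvation point is easy to quantify. A rough sketch, assuming dual-channel DDR4-2933 (the speed grade is an illustrative assumption; the trend holds for any grade):

```python
# Back-of-the-envelope memory bandwidth per core on AM4's dual-channel DDR4.
channels = 2
transfer_rate_mts = 2933   # mega-transfers/s; assumed DDR4-2933 for illustration
bytes_per_transfer = 8     # 64-bit channel width

total_gbs = channels * transfer_rate_mts * bytes_per_transfer / 1000
print(f"Total: ~{total_gbs:.1f} GB/s")           # ~46.9 GB/s
for cores in (8, 12, 16):
    print(f"{cores} cores -> {total_gbs / cores:.1f} GB/s per core")
# Doubling the core count on the same socket halves the bandwidth per core.
```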
 
  • Like
Reactions: Glo.

Vattila

Senior member
Oct 22, 2004
799
1,351
136
Dual 4-core CCX in the Matisse design.

It seems pretty obvious that the two CCXs in Zeppelin scale perfectly to 4 CCXs using direct-connect. Why not take advantage of that?

This forms a hierarchical two-layered topology of direct-connected quads (what I, probably somewhat incorrectly, call a quad-tree topology in my OP). Then optimise this topology by adding further connections as far as metal layers allow, creating a more complex and optimised topology that brings down the average latency between any two cores.

Then connect up to 4 of these 4-CCX dies together using direct-connect on the package, as they currently do. This avoids yet another sub-optimal interconnect scheme between the 6 to 8 dies in your approach, which also requires the packaging to change to use a large interposer underpinning all the dies.

The simplest options I see:
  1. If we assume AMD does not move to a chiplet design, then just add two direct-connected 4-core CCXs to the die.
  2. Assuming AMD moves to a chiplet design, then implement the uncore on an active interposer with 4 x 4-core CCX chiplets mounted on top.
Both approaches can reuse the current MCM packaging scheme.
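The link budget of this "quads of quads" scheme stays modest, since direct-connect at each level only ever spans 4 nodes. A quick count (the 4-die package is the maximum configuration described above):

```python
from math import comb

# Hierarchical direct-connect: every quad is a full mesh of 4 nodes,
# which needs C(4, 2) = 6 links.
ccxs_per_die, dies = 4, 4
intra_die_links = dies * comb(ccxs_per_die, 2)  # CCX-to-CCX, on-die
inter_die_links = comb(dies, 2)                 # die-to-die, on-package
print(intra_die_links, inter_die_links)         # 24 on-die + 6 on-package
# Any two CCXs are then at most 2 interconnect hops apart
# (one inter-die hop plus one intra-die hop).
```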
 
Last edited:

Vattila

Senior member
Oct 22, 2004
799
1,351
136
You won't see a core increase until a new socket.

So we (me included) thought about Threadripper's Socket TR4, too. Yet here we are with the 32-core Threadripper WX.

I think 16 cores on AM4 is now a given for Ryzen 3000.
 
Last edited:

Glo.

Diamond Member
Apr 25, 2015
5,657
4,409
136
A lot of you guys aren't thinking this through. Zen2 has to be compatible with AM4. AM4 has dual channel RAM. Adding more cores will add bandwidth and latency constraints as cores become starved of RAM. Zen2 will be 2x4 core CCXes, just like previous designs. You won't see a core increase until a new socket.
I mostly agree, apart from the last bit ;).

Nothing stops AMD from offering a 16-core SKU on an AM4 board with Zen 2, made from two Matisse CPUs.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
It seems pretty obvious that the two CCXs in Zeppelin scale perfectly to 4 CCXs using direct-connect. Why not take advantage of that?

How is this "perfect" scaling defined? By having 80 ns of latency? Please don't drink too much AMD Kool-Aid. The fact that one needs to mention 4 cores and an interconnect in the same sentence is not "perfect"; the opposite is true.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
4 CCXs will have no worse latency than 2 CCXs, since they all will be directly connected (6 links between 4 CCXs). See my OP.

That is simply not true. Instead of checking just 1 other CCX, requests will need to be sent to 3 entities, and the same proliferation of targets will happen across the socket (or, God forbid, in dual-socket). Coherency is a nice feature, but it does not come for free.

2-socket systems are much, much easier than 4S and scale better, even if the basic QPI interconnect has the same speeds and latencies.
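The probe-proliferation point can be put in rough numbers. Ignoring probe filters and directories (which real designs use to mitigate exactly this), a cache miss must interrogate every other CCX in the coherence domain:

```python
# Coherency probe fan-out: absent a filter or directory, a miss in one CCX
# must probe every other CCX that might hold the line.
configs = [
    ("Zeppelin die (2 CCXs)", 2),
    ("hypothetical 4-CCX die", 4),
    ("4-die EPYC package", 8),
    ("2-socket system", 16),
]
for label, ccxs in configs:
    print(f"{label}: {ccxs - 1} remote CCXs to probe per miss")
```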