Speculation: The CCX in Zen 2

Gideon · Aug 10, 2018

Trumpstyle said:
I'm sticking with 6 core ccx for desktop and 8 core ccx for servers. We got very strong rumors pointing towards this.

That would mean doubling the engineering effort (IMO unnecessarily). Considering how much AMD has recycled Zeppelin (essentially not even changing anything for 12 nm Ryzen and Threadripper), I don't see them suddenly doing 2 totally unrelated designs for server and desktop.

Vattila · Aug 10, 2018

Trumpstyle said:
a 4-core ccx would not be 50mm2 on 7nm but more likely 125mm2

Typo?

The size of a CCX is 45.5 mm² on 14LPP, and with over 2x density on the 7LP process, a straight shrink should be down to less than half the size. 25-50 mm² is allowing for some additional transistor budget for core improvements and larger caches.
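
The arithmetic behind that estimate can be sketched as follows. This is only a rough check: the 45.5 mm² figure and the "over 2x" density gain come from the post, while the helper name and the assumption that every block shrinks uniformly are illustrative.

```python
# Back-of-the-envelope shrink estimate. Figures are the ones quoted in the post;
# the helper name and the uniform-scaling assumption are illustrative only.
CCX_AREA_14LPP_MM2 = 45.5  # CCX area on 14LPP
DENSITY_GAIN = 2.0         # "over 2x" on 7LP, so 2.0 is a conservative floor


def shrunk_area(area_mm2: float, density_gain: float) -> float:
    """Area after a straight shrink, assuming every block scales uniformly."""
    return area_mm2 / density_gain


straight_shrink = shrunk_area(CCX_AREA_14LPP_MM2, DENSITY_GAIN)
print(f"straight shrink: {straight_shrink:.2f} mm^2")

# How much extra area each end of the 25-50 mm^2 range leaves over the shrink.
for budget in (25.0, 50.0):
    extra = budget - straight_shrink
    print(f"{budget:.0f} mm^2 budget leaves {extra:.2f} mm^2 for cores and cache")
```

In practice SRAM and I/O tend to shrink less than logic, so the uniform-scaling assumption is optimistic.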

Glo. · Aug 10, 2018

Perfectly straightforward:
Dual 4 core CCX in Matisse design.
8 core CPU.
16 MB L3 cache.
Around 120 mm² die size.

48 core EPYC2 is made from 6 dies.
64 Core is made from 8 dies.

AMD decided to go the 8-core design route for perfectly simple scaling of the CPU.

Vattila · Aug 10, 2018

Glo. said:
48 core EPYC2 is made from 6 dies.
64 Core is made from 8 dies.

What is the topology between dies? Again, since we go beyond 4, direct-connect is out of the question. Hence why I doubt this.

And what about memory controllers?

jpiniero · Aug 10, 2018

I still think it's 4x4, and that the GloFo version was going to be 4x3.

Glo. · Aug 10, 2018

Vattila said:
What is the topology between dies? Again, since we go beyond 4, direct-connect is out of the question. Hence why I doubt this.

Ask the IO die on the package, with an interposer connecting all of the dies.

jpiniero · Aug 10, 2018

Glo. said:
Ask the IO die on the package, with an interposer connecting all of the dies.

That's too much of a change with Zen 2.

Glo. · Aug 10, 2018

jpiniero said:
That's too much of a change with Zen 2.

Why does it have to be the same?

Maddie already posted some of the reasons: TSMC makes the server CPUs because of the efficiency of its process, and because TSMC has experience with large interposers (GV100). The GF process clocks higher than TSMC's, which is why that is the process where you will get Matisse, the AM4 CPUs.

french toast · Aug 10, 2018

Glo. said:
Perfectly straightforward:
Dual 4 core CCX in Matisse design.
8 core CPU.
16 MB L3 cache.
Around 120 mm² die size.

48 core EPYC2 is made from 6 dies.
64 Core is made from 8 dies.

AMD decided to go the 8-core design route for perfectly simple scaling of the CPU.

I don't agree with the core numbers, but this seems reasonable IMO. I agree we will be looking at a smaller ~150 mm² die for desktop, but that does not play into the core-wars strategy. It is possible we get 12 cores, but as 3x4 CCX, which offers greater flexibility for server.

Glo. · Aug 10, 2018

french toast said:
I don't agree with the core numbers, but this seems reasonable IMO. I agree we will be looking at a smaller ~150 mm² die for desktop, but that does not play into the core-wars strategy. It is possible we get 12 cores, but as 3x4 CCX, which offers greater flexibility for server.

You do understand that a dual Matisse package will be perfectly capable of fitting in the AM4 package?

If AMD needs more cores to offer a better product than Intel, they will make a dual-CPU package for AM4. If not, we will simply get an 8-core design.

JoeRambo · Aug 10, 2018

Trumpstyle said:
while a 6-core ccx would be 150mm2

That does not pass any common-sense check. Intel's Coffee Lake is about 150 mm², with a GPU on board. AMD's 4 cores and 8 MB of L3 are estimated at 44 mm². How can 6 cores be that large on a denser process?
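
A minimal sketch of that sanity check, assuming CCX area scales linearly with core count (L3 scaled along with it) and assuming roughly 2x density on 7 nm, the figure used earlier in the thread. The function name is made up for illustration.

```python
# Sanity check on the "6-core CCX would be 150 mm2" claim.
CCX4_AREA_MM2 = 44.0    # 4 cores + 8 MB L3 on the current process, per the post
CLAIMED_6C_MM2 = 150.0  # the claim being checked
DENSITY_GAIN = 2.0      # assumption: about 2x density on the 7 nm process


def scaled_ccx_area(cores: int, density_gain: float = 1.0) -> float:
    """Scale the 4-core CCX area linearly with core count, then apply a shrink."""
    return CCX4_AREA_MM2 * cores / 4 / density_gain


same_node = scaled_ccx_area(6)             # 6 cores with no shrink at all
shrunk = scaled_ccx_area(6, DENSITY_GAIN)  # 6 cores on the denser process

print(f"6-core CCX, same node: {same_node:.1f} mm^2")
print(f"6-core CCX, shrunk:    {shrunk:.1f} mm^2")
print(f"claim is {CLAIMED_6C_MM2 / shrunk:.1f}x the shrunk estimate")
```

Even without any shrink, the linear estimate stays well below the claimed 150 mm².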

jpiniero · Aug 10, 2018

Glo. said:
Why does it have to be the same?

R&D costs, ensuring they meet project schedules, you know, that sort of thing. They already made a rather large change in essentially switching from GloFo to TSMC when they were well into development.

Glo. · Aug 10, 2018

jpiniero said:
R&D costs, ensuring they meet project schedules, you know, that sort of thing. They already made a rather large change in essentially switching from GloFo to TSMC when they were well into development.

Do you think the whole Rome design is out of budget for AMD, especially when that R&D cost gives AMD the opportunity for a much higher asking price and margin on EPYC2 CPUs?

Imagine that the manufacturing cost goes up from the current $100 per EPYC CPU to $200, but allows them to charge not $4,999 but $9,999 for the highest SKU.
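
Worked through with the hypothetical numbers above (manufacturing cost only, ignoring R&D, packaging and everything else; the function names are illustrative):

```python
# Per-unit economics for the hypothetical figures in the post.
def unit_profit(price: float, cost: float) -> float:
    """Price minus manufacturing cost for one CPU."""
    return price - cost


def gross_margin(price: float, cost: float) -> float:
    """Profit as a fraction of the selling price."""
    return (price - cost) / price


current = (4999.0, 100.0)       # (price, cost) today, per the post
hypothetical = (9999.0, 200.0)  # (price, cost) in the imagined scenario

for label, (price, cost) in (("current", current), ("hypothetical", hypothetical)):
    print(f"{label}: profit ${unit_profit(price, cost):,.0f}, "
          f"margin {gross_margin(price, cost):.2%}")
```

With these inputs the profit per unit roughly doubles, while the margin as a percentage of price barely moves.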

jpiniero · Aug 10, 2018

Glo. said:
Do you think the whole Rome design is out of budget for AMD, especially when that R&D cost gives AMD the opportunity for a much higher asking price and margin on EPYC2 CPUs?

If anything, that's exactly why they also changed from 12 core dies to 16 when they switched to TSMC. My guess is that the 16 core die products will be offlabel so you won't see them unless you are Amazon or Google.

Glo. · Aug 10, 2018

jpiniero said:
If anything, that's exactly why they also changed from 12 core dies to 16 when they switched to TSMC. My guess is that the 16 core die products will be offlabel so you won't see them unless you are Amazon or Google.

What 12 and 16 core dies?

There are only 8 core CPUs, and 8 core dies.

jpiniero · Aug 10, 2018

Glo. said:
What 12 and 16 core dies?

There are only 8 core CPUs, and 8 core dies.

That was what I was getting at: what you are suggesting is way too big of a change for them at this point. An IO die and so on is something more for Milan, or maybe even later when they switch sockets.

Abwx · Aug 10, 2018

Vattila said:
Why not 4 x 4-core CCX chiplets on an active interposer? See my earlier posts.

This would require yet another interconnect between CCXs. If they did things well, IF should scale easily too; doubling its path widths and the cache sizes is the most logical choice perf/watt-wise.

eek2121 · Aug 10, 2018

A lot of you guys aren't thinking this through. Zen2 has to be compatible with AM4. AM4 has dual channel RAM. Adding more cores will add bandwidth and latency constraints as cores become starved of RAM. Zen2 will be 2x4 core CCXes, just like previous designs. You won't see a core increase until a new socket.

Vattila · Aug 10, 2018

Glo. said:
Dual 4 core CCX in Matisse design.

It seems pretty obvious that the two CCXs in Zeppelin scale perfectly to 4 CCXs using direct-connect. Why not take advantage of that?

This forms a hierarchical two-layered topology of direct-connected quads (what I, probably somewhat incorrectly, call a quad-tree topology in my OP). Then optimise this topology by adding further connections as far as metal layers allow, creating a more complex and optimised topology that brings down the average latency between any two cores.

Then connect up to 4 of these 4-CCX dies together using direct-connect on the package, as they currently do. This avoids yet another sub-optimal interconnect scheme between the 6 to 8 dies in your approach, which also requires the packaging to change to use a large interposer underpinning all the dies.

The simplest options I see:

If we assume AMD does not move to a chiplet design, then just add two direct-connected 4-core CCXs to the die.
Assuming AMD moves to a chiplet design, then implement the uncore on an active interposer with 4 x 4-core CCX chiplets mounted on top.

Both approaches can reuse the current MCM packaging scheme.
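
The core counts this hierarchy yields can be tallied with a small sketch. The 4 cores per CCX, 4 CCXs per die and up to 4 dies per package come from the post; the class and field names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    """One multi-die package in the hierarchy described above."""
    cores_per_ccx: int
    ccx_per_die: int
    dies: int

    @property
    def cores(self) -> int:
        return self.cores_per_ccx * self.ccx_per_die * self.dies


# 4-core CCXs, 4 CCXs per die, 1 to 4 direct-connected dies per package.
configs = [Package(cores_per_ccx=4, ccx_per_die=4, dies=d) for d in range(1, 5)]

for p in configs:
    print(f"{p.dies} die(s) x {p.ccx_per_die} CCX x {p.cores_per_ccx} cores "
          f"= {p.cores} cores")
```

Under these assumptions, one die gives 16 cores and a 4-die package gives 64, with 48 reached at 3 dies.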

Vattila · Aug 10, 2018

eek2121 said:
You won't see a core increase until a new socket.

That is what people (including me) thought about Threadripper on Socket TR4. Yet here we are with the 32-core Threadripper WX.

I think 16 cores on AM4 is now a given for Ryzen 3000.

Vattila · Aug 10, 2018

Yohoo! The poll now has 4-core CCX in the lead!

Glo. · Aug 10, 2018

eek2121 said:
A lot of you guys aren't thinking this through. Zen2 has to be compatible with AM4. AM4 has dual channel RAM. Adding more cores will add bandwidth and latency constraints as cores become starved of RAM. Zen2 will be 2x4 core CCXes, just like previous designs. You won't see a core increase until a new socket.

I mostly agree, apart from the last bit.

Nothing stops AMD from offering a 16-core SKU on an AM4 board with Zen 2, made from two Matisse CPUs.

JoeRambo · Aug 10, 2018

Vattila said:
It seems pretty obvious the two CCXs in Zeppelin scale perfectly to 4 CCXs using direct-connect. Why not take advantage of that?

How is this "perfect" scaling defined? By having 80 ns of latency? Please don't drink too much AMD Kool-Aid. The fact that one needs to mention 4 cores and an interconnect in the same sentence is not "perfect"; the opposite is true.

Vattila · Aug 10, 2018

JoeRambo said:
How is this "perfect" scaling defined?

4 CCXs will have no worse latency than 2 CCXs, since they all will be directly connected (6 links between 4 CCXs). See my OP.
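
The "6 links between 4 CCXs" figure is just the number of node pairs in a fully connected group. A short sketch that enumerates the pairs (the function name is illustrative) also shows why direct-connect gets expensive beyond 4 nodes:

```python
from itertools import combinations


def direct_links(nodes: int) -> int:
    """Number of point-to-point links needed to connect every pair of nodes."""
    return len(list(combinations(range(nodes), 2)))


for n in (2, 4, 6, 8):
    print(f"{n} nodes: {direct_links(n)} links, {n - 1} ports per node")
```

Link count grows as n(n-1)/2, so going from 4 to 8 nodes takes the total from 6 links to 28.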

JoeRambo · Aug 10, 2018

Vattila said:
4 CCXs will have no worse latency than 2 CCXs, since they all will be directly connected (6 links between 4 CCXs). See my OP.

That is simply not true. Instead of checking just 1 CCX, requests will need to be sent to 3 entities, and the same proliferation of targets will happen across the socket (or, god forbid, in a dual-socket system). Coherency is a nice feature, but it does not come for free.

2-socket systems are much, much easier than 4S and scale better, even if the basic QPI interconnect has the same speeds and latencies.
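
The fan-out being described can be sketched under a deliberately simple assumption: every miss is broadcast to every other CCX in the system, with no directory or probe filter to narrow it down. Real designs may filter probes, so treat this as a worst-case count; the function and parameter names are illustrative.

```python
def probe_targets(ccx_per_die: int, dies: int = 1, sockets: int = 1) -> int:
    """Other CCXs a request must check in a pure broadcast-coherency model."""
    return ccx_per_die * dies * sockets - 1


cases = {
    "2 CCX, 1 die": probe_targets(2),
    "4 CCX, 1 die": probe_targets(4),
    "4 CCX, 4 dies": probe_targets(4, dies=4),
    "4 CCX, 4 dies, 2 sockets": probe_targets(4, dies=4, sockets=2),
}

for label, targets in cases.items():
    print(f"{label}: {targets} probe target(s)")
```

The single-die cases match the 1 versus 3 entities mentioned above; the multi-die and dual-socket rows show how quickly the count grows in this model.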

How many cores per CCX in 7nm Zen 2?

4 cores per CCX (3 or more CCXs per die)

6 cores per CCX (2 or more CCXs per die)

8 cores per CCX (1 or more CCXs per die)