
64-core EPYC Rome (Zen 2) Architecture Overview?

Shivansps

Platinum Member
Sep 11, 2013
2,257
36
126
#2
omg, if this rumor ends up being true, it also means 8C/16T CCXs.
 
Last edited:

LightningZ71

Senior member
Mar 10, 2017
226
0
71
#3
Big, if true.

Interestingly, the system controller chip can be fabbed at GlobalFoundries on 12nm to help with the wafer supply agreement.
 

dnavas

Senior member
Feb 25, 2017
226
2
61
#4
Latency to RAM would be ... interesting. The cache-coherent network gets really prominent billing. 8 channels of DDR4 are specified, but the number of PCIe 4 (!) lanes is not. Apparently the legacy peripherals aren't attached, and the secure processor is secured by being entirely unreachable :) Memory compression? Isn't that normally a video card thing?
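
For a rough sense of what those 8 channels buy, here's a back-of-the-envelope bandwidth calculation (a sketch; DDR4-3200 is an assumed speed grade for illustration, not a confirmed Rome spec):

```python
# Aggregate memory bandwidth for an 8-channel DDR4 configuration.
# DDR4-3200 is an assumption for illustration, not a confirmed Rome spec.
channels = 8
transfer_rate_mt_s = 3200   # mega-transfers per second (DDR4-3200)
bus_width_bytes = 8         # 64-bit channel = 8 bytes per transfer

per_channel_gb_s = transfer_rate_mt_s * bus_width_bytes / 1000
total_gb_s = channels * per_channel_gb_s
print(f"{per_channel_gb_s:.1f} GB/s per channel, {total_gb_s:.1f} GB/s aggregate")
# -> 25.6 GB/s per channel, 204.8 GB/s aggregate
```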
 

Vattila

Senior member
Oct 22, 2004
362
50
136
#5
it also means 8C/16T CCXs.
If so, it will be interesting to see how the cores are wired up, i.e. what the interconnect topology looks like. Same goes for the topology of the chiplets. Direct connection is certainly no longer feasible for 8 cores, nor for 8 chiplets.
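
To see why direct connection stops scaling, it's enough to count the point-to-point links a full mesh needs versus a ring (a generic counting sketch, nothing AMD-specific):

```python
# Links needed to wire up n nodes: full mesh (direct connection) vs. ring.
def full_mesh_links(n: int) -> int:
    return n * (n - 1) // 2   # every pair gets its own link

def ring_links(n: int) -> int:
    return n                  # each node connects to its two neighbours

for n in (4, 8):
    print(f"{n} nodes: mesh = {full_mesh_links(n)} links, ring = {ring_links(n)} links")
# 4 nodes: mesh = 6 links, ring = 4 links
# 8 nodes: mesh = 28 links, ring = 8 links
```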
 

Shivansps

Platinum Member
Sep 11, 2013
2,257
36
126
#6
If so, it will be interesting to see how the cores are wired up, i.e. what the interconnect topology looks like. Same goes for the topology of the chiplets. Direct connection is certainly no longer feasible for 8 cores, nor for 8 chiplets.
More than that, where is the eDRAM? This leads me to believe there will be 256 MB of eDRAM per CCX.
 

coercitiv

Platinum Member
Jan 24, 2014
2,906
106
136
#7
I came to this thread looking for Zen 2 EPYC details and all I got was this old empire interstellar dreadnought class schematic.
 

Gideon

Senior member
Nov 27, 2007
413
6
136
#8
I'm not really convinced this is true yet. However, 8 chiplets IMO seem to imply a routed interposer (good old butterdonuts? :D)
 

NTMBK

Diamond Member
Nov 14, 2011
8,050
28
126
#9
Urgh, I really hope this isn't true. Making every memory access require an Infinity Fabric hop isn't going to help things.
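
To put a rough number on that worry, here is a toy estimate; both latency figures are made up for illustration, not measured Rome values:

```python
# Toy estimate: how a mandatory fabric hop inflates memory latency.
# Both numbers are assumptions for illustration, not measured values.
dram_latency_ns = 75.0   # assumed local DRAM access latency
if_hop_ns = 30.0         # assumed one-way Infinity Fabric hop penalty

effective_ns = dram_latency_ns + if_hop_ns
print(f"{effective_ns:.0f} ns effective, +{if_hop_ns / dram_latency_ns:.0%} over local")
# -> 105 ns effective, +40% over local
```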
 
Feb 2, 2009
12,740
12
126
#12
Nice, leave the thread open for future use.
 

Tuna-Fish

Senior member
Mar 4, 2011
902
27
116
#13
The diagram is not credible at all; none of this makes any sense.
 

Topweasel

Diamond Member
Oct 19, 2000
4,500
94
126
#14

moinmoin

Senior member
Jun 1, 2017
566
20
96
#15
Memory compression? Isn't that normally a video card thing?
Aside from the fact that everything is BS anyway, I think some form of transparent real-time memory/transport compression is overdue to alleviate bandwidth congestion (most advantageous both in bandwidth-limited high-core-count MCMs like TR2 and in APUs).
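
The idea in a nutshell is trading a little compute for link bandwidth. The sketch below uses zlib as a stand-in codec; real hardware schemes use much simpler block-level compression:

```python
# Toy illustration of the bandwidth-vs-compute trade behind transparent
# memory/link compression. Real hardware would use simple block codecs,
# not zlib; this only shows why typical traffic compresses well.
import zlib

# Memory traffic is far from random: lots of zero runs and repeats.
traffic = (b"\x00" * 48 + b"ABCD" * 4) * 1024   # 64 KiB of synthetic traffic

compressed = zlib.compress(traffic, level=1)
ratio = len(traffic) / len(compressed)
print(f"{len(traffic)} B -> {len(compressed)} B ({ratio:.0f}x less link traffic)")
```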
 

Vattila

Senior member
Oct 22, 2004
362
50
136
#16
The diagram is not credible at all; none of this makes any sense.
Can you please elaborate?

You sound confidently knowledgeable, but your assertion makes me none the wiser.

In my view, as a non-expert, much of the diagram makes sense: 8 memory channels are expected, a level 4 cache seems reasonable, AES for memory encryption is expected, memory compression seems reasonable, a security processor is a given, as is an integrated southbridge, and the 9-chiplet design accords with rumours (AdoredTV, Twitter chatter).

The two things that triggered my suspicion were the 8-core chiplet with shared L3 (implying an 8-core CCX, which I very much doubt, and which conflicts with SemiAccurate's reporting), as well as the red arrow pointing to the "Low Latency IF" with the annotation "Parallel interface?". The latter in particular seemed suspect.
 
Last edited:

Vattila

Senior member
Oct 22, 2004
362
50
136
#17
And guys remember, Matisse IS an 8C/16T design. That is one thing that's sure at this moment.
Do you have a source for that?

By the way, I think Charlie from SemiAccurate has tweeted that the 8-core chiplets have two 4-core CCXs, just as before. Is that your source?
 

french toast

Senior member
Feb 22, 2017
916
0
91
#18
He said he made it up; I'm guessing he is trying to make sense of all the chiplet rumours / having a play.
He said not to take it seriously, so I won't.
 

NostaSeronx

Platinum Member
Sep 18, 2011
2,188
9
106
#19
L3 cache is on-die SRAM. => 2 TB/s per CCX?
L4 cache is 3D stacked SRAM. => 1 TB/s per two CCXs?
L5 cache is 2.5D stacked HBM3 => 512 GB/s per CPU?

If you want to inflate the ASP of such things.

CPU + HBM => 200 mm².
Stacked SRAM => No die area penalty.
Maintains compatibility with existing solutions, but might swap to another substrate/interposer for higher density I/O capability.
GMI/xGMI assist goes to the HBM dies rather than the SRAMs.
 
Last edited:
Apr 27, 2000
10,187
112
126
#21
For a minute there, it was like AMD was trying to make their own OpenPOWER CPU. Who knows, maybe they are . . .

chiplets!

I was thinking the other day that it would be wise for AMD to move away from 4-core CCXs to 8-core CCXs, assuming they could keep core-to-core latency the same. That way the consumer parts (Matisse) could be a single 8c/16t CCX, avoiding the inter-CCX latency penalty and gaining a big boost in a lot of apps currently not well optimized for Zen/Zen+.
 

Gideon

Senior member
Nov 27, 2007
413
6
136
#22
I was thinking the other day that it would be wise for AMD to move away from 4-core CCXs to 8-core CCXs, assuming they could keep core-to-core latency the same. That way the consumer parts (Matisse) could be a single 8c/16t CCX, avoiding the inter-CCX latency penalty and gaining a big boost in a lot of apps currently not well optimized for Zen/Zen+.
With up to 8 cores it would probably make sense to use a ring bus within the chiplet, as Intel does. This would also allow a unified L3 cache.

Either way they will take some hits: a uniform one on every core-to-core connection (mesh), a small one for cores more than 1 hop away (ring bus), or a bigger one between different core complexes (CCX). There are no free lunches.

They could also just improve the latency between CCXes, keeping the high-level architecture as-is, because it's ridiculously horrible right now. On-chip CCX-to-CCX latency is 2/3 of the latency between two separate dies (connected via slow MCM links), and it's also more than the memory latency.
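
For a feel of those trade-offs, here is a small counting sketch of average core-to-core distance for 8 cores under a full mesh versus a ring (hop counts only; it says nothing about actual cycle latencies):

```python
# Average core-to-core hop distance for 8 cores: full mesh vs. ring.
# Pure hop counting; real cycle latencies depend on the implementation.
def ring_distances(n: int) -> list[int]:
    # shortest path between each pair on a bidirectional ring
    return [min(abs(i - j), n - abs(i - j))
            for i in range(n) for j in range(n) if i != j]

n = 8
mesh_avg = 1.0   # full mesh: every pair is exactly 1 hop apart
dists = ring_distances(n)
print(f"mesh avg: {mesh_avg} hops, ring avg: {sum(dists) / len(dists):.2f} hops")
# -> mesh avg: 1.0 hops, ring avg: 2.29 hops
```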
 
Apr 27, 2000
10,187
112
126
#23
They could also just improve the latency between CCXes, keeping the high-level architecture as-is, because it's ridiculously horrible right now (2/3 of the latency between two dies, and more than the memory latency).
They sort of have to. Look at the latency numbers on the 2990WX and similar. Scary stuff.

The bad part is that as core counts go up, the share of power consumption going to IF-related hardware goes way, way up, without even taking into account what happens when you start raising those IF clocks.

I am really, really hoping we get fully tweakable IF speeds in Matisse and/or robust support for DDR4-4000 (or faster!) modules so we can get those IF clocks up high. The power usage penalty for high IF clocks shouldn't be too high on 8c Matisse.
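
On Zen/Zen+ the fabric clock runs at the memory clock, i.e. half the DDR4 transfer rate, which is why fast DRAM matters so much here. A quick illustration, assuming Matisse keeps that coupling:

```python
# On Zen/Zen+, fabric clock (FCLK) equals memory clock (MEMCLK),
# i.e. half the DDR4 transfer rate. Assuming Matisse keeps this coupling:
for ddr_rate in (2400, 3200, 4000):
    fclk_mhz = ddr_rate // 2
    print(f"DDR4-{ddr_rate} -> {fclk_mhz} MHz Infinity Fabric clock")
# -> 1200, 1600 and 2000 MHz respectively
```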
 

Glo.

Platinum Member
Apr 25, 2015
2,617
4
106
#24
For a minute there, it was like AMD was trying to make their own OpenPOWER CPU. Who knows, maybe they are . . .

chiplets!

I was thinking the other day that it would be wise for AMD to move away from 4-core CCXs to 8-core CCXs, assuming they could keep core-to-core latency the same. That way the consumer parts (Matisse) could be a single 8c/16t CCX, avoiding the inter-CCX latency penalty and gaining a big boost in a lot of apps currently not well optimized for Zen/Zen+.
If AMD shies away from the 4-core CCX, it will mean there is no more NUMA and no more CCX design.

It would be a pretty interesting design, after all, to be honest.
 

Glo.

Platinum Member
Apr 25, 2015
2,617
4
106
#25

The first 5 minutes of this video are the most important regarding Rome.
 
