64 core EPYC Rome (Zen2) Architecture Overview?

Trumpstyle

Member
Jul 18, 2015
76
27
91
Firstly, you have not understood how AMD CPUs are designed. AMD's CPU physical design is targeted at achieving high clock frequencies, so TSMC N7 is not a choice at all. AMD Zen CPUs had a max turbo boost of 4 GHz. Zen+ CPUs had a max single core turbo of 4.35 GHz. The mobile Ryzen 2700U had a max turbo of 3.8 GHz. AMD is likely to target max clock frequencies at least on par with or higher than Zen+ for their 7nm Zen 2, so the only option is N7 HPC. AMD has already confirmed that all their 7nm CPUs and GPUs are using N7 HPC. This was confirmed by Ashraf Eassa of The Motley Fool on Twitter, but Ashraf deactivated his Twitter account a couple of months back.

BTW, I was one of the earliest people to propose that the Rome IO die could have L4 cache, but I think the chances of that are slim to none. Firstly, for a significant amount of L4 cache (say 256 MB), AMD would need to go with 14HP and eDRAM. I think that process is not suitable for low cost, high volume designs. The Rome IO die needs to be low cost and low complexity, so it's most likely based on the mature GF 14LPP node. Moreover, if you look at the Zeppelin die and move all the IO and memory controller circuitry to a single die, you would end up quite close to the 420 sq mm die size. AMD has probably spent some die area to maintain some cache information about the data stored in the L3 of each chiplet, so that a chiplet can quickly look up whether some data is in the L3 of another chiplet. But that's about it.

AMD's 8 core chiplet die is the basic building block for all of its 7nm products: server CPUs, desktop CPUs, desktop/notebook APUs, and next gen console APUs (PS5/XB2). BTW, AMD's move to chiplets is not only for servers. I expect almost every AMD design at 7nm to incorporate chiplets. AMD's move is very logical, as it's easier to yield smaller dies and you can match chiplets with similar characteristics to build SKUs across the product stack. The modularity and reusability of chiplets dictate that 8 cores is the right choice. Here is how I see the 7nm designs from AMD:

Rome - 8 x 8 = 64C, 8MC, 128 PCI-E 4.0 lanes
Threadripper - 8 x 8 = 64C, 8MC, 128 PCI-E 4.0 lanes
Ryzen - 2 x 8 = 16C, 2MC, 32 PCI-E 4.0 lanes
Ryzen APU - 1 x 8 = 8C + Navi GPU chiplet 20 CU + 4 GB HBM2 cache, 2MC, 32 PCI-E 4.0 lanes
PS5/XB2 - 1 x 8 = 8C + Navi GPU chiplet 80 CU, 256 or 384 bit GDDR6.

Here is how I see AMD's Navi product stack

Ryzen 7nm APU - 20CU, 1280 sp.
Navi 12 - 40CU, 2560 sp, 128 bit GDDR6 or 256 bit GDDR5X.
Navi 10 (PS5 GPU) - 80CU, 5120 sp, 256 bit GDDR6.
Navi 20 - 120CU, 7680 sp, 384 bit GDDR6.
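
The shader counts in the list above all follow from 64 stream processors per CU, which is the GCN ratio; whether Navi keeps that ratio is part of the speculation, not a confirmed spec. A quick Python check of the arithmetic, with illustrative names that do not come from the thread:

```python
# Assumption: 64 stream processors (sp) per compute unit (CU), as in GCN.
SP_PER_CU = 64

# CU counts as speculated in the post above; none of these are confirmed specs.
speculated_cus = {
    "Ryzen 7nm APU": 20,
    "Navi 12": 40,
    "Navi 10": 80,
    "Navi 20": 120,
}


def stream_processors(cus: int) -> int:
    """Return the sp count implied by a CU count at the assumed ratio."""
    return cus * SP_PER_CU


for name, cus in speculated_cus.items():
    print(f"{name}: {cus} CU -> {stream_processors(cus)} sp")
```

If Navi used a different sp-per-CU ratio, only `SP_PER_CU` would need to change.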

I think Navi will be a good architecture and will address long-standing problems and drawbacks with GCN such as scalability, perf and area efficiency, perf per CU, and perf per sp. In fact, I am optimistic because Sony is very aggressive with their PS5 graphics performance goals, and Navi is heavily influenced by PS5's perf targets and design goals.

This post makes a lot of sense. I just want to add two things. First, we don't know if Navi 20 is real; from what I could find, it originated from the site Fudzilla. Second, we know that Navi 11 exists from a leaked roadmap; I assume Navi 11 is a mobile/desktop APU, as Vega 11 was.

About Navi 12, I'm guessing you got that information from wccftech; we don't know how accurate that is :)
 

Hitman928

Diamond Member
Apr 15, 2012
5,244
7,793
136
VEGA11 appeared on leaked ROCm roadmaps, together with everything else that made it out alive.

I couldn't find any such leaks outside of some super blurry rumor articles and it wasn't ever a part of AMD's ROCm presentations.
 

Trumpstyle

Marketing names have no relation to their die naming schemes.
VEGA10 is Vega56/64/WX8200/9100.
VEGA11 is ded.
RAVEN is RR iGPU bins.
VEGA12 is Vega16/20.
VEGA20 is Mi50/Mi60.

I think you're overthinking the code names.
Vega 10 = 64CU
Vega 20 = 64CU (refresh)
Vega 11 = 11CU desktop/mobile apu
Polaris10 = 36CU
Polaris11 = 16CU
Polaris12 = 10CU

So new Navi will look like this:
Navi10 = 80CU
Navi11 = 11CU desktop/mobile apu
Navi12 = 40CU
Navi20 = maybe exists, but I think it will be a refresh/rebrand of Navi10.

It all makes perfect sense if the Wccftech leak is true.
 

Yotsugi

Golden Member
Oct 16, 2017
1,029
487
106

Hitman928

Yes, AMD told me this by releasing literally everything sans VEGA11.
Most of the boards in the pic were either already released or officially announced by the time this "leak" happened, so that's not saying much. Also, not everything was released, unless you can show me where to buy a Vega 10x2 which was supposed to release last year.

Lastly, look at the pictures in the roadmap. Clearly someone just used Paint or some other primitive photo editor to put new labels on pre-existing card pictures. I mean, look at Navi 10: you can still see the old FirePro label underneath "Navi 10". This was a fake, and a bad one at that.

c2SwU3t.png
 

Yotsugi


Hitman928


Fair enough, I hadn't seen that announcement. It still doesn't change my opinion of the roadmap. AMD has been doing dual GPU server / workstation boards for many generations now, so that wouldn't be hard to predict.

That's called an "internal roadmap" and you surely don't even remotely know how this silly industry works.

I work in this silly industry, so yes, I do know how it works. I have friends who have worked for AMD, Nvidia, and Intel. I've never seen an internal roadmap that looks like that. You can believe whatever you want, but I think to everyone else it's obviously a fake.
 

krumme

Diamond Member
Oct 9, 2009
5,952
1,585
136
If I remember correctly, a Jaguar core including L2 is 3.5 mm2 on 28nm, or approx. 28 mm2 for 8 cores. That is a bit less than half of a Zen 2 CPU cluster.
Now, I don't know if Sony or MS would have shifted some mm2 from GPU to CPU, but anyway 73 mm2 for the CPU seems too high a budget, even if you have 350 mm2 or so in total for a single SoC.
Also, if I remember correctly, 2 of those 8 Jaguar cores were dedicated to the system.
Using Zen 2, surely 1 core will cover that and then some. That means that when building with an 8C Zen 2 cluster you can harvest down to 7 cores, which makes more economic sense.
Add a more mature process, be it 7nm+ or 5nm, and this CPU seems to come within reach for a console.
I would just use the Zen 2 CPU chiplet as it is, off the shelf, and add another IO die specific to the consoles.
I feel the days of the single SoC for consoles are perhaps over.
The Lego age has begun :)
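
For reference, the area arithmetic in this post can be checked with a few lines of Python. All inputs are the figures recalled above (3.5 mm2 per Jaguar core, 73 mm2 for the Zen 2 cluster, roughly 350 mm2 for the whole SoC), so treat the results as rough estimates rather than measured die sizes; the variable names are only illustrative.

```python
# Figures as recalled in the post; none of them are verified measurements.
JAGUAR_CORE_MM2 = 3.5    # one Jaguar core including L2, on 28nm
ZEN2_CLUSTER_MM2 = 73.0  # CPU budget discussed for an 8C Zen 2 cluster
SOC_TOTAL_MM2 = 350.0    # rough total area for a single console SoC

jaguar_cluster_mm2 = 8 * JAGUAR_CORE_MM2                 # eight Jaguar cores
jaguar_vs_zen2 = jaguar_cluster_mm2 / ZEN2_CLUSTER_MM2   # relative CPU area
cpu_share_of_soc = ZEN2_CLUSTER_MM2 / SOC_TOTAL_MM2      # CPU share of the die

print(f"8 Jaguar cores: {jaguar_cluster_mm2:.1f} mm2")
print(f"Jaguar cluster vs Zen 2 cluster: {jaguar_vs_zen2:.0%}")
print(f"Zen 2 cluster share of a 350 mm2 SoC: {cpu_share_of_soc:.0%}")
```

On these numbers the Jaguar cluster is a bit under 40% of the Zen 2 budget, and the Zen 2 cluster would take about a fifth of the SoC.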
 

Topweasel

Diamond Member
Oct 19, 2000
5,436
1,654
136
Yes, AMD told me this by releasing literally everything sans VEGA11.
OK. Here is where the confusion is, so you can stop the name-calling.

AMD has internal code names for products. Vega 10 was the large Vega die that made it into production.

In the past AMD tended to use numbers to separate big dies from little dies. But I haven't seen anything to lead me to believe that Vega 11 existed as a code name, or that, outside Vega Mobile (can't remember its code name), AMD ever planned a "small" Vega. It was always a Fury replacement and never a replacement for Polaris.

AMD did change something with Vega: naming the shipping product by CU count. Vega 10 became Vega 64 and Vega 56. Vega 11 exists on fully featured Raven Ridge chips as the GPU portion of the die.

There is a Vega 11. It isn't and never was what you think it was.
 

Abwx

Lifer
Apr 2, 2011
10,939
3,440
136
BTW, I was one of the earliest people to propose that the Rome IO die could have L4 cache, but I think the chances of that are slim to none. Firstly, for a significant amount of L4 cache (say 256 MB), AMD would need to go with 14HP and eDRAM. I think that process is not suitable for low cost, high volume designs. The Rome IO die needs to be low cost and low complexity, so it's most likely based on the mature GF 14LPP node. Moreover, if you look at the Zeppelin die and move all the IO and memory controller circuitry to a single die, you would end up quite close to the 420 sq mm die size.

8 cores + 16MB L3 take 44 mm2 according to AMD's stated density improvement. If the L3 is extended to 32MB, it would require 52 mm2 on the chiplet (and this would lead to 256MB of total L3). I don't know what the remaining 20 mm2 are used for, as this seems a lot for IF.

FTR, 256 MB would require 256 mm2 if implemented in the I/O die, and even using 12nm wouldn't shrink it below 223 mm2.
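
The figures in this post imply a few derived numbers that are easy to check. The sketch below only restates the post's own estimates (44 mm2, 52 mm2, 20 mm2 left over, 256 mm2 and 223 mm2 for the I/O die case); none of them are confirmed die measurements, and the variable names are illustrative.

```python
# Estimates taken from the post above; not confirmed die measurements.
AREA_8C_16MB_MM2 = 44.0   # 8 cores + 16 MB L3 on 7nm
AREA_8C_32MB_MM2 = 52.0   # 8 cores + 32 MB L3 on 7nm
REMAINING_MM2 = 20.0      # chiplet area not accounted for by cores + L3
CHIPLETS = 8

# Extra L3 cost on 7nm implied by the two chiplet estimates.
mm2_per_mb_7nm = (AREA_8C_32MB_MM2 - AREA_8C_16MB_MM2) / 16

# Total L3 across all chiplets if each carries 32 MB.
total_l3_mb = CHIPLETS * 32

# Chiplet size implied by the 32 MB estimate plus the unexplained area.
implied_chiplet_mm2 = AREA_8C_32MB_MM2 + REMAINING_MM2

# I/O die case: 256 MB in 256 mm2 on 14nm, 223 mm2 on 12nm.
mm2_per_mb_14nm = 256.0 / 256
shrink_14_to_12 = 223.0 / 256.0

print(f"7nm L3 cost: {mm2_per_mb_7nm} mm2/MB, total L3: {total_l3_mb} MB")
print(f"implied chiplet size: {implied_chiplet_mm2} mm2")
print(f"14nm cost: {mm2_per_mb_14nm} mm2/MB, 12nm shrink: {shrink_14_to_12:.3f}x")
```

So the post assumes about 0.5 mm2 per MB of L3 on 7nm against 1 mm2 per MB on 14nm, and a 12nm shrink of roughly 13%.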
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,346
1,525
136
8 cores + 16MB L3 take 44 mm2 according to AMD's stated density improvement. If the L3 is extended to 32MB, it would require 52 mm2 on the chiplet (and this would lead to 256MB of total L3). I don't know what the remaining 20 mm2 are used for, as this seems a lot for IF.

Density improvement is different for logic and SRAM, and SRAM shrank a lot more than logic. Based on published numbers, a high-density SRAM bitcell should take 0.42x the space on TSMC 7nm of what one took on a GF 14nm process.
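
As a rough illustration of what a 0.42x bitcell factor means, here is a small Python sketch. It naively scales a whole SRAM area by the bitcell ratio, which is an optimistic assumption: real caches also contain tags, sense amplifiers and routing that do not shrink at the bitcell rate. The 256 mm2 input is the 14nm estimate for 256 MB cited earlier in the thread, used here only as a hypothetical example, and the function name is made up for this sketch.

```python
# Scaling factor quoted in the post: high-density SRAM bitcell area,
# TSMC 7nm relative to a GF 14nm process.
BITCELL_SCALE_14_TO_7 = 0.42


def scaled_sram_area(area_14nm_mm2: float,
                     scale: float = BITCELL_SCALE_14_TO_7) -> float:
    """Naively scale an SRAM area from 14nm to 7nm by the bitcell ratio.

    Treats the whole array as bitcells, so the result is a lower bound.
    """
    return area_14nm_mm2 * scale


# Hypothetical input: the thread's 256 mm2 estimate for 256 MB on 14nm.
area_14nm = 256.0
area_7nm = scaled_sram_area(area_14nm)
print(f"{area_14nm:.0f} mm2 at 14nm -> about {area_7nm:.1f} mm2 at 7nm")
```

Under that assumption the same cache would need well under half the area on 7nm, which is a larger shrink than applying a single logic density factor to the whole chiplet would suggest.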