64 core EPYC Rome (Zen2) Architecture Overview?

kokhua

Member
Sep 27, 2018
86
47
91
Are you claiming that the only desktop die is an APU class product and that all desktop products will derive from this?

Implied in this is that there will be no 7nm desktop products until late next year, because if I'm not mistaken, there will be no 7nm APU refresh until that timeframe.

TBH, I haven’t given this deep enough thought, and I'm not “claiming” anything. I’m just saying that if I were charged with assembling a product line-up against what Intel has to offer in 2019, that’s what I might do. But I am not AMD. I do believe that a 7nm Zen2 8C/16T APU can serve the mainstream desktop segment quite well against Intel’s i3/i5/i7. Timeframe-wise, ROME is supposed to be mid-2019, and Ryzen follows probably in Q3.
 

Topweasel

Diamond Member
Oct 19, 2000
5,437
1,659
136
Are you claiming that the only desktop die is an APU class product and that all desktop products will derive from this?

Implied in this is that there will be no 7nm desktop products until late next year, because if I'm not mistaken, there will be no 7nm APU refresh until that timeframe.

Pretty sure the 2019 refresh will be 12nm, not 7nm. Mainly because the 7nm part won't be Vega; it will either be Navi or the completely new arch after Navi.
 

jpiniero

Lifer
Oct 1, 2010
16,799
7,249
136
I think AMD would love it if they could just develop a Navi chiplet (even if monolithic) and just slip it in there. How it would get fed is of course the problem. Maybe you could get away with the latency on GPUs as long as the CPU chiplet had dual channel. Maybe they could even throw in an optional GDDR6 controller that mobile could use.
 

rainy

Senior member
Jul 17, 2013
522
453
136
Timeframe-wise, ROME is supposed to be mid-2019, and Ryzen follows probably in Q3.

I think that Rome is a Q1 product and Matisse (Ryzen 3xxx) most probably Q2, similar to how it was with Pinnacle Ridge and, to some degree, with Summit Ridge (Ryzen 5).
 

dnavas

Senior member
Feb 25, 2017
355
190
116
As a rule of thumb, you need 2.5GB/s per core. You could push 12C but that would be really stretching it to the extreme.

3200RAM is something like 48GBps, no? Of course, if you meant 2.5GB/s per thread, then yeah that's a problem.... But otherwise, just require faster RAM for the higher end SKUs. I actually think we're due for 12C on the desktop as well, and would be very disappointed if it's still at 12nm. And if we're still working with 128bit vector units on the next gen TR, there are going to be words....
 

HurleyBird

Platinum Member
Apr 22, 2003
2,811
1,544
136
As a rule of thumb, you need 2.5GB/s per core. You could push 12C but that would be really stretching it to the extreme.

Or, you can increase the size of the cache and/or optimise its behaviour. This could be less expensive than it sounds, since the two L3 blocks in Zen 1 are largely specific to one CCX or the other. A unified pool would increase the effective amount of cache substantially.
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,991
136
Pretty sure the 2019 refresh will be 12nm, not 7nm. Mainly because the 7nm part won't be Vega; it will either be Navi or the completely new arch after Navi.
There are TWO mobile platforms AMD is working on: Raven Ridge 2 and Picasso. Raven Ridge 2 has already been released; it's the Ryzen 5 2600H and R7 2800H. Picasso is actually something different from that, if we are to believe the leaked drivers.
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,991
136
I wouldn't be offended one bit even if you did. As long as you have actually thought about the problem and are willing to share with me why you believe it is BS.
Well, what you have drawn is actually the best thing we have for imagining and thinking about what Rome can actually be. You may be spot on with it, or you may have missed the point completely. We will know in the next few months.
 

DrMrLordX

Lifer
Apr 27, 2000
22,901
12,966
136
2990WX was an outlier in that a memory controller is being disabled, but Athlon 200GE joined that approach.

Touché, I didn't remember the 200GE in all this.

My biggest question mark: Are CPUs for Rome and Matisse separate design? That is the only thing worth considering right now.

Honestly, I hope not. One of the factors that would keep Matisse on-schedule for Q2 2019 launch would be AMD using the same dice for Matisse as they do Rome. Rome is already on-target for launch, and with the GF shutdown of 7nm, one of the major factors that keeps Matisse on-target as well is the fact that the same-die strategy would make it a lot easier to move Matisse production to TSMC (given that Rome is already being produced there). Having to tape out an entirely new die at TSMC would bring about delays.
 

Topweasel

Diamond Member
Oct 19, 2000
5,437
1,659
136
There are TWO mobile platforms AMD is working on: Raven Ridge 2 and Picasso. Raven Ridge 2 has already been released; it's the Ryzen 5 2600H and R7 2800H. Picasso is actually something different from that, if we are to believe the leaked drivers.

AMD themselves call Picasso a Raven Ridge arch perf uplift, like they did with Pinnacle Ridge. I think the confusion is RR'18, which is basically RR with a maturing process, tighter binning, and more options, not a different product.
 

Topweasel

Diamond Member
Oct 19, 2000
5,437
1,659
136
Touché, I didn't remember the 200GE in all this.

Honestly, I hope not. One of the factors that would keep Matisse on-schedule for Q2 2019 launch would be AMD using the same dice for Matisse as they do Rome. Rome is already on-target for launch, and with the GF shutdown of 7nm, one of the major factors that keeps Matisse on-target as well is the fact that the same-die strategy would make it a lot easier to move Matisse production to TSMC (given that Rome is already being produced there). Having to tape out an entirely new die at TSMC would bring about delays.

Exactly. I am 90% sure AMD was doing Rome dies at TSMC and Matisse at GF, mostly because they wanted the flexibility of meeting the WSA while using the better process that was going to be on time for Rome. Using TSMC for Rome also meant that they could more easily ramp production up and down compared to GF, where they always had a bottom end they had to hit. Moving it all to TSMC would then mean nothing more than asking them for more chips. If it was anything else, AMD would have had a presser about how sad they were with the decision and how they would be working with other partners to limit any damage this might do to their roadmap. They didn't. Therefore Matisse and Rome use the same dies.
 

Mopetar

Diamond Member
Jan 31, 2011
8,487
7,726
136
TBH, I haven’t given this deep enough thought, and I'm not “claiming” anything. I’m just saying that if I were charged with assembling a product line-up against what Intel has to offer in 2019, that’s what I might do. But I am not AMD. I do believe that a 7nm Zen2 8C/16T APU can serve the mainstream desktop segment quite well against Intel’s i3/i5/i7. Timeframe-wise, ROME is supposed to be mid-2019, and Ryzen follows probably in Q3.

Do you think they need an 8C APU at all though? That almost seems excessive for the market segment it’s designed to target.

AMD is going to have the better gaming APU which is what I think is important in that segment. They’re better off devoting space to the GPU or just making a smaller chip overall to pad their margins. The added benefit of an extra four cores doesn’t feel meaningful.

I don’t expect AMD to do this, but what I think could be interesting for an APU, especially one designed for mobile is the big.LITTLE approach we see in ARM SoCs. We know AMD has some okay low power designs, but those may be too old for such an endeavor. The market probably isn’t there for it, but it might be something to consider.

I do appreciate your insight though. This thread has been a great read.
 

kokhua

Member
Sep 27, 2018
86
47
91
3200RAM is something like 48GBps, no? Of course, if you meant 2.5GB/s per thread, then yeah that's a problem.... But otherwise, just require faster RAM for the higher end SKUs. I actually think we're due for 12C on the desktop as well, and would be very disappointed if it's still at 12nm.

I meant 2.5GB/s per core, but you also need to factor in memory utilization efficiency which is typically ~60%. So for example, 2ch DDR4-3200 gives 3200 x 64 x 2 x 60% / 8 = ~30.72 GB/s, for 30.72 / 2.5 = ~12 cores max.
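A minimal Python sketch of that arithmetic, for anyone who wants to plug in other memory configurations. The function and parameter names are made up for illustration; the 2.5 GB/s-per-core rule of thumb and the ~60% utilization efficiency are the assumptions stated above, not measured values.

```python
def sustainable_cores(mt_per_s, channels, efficiency=0.60,
                      gb_per_core=2.5, bus_bits=64):
    """Return (usable GB/s, cores that bandwidth can feed).

    mt_per_s    -- DDR transfer rate in MT/s (3200 for DDR4-3200)
    channels    -- number of 64-bit memory channels
    efficiency  -- assumed fraction of peak bandwidth actually usable
    gb_per_core -- assumed rule-of-thumb bandwidth need per core
    """
    # MT/s * bits per transfer * channels / 8 bits per byte = MB/s; /1000 = GB/s
    peak_gb_s = mt_per_s * bus_bits * channels / 8 / 1000
    usable_gb_s = peak_gb_s * efficiency
    return usable_gb_s, usable_gb_s / gb_per_core


# Dual-channel DDR4-3200, as in the example above
usable, cores = sustainable_cores(3200, channels=2)
print(f"{usable:.2f} GB/s usable, about {cores:.1f} cores")
```

Changing `efficiency` or `gb_per_core` shows how sensitive the "12 cores max" figure is to those two assumptions.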

And if we're still working with 128bit vector units on the next gen TR, there are going to be words....

Not sure AVX512 is useful for mainstream desktop CPUs; burns too much power. In any case, if AMD chooses to implement it, it is relatively straightforward.

Or, you can increase the size of the cache and/or optimise its behaviour. This could be less expensive than it sounds, since the two L3 blocks in Zen 1 are largely specific to one CCX or the other. A unified pool would increase the effective amount of cache substantially.

Yes, increasing cache size will improve utilization of available memory bandwidth. In my diagram, I depicted an 8-core CCX partly because I felt a larger L3 cache unified across 8 cores will improve performance. But a case can be made for keeping with dual 4-core CCX's as well. For ROME, 8ch DDR4-2667 is actually not sufficient to feed 64C (2667 x 64 x 8 x 0.6 / 8 / 2500 = ~41). That's why I added memory compression and a large L4 cache (wishful thinking!). Collecting all 8 memory controllers together does offer tremendous flexibility in memory controller architecture and could improve memory utilization efficiency to the point where compression, and especially L4, may not be necessary.
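The ROME figure can be checked the same way, and the calculation can be turned around to ask how efficient the memory system would have to be to feed all 64 cores from DRAM alone. This is only a sketch under the same rule-of-thumb assumptions (2.5 GB/s per core, ~60% efficiency); the function names are hypothetical and it ignores any help from caches or compression.

```python
def cores_fed(mt_per_s, channels, efficiency, gb_per_core=2.5, bus_bits=64):
    """Cores that the usable DRAM bandwidth can feed under the rule of thumb."""
    usable_gb_s = mt_per_s * bus_bits * channels / 8 / 1000 * efficiency
    return usable_gb_s / gb_per_core


def efficiency_needed(cores, mt_per_s, channels, gb_per_core=2.5, bus_bits=64):
    """Fraction of peak bandwidth needed to feed `cores` with no cache help."""
    peak_gb_s = mt_per_s * bus_bits * channels / 8 / 1000
    return cores * gb_per_core / peak_gb_s


# 8-channel DDR4-2667 at the assumed 60% efficiency
rome_cores = cores_fed(2667, channels=8, efficiency=0.60)
# Efficiency that 64 cores would require from the same 8 channels
rome_eff = efficiency_needed(64, 2667, channels=8)
print(f"about {rome_cores:.0f} cores fed; 64 cores would need {rome_eff:.0%} efficiency")
```

Under these assumptions the required efficiency comes out well above the typical ~60%, which is the gap that a larger L3, compression, or an L4 would have to cover.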
 

kokhua

Member
Sep 27, 2018
86
47
91
One of the factors that would keep Matisse on-schedule for Q2 2019 launch would be AMD using the same dice for Matisse as they do Rome. Rome is already on-target for launch, and with the GF shutdown of 7nm, one of the major factors that keeps Matisse on-target as well is the fact that the same-die strategy would make it a lot easier to move Matisse production to TSMC (given that Rome is already being produced there). Having to tape out an entirely new die at TSMC would bring about delays.

For all we know, AMD could have planned separate designs for EPYC and Ryzen right from the start. One of the key benefits of Zen's lego-like architecture is that it allows AMD to implement multiple variants of the same basic architecture faster, cheaper, and much more easily.
 

kokhua

Member
Sep 27, 2018
86
47
91
Do you think they need an 8C APU at all though? That almost seems excessive for the market segment it’s designed to target.

Yes, I do.

All of Intel's i3/i5/i7 and even the flagship i9-9900K feature an iGPU. I think a 7nm Zen2 based 8C/16T APU will kick Intel's ass for mainstream desktops with a CPU that is at least on par for single-thread performance and far superior in graphics.
 

kokhua

Member
Sep 27, 2018
86
47
91
AMD themselves call Picasso a Raven Ridge arch perf uplift, like they did with Pinnacle Ridge. I think the confusion is RR'18, which is basically RR with a maturing process, tighter binning, and more options, not a different product.

Given the extremely close working relationship, AMD must have known about GloFo's decision to quit 7nm long ago. They could have burnt all their old roadmaps. I secretly _hope_ that Picasso will be a 7nm Zen2 8C/16T APU.
 

dnavas

Senior member
Feb 25, 2017
355
190
116
I meant 2.5GB/s per core, but you also need to factor in memory utilization efficiency which is typically ~60%. So for example, 2ch DDR4-3200 gives 3200 x 64 x 2 x 60% / 8 = ~30.72 GB/s, for 30.72 / 2.5 = ~12 cores max.

Ah! So working that backwards: X = 2500 * 8 * 16 / (64 * 2 * .6) = ~4166Mbps. That would, indeed, be an issue :>
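The back-calculation can be sketched in Python as the inverse of the rule of thumb: given a core count, solve for the transfer rate the DIMMs would need. The function name is hypothetical, the 16 is read here as 16 cores (an assumption; the post does not spell it out), and the 2.5 GB/s per core and 60% efficiency are the thread's assumed figures.

```python
def required_mt_per_s(cores, channels, efficiency=0.60,
                      gb_per_core=2.5, bus_bits=64):
    """Transfer rate (MT/s) needed so usable bandwidth covers `cores`."""
    # Peak bandwidth must exceed the demand divided by the usable fraction
    peak_gb_s_needed = cores * gb_per_core / efficiency
    # GB/s -> MB/s (*1000), bytes -> bits (*8), spread over all data pins
    return peak_gb_s_needed * 1000 * 8 / (bus_bits * channels)


rate_16 = required_mt_per_s(16, channels=2)
rate_12 = required_mt_per_s(12, channels=2)
print(f"16 cores: ~{rate_16:.0f} MT/s, 12 cores: ~{rate_12:.0f} MT/s")
```

For dual-channel memory, 12 cores land a little under DDR4-3200 while 16 cores land well above 4000 MT/s, which matches the conclusion in the post.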

Not sure AVX512 is useful for mainstream desktop CPUs; burns too much power. In any case, if AMD chooses to implement it, it is relatively straightforward.

TR -- not mainstream. And I'd not complain too loudly with half width AVX512. Probably :> Don't the Xeons have a dedicated additional unit?

Obvious question, would TR be built from the high-power R7-III design, or the lower(?) power Epyc II?
 

kokhua

Member
Sep 27, 2018
86
47
91
So again, this is what I would do for the 2019 product line-up if I were AMD:

Datacenter: EPYC 2
Architecture: 4/6/8 CPU dies + 1 SC die for 32/48/64C SKU's. Maybe keep EPYC 1 (NAPLES) around for 8/16C SKU's.
Against Intel Xeons: clear win

Workstation: ThreadRipper 3
Architecture: same as EPYC2 but use 2/4 CPU dies + 1 SC die for 16/32C SKU's
Against Intel Core X series: clear win, 16/32C with 4 DDR4 channels without TR1's trade-offs

Mainstream Desktop and Notebooks: Ryzen APU
Architecture: monolithic 8C/16T Zen2 with beefy GPU, fuse off features for power and segmentation
Against Intel i3/i5/i7/i9: equal on CPU, beat on graphics

All the above with just 3 unique die designs. Can add another smaller 4C/8T APU later if volume justifies.

I actually believe AMD is ready to launch EPYC 2 in Q1'19 but is holding back so as not to disrupt the production ramp-up of NAPLES by the hyperscalers. I wish everything could be pulled in by 1 quarter though. AMD has a once-in-a-lifetime opportunity to gain some serious market share from top-to-bottom.
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,991
136
Mainstream Desktop and Notebooks: Ryzen APU
Architecture: monolithic 8C/16T Zen2 with beefy GPU, fuse off features for power and segmentation
Against Intel i3/i5/i7/i9: equal on CPU, beat on graphics
I think this general idea is a waste of silicon :(

I think there is a plethora of people who would rather have a 4C/8T APU with 28 CUs and HBM2 on the package than an 8C/16T part with an iGPU that has not even 30% of the graphics performance of this design.


I cannot remember where I read it, but there was a rumor that the PS5 silicon will NOT be monolithic, whatever that means.

IMO, if AMD designed very small dies with Navi and Zen2, but with scalability and chiplets in mind(!), we may not see a monolithic APU anymore, my friends. It would save them a lot of hardware design costs.

IMO, the most probable scenario is that Zen is a completely modular design that AMD can reuse in a lot of ways, on a lot of platforms.
 

kokhua

Member
Sep 27, 2018
86
47
91
I think this general idea is a waste of silicon :(

I think there is a plethora of people who would rather have a 4C/8T APU with 28 CUs and HBM2 on the package than an 8C/16T part with an iGPU that has not even 30% of the graphics performance of this design.

I cannot remember where I read it, but there was a rumor that the PS5 silicon will NOT be monolithic, whatever that means.

IMO, if AMD designed very small dies with Navi and Zen2, but with scalability and chiplets in mind(!), we may not see a monolithic APU anymore, my friends. It would save them a lot of hardware design costs.

IMO, the most probable scenario is that Zen is a completely modular design that AMD can reuse in a lot of ways, on a lot of platforms.

We're talking about mainstream users here. Intel has >60% share of the GPU market (counting iGPUs), and not without reason. Not every PC user is a hard core gamer. A slightly beefier iGPU would suffice for many casual gamers, I suspect. Otherwise, with the threat from Ryzen, Intel would have ejected the iGPU in favor of more cores.

Seems AMD really sold the idea of modularity and chiplets very well. As I said earlier, it makes complete sense when you are comparing against large dies, especially those approaching the reticle limit like Xeons; otherwise a monolithic design still wins handily in both performance and total cost. We'll see if you are right in a few months.
 

CluelessOne

Member
Jun 19, 2015
76
49
91
If this speculation is true, then what should AMD call it?
Rome, the Northbridge strikes back?
Or, Rome, the Return of Northbridge?
Add HBM inside then we get sideport memory back.

Sorry, couldn't help myself.
 

PeterScott

Platinum Member
Jul 7, 2017
2,605
1,540
136
I think this general idea is a waste of silicon :(

I think there is a plethora of people who would rather have a 4C/8T APU with 28 CUs and HBM2 on the package than an 8C/16T part with an iGPU that has not even 30% of the graphics performance of this design.

28 CUs with HBM2 is a niche pipe dream. You still need an affordable APU for the majority of the market.
 

kokhua

Member
Sep 27, 2018
86
47
91
If this speculation is true, then what should AMD call it?
Rome, the Northbridge strikes back?
Or, Rome, the Return of Northbridge?
Add HBM inside then we get sideport memory back.

Sorry, couldn't help myself.

Seems like taking a step backwards, doesn't it? Perhaps that's why people far smarter than me dismissed it as rubbish initially. Personally, I think it is quite neat and elegant, and not just because *I* came up with it. It's not entirely original anyway; I merely deduced it from the rumors, which I consider very credible.

I am still waiting for someone to offer an alternative architecture that explains why AMD would move to 9 dies instead of just staying with 4 like in Naples.
 

maddie

Diamond Member
Jul 18, 2010
5,155
5,542
136
I think this general idea is a waste of silicon :(

I think there is a plethora of people who would rather have a 4C/8T APU with 28 CUs and HBM2 on the package than an 8C/16T part with an iGPU that has not even 30% of the graphics performance of this design.


I cannot remember where I read it, but there was a rumor that the PS5 silicon will NOT be monolithic, whatever that means.

IMO, if AMD designed very small dies with Navi and Zen2, but with scalability and chiplets in mind(!), we may not see a monolithic APU anymore, my friends. It would save them a lot of hardware design costs.

IMO, the most probable scenario is that Zen is a completely modular design that AMD can reuse in a lot of ways, on a lot of platforms.
The big unknown is the cost of reassembling smaller dies into a larger unit. I showed earlier an old costing from the MD of Applied Materials on interposer production costs. The other relevant info is the reassembly costs. In this case, unlike HBM, there are not thousands of vias to be made, no thinning of dies, no stacking, and a much lower microbump density. This will be much less costly to implement than HBM modules.