64 core EPYC Rome (Zen2) Architecture Overview?

Page 21

coercitiv

Diamond Member
Jan 24, 2014
7,354
17,423
136
Are you talking about 2 chiplets in total or just 2 CPU chiplets + I/O? If one already uses the I/O chiplet, then the latency would not really get any worse with additional CPU chiplets.
I'm talking about 2 CPU chiplets + 1 I/O. Maybe we should kinda borrow from Intel and call it 2+1 to make it clear? Memory access would not have issues with more CPU chiplets, agree, but inter-CCX latency would be higher than intra-CCX. (assuming 1 CCX == 1 chiplet)

I see plenty of reason to release CPUs with both 1x8 and 2x8 core chiplets.

Halo products matter, even if they don't sell that many of them (why else does Intel struggle so hard with them). Just imagine the headlines if the mid/lower-range AM4 product had 8 cores, and the halo one had 16.
IMHO that halo product would find a better home on the enthusiast platform though. Remember Zen 2 doesn't only bring the potential to double the core count, it also brings wider AVX execution capabilities and likely higher sustained clocks, all of which will have quite the impact on power consumption. When you increase performance on so many fronts, a 50% power reduction from the node jump suddenly feels easy to spend.

If they want to pressure Intel on core count, they can do that starting with an even more price-competitive Threadripper. In fact... they can start with TR 3 first, then follow with the Ryzen 3000 series.

[EDIT] corrected inter vs. intra
 

hkultala2

Junior Member
Nov 8, 2018
6
16
51
The fact that their APUs will arrive 6-9 months later also gives AMD time to design a second mask, and gives TSMC time to mature their node process to make it suitable for larger dies.

They are already making the 338(?) mm² Vega 20 die.

A ~150 mm² Ryzen die is not a problem for the TSMC "7nm" process in its current state.

So they can easily make a ~150 mm² 8-core desktop Ryzen immediately, without any chiplets.
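The die-size argument above can be put into rough numbers with the textbook Poisson yield model, Y = exp(-A * D0). This is only a sketch: the defect density `ASSUMED_D0` is a made-up illustrative value, not a TSMC figure, and the function name is hypothetical. The only inputs taken from the post are the two die areas.

```python
import math

def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free dies under the simple Poisson model Y = exp(-A * D0)."""
    area_cm2 = die_area_mm2 / 100.0  # 100 mm^2 per cm^2
    return math.exp(-area_cm2 * defects_per_cm2)

# ASSUMPTION: illustrative defect density only, not a published process figure.
ASSUMED_D0 = 0.3  # defects per cm^2

# Die areas as quoted in the post (the Vega 20 figure was itself marked uncertain).
for name, area in [("~150 mm^2 Ryzen die", 150.0), ("~338 mm^2 Vega 20 die", 338.0)]:
    print(f"{name}: {poisson_yield(area, ASSUMED_D0):.0%} defect-free")
```

Whatever defect density is plugged in, the smaller die always comes out ahead, which is the point being made: if the larger die is already viable, the smaller one should be too.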
 

Gideon

Platinum Member
Nov 27, 2007
2,030
5,034
136
I'm talking about 2 CPU chiplets + 1 I/O. Maybe we should kinda borrow from Intel and call it 2+1 to make it clear? Memory access would not have issues with more CPU chiplets, agree, but inter-CCX latency would be higher than intra-CCX. (assuming 1 CCX == 1 chiplet)


IMHO that halo product would find a better home on the enthusiast platform though. Remember Zen 2 doesn't only bring the potential to double the core count, it also brings wider AVX execution capabilities and likely higher sustained clocks, all of which will have quite the impact on power consumption. When you increase performance on so many fronts, a 50% power reduction from the node jump suddenly feels easy to spend.

If they want to pressure Intel on core count, they can do that starting with an even more price-competitive Threadripper. In fact... they can start with TR 3 first, then follow with the Ryzen 3000 series.

Interesting thoughts overall.

But does the inter-CCX latency really need to be that much higher if the I/O chip has an (inclusive?) L4 cache?

I still disagree somewhat with the enthusiast socket halo part, as the platform costs are IMO too prohibitive for that to work (no Intel HEDT part caught any real mindshare, for instance). I would also be really surprised if they did Threadripper first, considering that some TR2 parts just launched, while Ryzen has been out for half a year.

Just looking at it from a marketing standpoint: pitting an 8-core Ryzen 3XXX against the i9-9900K (being at best about the same speed in most benchmarks, I would assume) just doesn't have the same flair as pitting even 12 cores against it and firmly outpacing it in heavily MT tasks.
 

coercitiv

Diamond Member
Jan 24, 2014
7,354
17,423
136
Just looking at it from a marketing standpoint: pitting an 8-core Ryzen 3XXX against the i9-9900K (being at best about the same speed in most benchmarks, I would assume) just doesn't have the same flair as pitting even 12 cores against it and firmly outpacing it in heavily MT tasks.
Yeah, this is the part I continuously come back to and cannot completely counter: I must admit a higher than 8c/16t Ryzen 3000 launch would win a lot of benchmarks.
 

krumme

Diamond Member
Oct 9, 2009
5,956
1,596
136
I mean AMD did show a slide that clearly said Zen2 chiplet design. It didn't say Rome chiplet design or anything else. Now that chiplets are confirmed, AMD going monolithic on desktop Ryzen would be the even bigger surprise.

However, I'm rather sceptical about using 2 chiplets. I just don't see a large market for 16-core desktops. On the other hand, it's just some die space wasted on the IO controller for all 8-core and lower chips. They could price a 16-core at $600, or put otherwise against the 9900K, and dominate it in multi-threaded. But the $600 desktop CPU market is rather tiny.
If another 7nm die costs 300M and we know AMD's quarterly profit is approx. 150M, you need a pretty solid argument to make another die. That's half a year's profit, in the best quarters for years.
It's a huge risk, if you can even find the cash and people to do it.
Would you be better off spending the same money on 7nm EUV or 5nm? That's the relevant dilemma.
 

PeterScott

Platinum Member
Jul 7, 2017
2,605
1,540
136
Yeah, this is the part I continuously come back to and cannot completely counter: I must admit a higher than 8c/16t Ryzen 3000 launch would win a lot of benchmarks.

I expect Ryzen 7 3K will be 8C/16T. But AMD could add Ryzen 9 3K parts with 12 or 16 cores by adding a chiplet, even if they have a monolithic part for mainstream desktop. There would be more latency for cores 9-16, but much like the TR 2990WX, it probably won't matter that much for workloads that can leverage 16C/32T, while keeping low latency for things like gaming on the first 8 cores.

Or perhaps they build a small 4-core monolithic APU again, and can add an optional chiplet for 8-12 core models. Though I am hoping for an 8-core monolithic APU.
 
  • Like
Reactions: Gideon

Spartak

Senior member
Jul 4, 2015
353
266
136
Having the MCM for 16-64 core Threadripper / EPYC seems a given.

Personally I would love to see Ryzen move to a 2xCCX 6CPU+nGPU+IMC die for desktop 8-12 core processors with some serious built-in graphics. Then the mobile / low power parts would use a single CCX with 4-6 cores.

This would be interesting if their 7nm CCX GPU could hit about half the 1060's performance. Two of those and you've got an onboard GPU that would suit the needs of 80-90% of all gamers.

If you could get that in a sub-200W package, high-end SFF gaming would literally explode. You'd need a massive heatsink, but that would be it.
 

maddie

Diamond Member
Jul 18, 2010
5,155
5,542
136
Why not leave that for the enthusiast platform, where the 16 core can stretch its legs with more mem bandwidth and power budget?

To me it just feels like using 2 chiplets is a waste, especially as it ends up hurting performance in latency bound consumer scenarios such as ... them games :)
That's the beauty. You only decide to make more of this product at the packaging stage. The individual chiplets have already been fabbed to be used in a wide range of products. No wasted inventory. The additional cost to produce this product is vanishingly small. I see the Threadripper creation thinking at work here if this becomes reality.
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,991
136
Why not leave that for the enthusiast platform, where the 16 core can stretch its legs with more mem bandwidth and power budget?

To me it just feels like using 2 chiplets is a waste, especially as it ends up hurting performance in latency bound consumer scenarios such as ... them games :)
Because Threadripper 3000 series will start with 24 cores, and up to 32 cores.
 
  • Like
Reactions: lightmanek

Atari2600

Golden Member
Nov 22, 2016
1,409
1,655
136
Hmmm, just had a thought - kinda off on a tangent from CPUs, but related to this - bear with me!


RV770 - more specifically the R700 or 4870x2 if you don't remember the codenames.

Big weakness was what....? Exactly, having to use CrossFire on the one card. Which was weak because there were two separate memory controllers - and they had to use the alternate frame approach.

AMD could possibly offer a dGPU card which incorporates multiple dies sharing the same memory space via an IO controller very similar to what has been shown for EPYC, using the same IF links as already outlined for 7nm Vega.

Multiple dies on one card without needing CrossFire? Sweet.

So you could have two small-die (and therefore cheaper to design) variants: (i) compute heavy [Radeon Instinct] & (ii) compute light [Radeon Graphics].

GPUs can be assembled from multiples of these dies in much the same way Rome will be assembled. A single die can go on to form an APU (with an 8C Matisse chiplet and the relevant IO controller).



910px-R700_interconnect.svg.png


DavidWang_NextHorizon_09_575px.jpg
 

Atari2600

Golden Member
Nov 22, 2016
1,409
1,655
136
That's the beauty. You only decide to make more of this product at the packaging stage. The individual chiplets have already been fabbed to be used in a wide range of products. No wasted inventory. The additional cost to produce this product is vanishingly small. I see the Threadripper creation thinking at work here if this becomes reality.

Exactly - it'd cost them relatively nothing (in time/money/manpower) to put it into the market and gauge response, then adjust production accordingly.
 

IRobot23

Senior member
Jul 3, 2017
601
183
76
I think Zen or Ryzen 3000 on AM4 will be a single-die 8C. That's all. AM4 customers don't need 16C, and most boards are not ready (maybe for a low-power 16C).
I don't know why people are talking about 2x 6C, which doesn't make any sense. Just put in 8C with higher clocks, low uncore latency and better support for 4000MHz+ DDR4 with insane efficiency = my wallet is ready.

If you want a complete workstation, think about X399. You even have ATX Naples boards with 8 channels.
 
  • Like
Reactions: hkultala2

coercitiv

Diamond Member
Jan 24, 2014
7,354
17,423
136
I expect Ryzen 7 3K will be 8C/16T. But AMD could add Ryzen 9 3K parts with 12 or 16 cores by adding a chiplet, even if they have a monolithic part for mainstream desktop. There would be more latency for cores 9-16, but much like the TR 2990WX, it probably won't matter that much for workloads that can leverage 16C/32T, while keeping low latency for things like gaming on the first 8 cores.

Or perhaps they build a small 4-core monolithic APU again, and can add an optional chiplet for 8-12 core models. Though I am hoping for an 8-core monolithic APU.
I don't know, I'm torn. If I had to pick the one piece of information that would lead me to believe an 8 core monolithic APU is in the works, it would be that old rumor that the 4.5GHz Ryzen sample was being worked on within RTG labs.

On the other side the chiplet approach being used from top to bottom might end up being cost effective due to economies of scale. In fact, just as we play Lego on the forum, imagine AMD telling their semi-custom clients they can pick and choose between multiple configurations of CPU, GPU, HBM cache etc.

This is the work of the devil: it tempts people into thinking all kinds of outrageous tech stuff.
 

PotatoWithEarsOnSide

Senior member
Feb 23, 2017
664
701
106
I'm thinking that the AMD engineers read Anandtech and are cherry picking all of the wishlists that get posted here...and then saying "We can do that, sure."

Regarding latency, IF currently runs at MEMCLK, which is 1600MHz best case. If that is decoupled and linked to core clock speed, we'd potentially be looking at 4GHz+, so even if going off-die increases the number of cycles (or whatever it is referred to as), unless we're talking big numbers, overall latency is likely to fall anyway.
Someone here must be able to do the math on how a 4GHz IF would change latency before and after any additional cycles. My brain is telling me that it'd need a 2.5x increase in the number of cycles to offset a 2.5x increase in IF speed. Then again, I'm an idiot and probably don't know what I'm talking about, so feel free to correct things.
Thanks in advance.
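For what it's worth, the math asked about here can be sketched in a few lines, treating latency simply as cycles divided by clock frequency. The 1600MHz and 4GHz clocks come from the post; the baseline cycle count `ASSUMED_BASE_CYCLES` is a hypothetical placeholder, since no real cycle counts were given, and the model ignores everything except the fabric clock.

```python
def latency_ns(cycles: float, clock_mhz: float) -> float:
    """Time to complete `cycles` fabric cycles at `clock_mhz`, in nanoseconds."""
    return cycles / clock_mhz * 1000.0

BASE_CLOCK_MHZ = 1600.0  # IF tied to MEMCLK, best case (from the post)
FAST_CLOCK_MHZ = 4000.0  # hypothetical decoupled IF clock (from the post)

# ASSUMPTION: placeholder cycle count, only the ratio matters for the argument.
ASSUMED_BASE_CYCLES = 100.0

speedup = FAST_CLOCK_MHZ / BASE_CLOCK_MHZ
# Number of cycles at the fast clock that takes exactly as long as the baseline.
break_even_cycles = ASSUMED_BASE_CYCLES * speedup

print(f"clock speedup: {speedup}x")
print(f"baseline latency: {latency_ns(ASSUMED_BASE_CYCLES, BASE_CLOCK_MHZ)} ns")
print(f"break-even cycles at fast clock: {break_even_cycles}")
```

Under this simple model the intuition holds: latency only gets worse if the cycle count grows by more than the clock ratio (2.5x here). It says nothing about the power cost of running the fabric that fast.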
 

Spartak

Senior member
Jul 4, 2015
353
266
136
I don't know why people are talking about 2x 6C, which doesn't make any sense. Just put in 8C with higher clocks, low uncore latency and better support for 4000MHz+ DDR4 with insane efficiency = my wallet is ready.

Because 90% of the desktop enthusiast market needs high single-core performance and low latencies. 16 cores with a low max frequency and large latencies don't make sense for gamers. The market isn't ready for 16-core mainstream. 8-12 cores with high frequency and low latency? I think most gamers will vote with their wallets for the latter.

Remember the original Ryzen got slaughtered in reviews for its disappointing gaming performance even when it killed Core i7 in parallel workloads. AMD isn't going to make that mistake twice.

For mobile/APU the chance of an MCM is near zero given the bad trio of more power, more latency and lower clocks compared to a smaller core count CCX single die.

If the mobile part is already completely different, then why not build the APU up for desktop Ryzen instead of building the MCM down?
 

Veradun

Senior member
Jul 29, 2016
564
780
136
Halo products matter, even if they don't sell that many of them (why else does Intel struggle so hard with them). Just imagine the headlines if the mid/lower-range AM4 product had 8 cores, and the halo one had 16.

Halo products matter, true. Threadripper is there for that reason :)

I'm still on #teamSoC for AM4. 32t on consumer seems overkill, would have a low appeal since it would be creating contention both on TR and Ryzen.

What I'm expecting is a single 8c CCX in the usual SoC for AM4 :>
 
  • Like
Reactions: hkultala2

Hitman928

Diamond Member
Apr 15, 2012
6,695
12,370
136
Because 90% of the desktop enthusiast market needs high single-core performance and low latencies. 16 cores with a low max frequency and large latencies don't make sense for gamers. The market isn't ready for 16-core mainstream. 8-12 cores with high frequency and low latency? I think most gamers will vote with their wallets for the latter.

He means using a single chiplet design (1x8 cores) versus a dual chiplet design (2x4 for 8 cores and 2x6 for 12 cores). I happen to agree with him that 1x8 cores, at least initially, makes more sense to me on the consumer side but we'll see next year what AMD plans to do.
 

H T C

Senior member
Nov 7, 2018
610
451
136
Any word yet if AMD is thinking of doing a chiplet version of the GPU portion of APUs?

My thinking was instead of having chiplets on either side of the IO for the AM4 socket, they would have the IO on one side and an 8c / 16t CPU chiplet + a GPU chiplet on either the left or on the right side, to balance the chip, because having the GPU chiplet on one side and the CPU chiplet on the other would probably make latency higher due to the distance from CPU to GPU chiplets. The new Vega Instinct cards "already" have IF communication to the CPUs, @ least for Rome chips, right?

This would obviously cost more than the "non-GPU" version (AKA Zen+ 2700(X) chips and their Zen 2 based equivalents) because having the GPU on-chip would enable AMD to price their offerings much higher than non-GPU chips. It would also depend on the GPU portion's capabilities, but I'd expect @ least a bit better GPU performance than the current best APU has to offer.

Too farfetched?
 
  • Like
Reactions: Vattila

mattiasnyc

Senior member
Mar 30, 2017
356
337
136
The market isn't ready for 16-core mainstream. 8-12 cores with high frequency and low latency? I think most gamers will vote with their wallets for the latter.

My guess is that remaining on a maximum 8 cores for now is the more reasonable option for AMD financially. Crank those clock speeds way up and beat the 9900K. Once Intel catches up it'd be "easy" to again offer more cores for that market segment if it's what they want. Or drop prices.
 

Spartak

Senior member
Jul 4, 2015
353
266
136
He means using a single chiplet design (1x8 cores) versus a dual chiplet design (2x4 for 8 cores and 2x6 for 12 cores). I happen to agree with him that 1x8 cores, at least initially, makes more sense to me on the consumer side but we'll see next year what AMD plans to do.

My guess is that remaining on a maximum 8 cores for now is the more reasonable option for AMD financially. Crank those clock speeds way up and beat the 9900K. Once Intel catches up it'd be "easy" to again offer more cores for that market segment if it's what they want. Or drop prices.
8 cores won't allow them to outperform Intel on the desktop and is too large / power-consuming for mobile. So my bet is on a 6-core CCX, but indeed we'll see once more rumors/news start to trickle in.
 

dnavas

Senior member
Feb 25, 2017
355
190
116
I'm thinking that the AMD engineers read Anandtech and are cherry picking all of the wishlists that get posted here...and then saying "We can do that, sure."

Really? Because I keep asking for a 4:2:2 decode accelerator for NLE scrubbing....
:>

we'd potentially be looking at 4GHz+

I think you're missing the increase in power that's likely to go hand-in-hand with an IF frequency boost.
https://www.anandtech.com/show/13124/the-amd-threadripper-2990wx-and-2950x-review/4
IF is already 256-bit wide too?
https://www.reddit.com/r/Amd/comments/5zr8lv/i_asked_amd_a_followup_question_about_infinity/

...and they're already pinging memory and L3 at the same time? Hmm....
 

Vattila

Senior member
Oct 22, 2004
820
1,456
136
AMD will already have 128 cores on 2-socket systems, so do they really need to approach a niche market like 4S? I guess they could do very well in that too (at least performance-wise) with 256 cores and 512 threads, but I guess 1-2 socket servers are their biggest market.

Yeah. Although 4-socket is a smaller market, Lisa Su wants to play in high-performance compute. AMD is a participant in government-funded exa-scale research, with systems planned in the not too distant future. 4-socket capability would be another step up in compute density and allow them to compete better in the supercomputer realm.
 
  • Like
Reactions: Zapetu

IRobot23

Senior member
Jul 3, 2017
601
183
76
It's really hard to get bad yields on ~70mm^2 die that's mostly SRAM to boot.

Yeah, and AMD stated CCX has 8 cores.
Even if they go with 1 die (uncore + 1x CCX) for the AM4 platform, that should be less than 150mm^2, something like ~148mm^2. That is actually a very small area.

If we know that the Zen CCX was 44mm^2, we could say that Zen 2 on 7nm is ~36mm^2 (a quad core with half of the L3, assuming 8MB). That's a way bigger core once we account for the 7nm die shrink.
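To make the "way bigger core" point concrete, here is a rough sketch comparing the two CCX areas from the post against what a pure shrink would give. The density gain `ASSUMED_DENSITY_GAIN` is a hypothetical round number for the move to 7nm, not a measured figure, and the function names are made up for illustration.

```python
ZEN_CCX_MM2 = 44.0   # Zen CCX area, as stated in the post
ZEN2_CCX_MM2 = 36.0  # the post's estimate for a Zen 2 quad core + 8MB L3 on 7nm

# ASSUMPTION: hypothetical logic/SRAM density improvement from the node change.
ASSUMED_DENSITY_GAIN = 2.0

def shrunk_area(area_mm2: float, density_gain: float) -> float:
    """Area the same design would occupy after a pure shrink."""
    return area_mm2 / density_gain

def growth_vs_pure_shrink(old_mm2: float, new_mm2: float, density_gain: float) -> float:
    """How much larger the new design is than a straight shrink of the old one."""
    return new_mm2 / shrunk_area(old_mm2, density_gain)

pure_shrink = shrunk_area(ZEN_CCX_MM2, ASSUMED_DENSITY_GAIN)
growth = growth_vs_pure_shrink(ZEN_CCX_MM2, ZEN2_CCX_MM2, ASSUMED_DENSITY_GAIN)
print(f"pure shrink of Zen CCX: {pure_shrink:.1f} mm^2")
print(f"Zen 2 estimate is {growth:.2f}x a pure shrink")
```

With any density gain above roughly 1.2x, the 36mm^2 estimate implies more transistors per CCX than the original Zen design, which is the argument being made; the exact multiple depends entirely on the assumed density figure.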