64 core EPYC Rome (Zen2) Architecture Overview?


DisEnchantment

Golden Member
Mar 3, 2017

AMD is also working on some kind of memory compression for CPUs, or at least they have been thinking about it, since they have a patent application for it.

Systems, apparatuses, and methods for compression of frequent data values across narrow links are disclosed. In one embodiment, a system includes a processor, a link interface unit, and a communication link. The link interface unit is configured to receive a data stream for transmission over the communication link, wherein the data stream is generated by the processor. The link interface unit determines if blocks of data of a first size from the data stream match one or more first data patterns and the link interface unit determines if blocks of data of a second size from the data stream match one or more second data patterns. The link interface unit sends, over the communication link, only blocks of data which do not match the first or second data patterns.

http://pdfaiw.uspto.gov/.aiw?PageNum=0&docid=20180167082&IDKey=A6AD66F9D401&HomeUrl=http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1%26Sect2=HITOFF%26d=PG01%26p=1%26u=%252Fnetahtml%252FPTO%252Fsrchnum.html%26r=1%26f=G%26l=50%26s1=%252220180167082%2522.PGNR.%26OS=DN/20180167082%26RS=DN/20180167082

[Attached image: v8x7Zih.png]
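Purely as an illustration of the scheme the abstract describes (not AMD's actual hardware; the pattern sets, block size, and tagging are made-up assumptions), a minimal Python sketch of "send only the blocks that don't match a frequent-value pattern" could look like this:

```python
# Sketch of "frequent data value" compression over a narrow link (illustrative only).
# The patent checks blocks of two different sizes against pattern sets; here a single
# 8-byte block size and two assumed patterns keep the example short.

PATTERNS = [b"\x00" * 8, b"\xff" * 8]   # assumed frequent values: all-zeros, all-ones

def compress_stream(data: bytes):
    """Return (tags, payload): tags[i] is the matched pattern index or None;
    only blocks with no match are placed in the payload sent over the link."""
    tags, payload = [], bytearray()
    for i in range(0, len(data), 8):
        block = data[i:i + 8]
        try:
            tags.append(PATTERNS.index(block))   # matched: send only the tag
        except ValueError:
            tags.append(None)
            payload += block                     # no match: send the raw block
    return tags, bytes(payload)

def decompress_stream(tags, payload: bytes) -> bytes:
    """Rebuild the original stream on the receiving side of the link."""
    out, pos = bytearray(), 0
    for tag in tags:
        if tag is None:
            out += payload[pos:pos + 8]
            pos += 8
        else:
            out += PATTERNS[tag]                 # reinsert the elided frequent value
    return bytes(out)

if __name__ == "__main__":
    original = b"\x00" * 32 + b"tag + cacheline data!!!!" + b"\xff" * 16
    tags, sent = compress_stream(original)
    assert decompress_stream(tags, sent) == original
    print(f"{len(original)} payload bytes reduced to {len(sent)} on the link")
```

The point of the technique is exactly what the last line shows: the link only carries the blocks that don't match, plus a small amount of tag metadata.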
 

Glo.

Diamond Member
Apr 25, 2015
If AMD wants to win Apple computers, especially laptops, they have to design a CPU that has 8C/16T and a small iGPU, OR a custom APU that is power-efficient enough, just for Apple.
 

moinmoin

Diamond Member
Jun 1, 2017
I think if you go with a central chipset to connect to all the dies, you are probably going to forgo interconnects between the individual CPU dies and handle it all through the central chipset.

No matter how it is accomplished, moving to 64 cores will increase connectivity complexity, whether it is buried inside the CPU dies, the controller chip, the package interconnects, or some combination of them.
That's my take as well.

If the rumors so far are correct (64c, 8+1 chiplets, 256MB L3 cache), the interconnections may not be necessary by design, and kokhua's diagram may actually not be far off.

Zen 1 was the decentralized design: every die is self-sufficient. To enable multi-die scaling, the respective I/O provisions (SerDes) need to be included on each die, and a connection between every pair of dies needs to exist.

Zen 2 may be going the centralized way. The core dies contain the bare minimum of uncore needed to connect the cores to the rest (equal to or less than the AM4 platform capabilities, depending on whether the same or different dies serve the consumer and server markets). All multi-die scaling is done through the SC die, which hosts all the additional I/O. With Zen 1, interconnections between all dies were a requirement to enable read access to the "same" L3 cache, even though it is splintered across the different CCXes on different dies. With Zen 2, the L3 cache on the core dies would be duplicated on the SC die (hence the doubling of L3 cache to 256MB in the rumor; kokhua called it L4 cache in his diagram), which means that to read another die's L3 cache, a core die only needs its one direct connection to the SC die and its L3 clone; further interconnections are no longer necessary. This keeps interconnection complexity between dies down while scaling, and the one direct connection between a core die and the SC die can be optimized for latency.

The downsides are the need for 128MB of duplicate L3 cache and a huge monolithic uncore die that is likely even more of a power hog than the uncore before.
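For a rough sense of why dropping the die-to-die links matters, here is a toy link count for the two topologies; this is plain combinatorics, nothing vendor-specific:

```python
# Die-to-die link count: Zen 1 style full mesh vs. a star through a central SC die.

def full_mesh_links(dies: int) -> int:
    """Every die connects to every other die (Zen 1 style): n*(n-1)/2 links."""
    return dies * (dies - 1) // 2

def star_links(core_dies: int) -> int:
    """Every core die connects only to the central SC die: n links."""
    return core_dies

for n in (4, 8):
    print(f"{n} core dies: mesh={full_mesh_links(n)} links, star={star_links(n)} links")
# 4 core dies: mesh=6 links, star=4 links
# 8 core dies: mesh=28 links, star=8 links
```

Going from 4 to 8 dies makes a full mesh more than four times as expensive in links, while the star only doubles.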
 
Mar 11, 2004
If AMD wants to win Apple computers, especially laptops, they have to design a CPU that has 8C/16T and a small iGPU, OR a custom APU that is power-efficient enough, just for Apple.

I personally don't think AMD is likely to win those deals. Intel is really entrenched and can add features that AMD can't as easily (an LTE/5G modem, for instance). I think that is part of the reason Intel is developing its own GPU too (and it pretty clearly plans to slot it into packaging like the one it did with AMD). I also think Apple is working on its own SoC, which won't happen really soon, but it means that Apple isn't very interested in rocking the boat. Slapping AMD in there, while it shouldn't be a huge problem, would still require a pretty healthy amount of tweaking, and I believe Intel still has an edge at that power level, largely because Apple has tailored so many software tweaks to take full advantage of Intel's power management setup.

Then again, maybe 7nm will give them enough of an advantage to win that, as Zen 2 with Navi should be a solid performer and a good fit for the MacBook Pros. It could be 2-3 years before Intel has a GPU product to slap in there, and if their 10nm keeps being problematic, that would give AMD an edge. Something else that shouldn't be ignored: AMD and Apple would be on the same process, so Apple could integrate their own stuff with the AMD stuff fairly easily. I'm still doubtful, but it presents an interesting opportunity/situation.

I feel that if AMD is going to win some of Apple's CPU deals, it's probably going to be in the Mac Pro stuff, where they can put in Threadripper (or maybe even EPYC; possibly a custom chip as well, like a large interposer with a GPU and a CPU and a bunch of HBM between them, with the option of how many of each you want).
 

Vattila

Senior member
Oct 22, 2004
Let's discuss what makes sense to put on the System Controller chiplet and not.

There is the obvious stuff that previously was replicated unnecessarily in the chiplets, i.e. the south-bridge functionality and the security processor.

Then there is the speculative stuff that only needs one instance and makes sense in the socket, such as socket interconnect and coherency logic (e.g. directory-based cache coherency to limit traffic and improve multi-socket scaling).

Then there is the interesting stuff that needs to scale with core-count and thus may make more sense to include on the CPU chiplets, i.e. memory controllers and L4 cache. The OP makes a case for putting these on the SC chiplet, to make memory latency more uniform, as well as exploit opportunities for optimisation enabled by aggregating the controllers.

Here are some possible counter-arguments:
  • Memory controller power consumption — 7nm controllers lower power.
  • Memory latency — move closer to the CCX (win for typical VM/partition sizes).
  • L4 cache power consumption and size — 7nm cache can be larger and lower power.
  • L4 cache latency — move closer to the CCXs by putting it in the CPU chiplets.
  • Heat — distribute the heat from controllers and L4 to the CPU chiplets.
What say you?
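For what it's worth, here is a toy latency model of the centralized-vs-distributed memory controller question. Every number in it is an illustrative assumption, not a leaked spec; "local fraction" stands for how often a workload's memory sits behind its own chiplet's channels:

```python
# Toy model of average memory latency (all numbers are illustrative assumptions).
DRAM_NS       = 75   # assumed DRAM access time behind the controller
CONTROLLER_NS = 10   # assumed controller queuing/scheduling time
DIE_TO_DIE_NS = 20   # assumed one-way hop between a CPU chiplet and the SC chiplet

# Controllers on the CPU chiplets: local channels are direct, remote ones need a round trip.
def chiplet_controllers(local_fraction: float) -> float:
    local  = CONTROLLER_NS + DRAM_NS
    remote = 2 * DIE_TO_DIE_NS + CONTROLLER_NS + DRAM_NS
    return local_fraction * local + (1 - local_fraction) * remote

# Controllers on the SC chiplet: every access pays one round trip, but latency is uniform.
def central_controllers() -> float:
    return 2 * DIE_TO_DIE_NS + CONTROLLER_NS + DRAM_NS

for f in (1.0, 0.5, 0.125):   # fraction of accesses that hit a local channel
    print(f"local fraction {f:>5}: chiplet={chiplet_controllers(f):.0f} ns, "
          f"central={central_controllers():.0f} ns")
```

With high locality (typical VM/partition sizes) the distributed layout wins clearly; as locality drops the two converge, which is the uniformity argument for the SC chiplet.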
 

DrMrLordX

Lifer
Apr 27, 2000
I am still skeptical of whether L4 will be useful at all, unless Rome will be running particularly slow DDR4, or unless we're dealing with particularly fast eDRAM.
 

maddie

Diamond Member
Jul 18, 2010
I am still skeptical of whether L4 will be useful at all, unless Rome will be running particularly slow DDR4, or unless we're dealing with particularly fast eDRAM.
Useful compatibility with 1st gen motherboards & memory?

If one of your selling points is being a drop-in replacement for Naples, then you would need some way to mitigate the drastic fall in memory bandwidth per core when going from 32 to 64 cores on the same system. I can see a new line of motherboards with doubled memory slots. For this thread's design: 8 chiplets, each with dual channels, for 16 channels total; use all of them on new Rome motherboards and 8 on Naples boards.
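To put rough numbers on that bandwidth-per-core drop, assuming the commonly cited Naples configuration of 8 channels of DDR4-2666 (back-of-envelope only):

```python
# Memory bandwidth per core on an 8-channel DDR4 platform (back-of-envelope).
CHANNELS  = 8
DATA_RATE = 2666e6        # DDR4-2666 transfers per second
BUS_BYTES = 8             # 64-bit data bus per channel

bandwidth_gbs = CHANNELS * DATA_RATE * BUS_BYTES / 1e9   # ~170.6 GB/s total

for cores in (32, 64):
    print(f"{cores} cores: {bandwidth_gbs / cores:.1f} GB/s per core")
# 32 cores: 5.3 GB/s per core
# 64 cores: 2.7 GB/s per core
```

Doubling the core count on the same board simply halves the bandwidth available per core, which is the mitigation problem being described.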
 

kokhua

Member
Sep 27, 2018
I am still skeptical of whether L4 will be useful at all, unless Rome will be running particularly slow DDR4, or unless we're dealing with particularly fast eDRAM.

L4 cache would be useful in mitigating DRAM latency and bandwidth limitations. But to be effective, it needs to be fairly large, something like 8MB/core, or 512MB. At 14nm the SC die would be too large, so this is probably not going to happen. Moving to an 8-core CCX and increasing the L3 cache on the CPU die to 32MB total is much more likely, imo. If the SC die moves to 7nm in Milan, then maybe.
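Some back-of-envelope area math supporting that point, using ballpark high-density SRAM bit-cell sizes; the densities and overhead factor are rough assumptions, not foundry data:

```python
# Back-of-envelope die area for a 512MB SRAM L4 (all densities are rough assumptions).
MB       = 512
BITS     = MB * 2**20 * 8                    # ~4.3e9 bits
OVERHEAD = 1.6                               # assumed tag/periphery/array overhead

BITCELL_UM2 = {"14nm": 0.080, "7nm": 0.027}  # ballpark high-density 6T cell sizes

for node, cell in BITCELL_UM2.items():
    area_mm2 = BITS * cell * OVERHEAD / 1e6  # um^2 -> mm^2
    print(f"{node}: ~{area_mm2:.0f} mm^2 just for the L4 arrays")
# 14nm: ~550 mm^2 just for the L4 arrays
# 7nm: ~186 mm^2 just for the L4 arrays
```

Even with generous assumptions, 512MB of SRAM at 14nm is in reticle-limit territory on its own, before any I/O or controllers are added; at 7nm it starts to look merely large.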
 

kokhua

Member
Sep 27, 2018
Useful compatibility with 1st gen motherboards & memory?

If one of your selling points is being a drop-in replacement for Naples, then you would need some way to mitigate the drastic fall in memory bandwidth per core when going from 32 to 64 cores on the same system. I can see a new line of motherboards with doubled memory slots. For this thread's design: 8 chiplets, each with dual channels, for 16 channels total; use all of them on new Rome motherboards and 8 on Naples boards.

One way is to use memory compression, as I depicted in my diagram. 16 channels is not practical, imo; the pin count will be too high and it will be very challenging to lay out the motherboard. Though it is a bit further out, the real solution is DDR5.
 

DisEnchantment

Golden Member
Mar 3, 2017
I think we will learn how it is going to be in a couple of weeks.

Lisa Su on Zen 2 / Rome


Our architecture is socket compatible between the first and second generation. We're sampling it now

When we look at our 7-nanometer product and its positioning in 2019 across the server landscape, we feel very good about the positioning. I think it's not just 7-nanometer, 7-nanometer is important, but we've also made some significant changes to the architecture as well as how – sort of the system. So, I think, overall, we feel with the design and process capabilities, that our 7-nanometer products will be quite competitive.

Yes, and we will go through this in a lot more detail on November 6. But at a high level, I think 7-nanometer gives us better density. So, for a given system, we can put more cores on it. It gives us better power, so that gives us total cost of ownership. And it does give us better performance as well.

I think Devinder's comment was that our 7-nanometer GPU would ship here in the fourth quarter. We're on track to launch that here shortly. The second generation of EPYC, our 7-nanometer CPU, will ship in 2019. We are broadly sampling it now. I think from what we see, the performance is very competitive. And also many of our customers have had a chance to really spend time with the first generation of EPYC, get to learn our architecture and do much of the platform bring-up.


So, we're excited about what the second generation of EPYC can do for us. We're going to talk a little bit more about that in a couple weeks at our datacenter event, but we believe that our competitive position gets stronger as we get into 2019.

TL;DR:
Significant changes in architecture from Zen 1, better density, and better performance; overall competitive vs. strong upcoming parts from the competition in 2019. More details coming on Nov 6.

WSA with Glofo
Yes. So, look, GLOBALFOUNDRIES is a good partner. They continue to be a very important partner for us. We are in discussions with them about how to update our agreement post their strategy updates, and we're making good progress on that. So, we'll give you more detail as we get closer, but overall, GLOBALFOUNDRIES continues to be an important partner for us.

https://seekingalpha.com/article/42...-results-earnings-call-transcript?part=single
 

Abwx

Lifer
Apr 2, 2011
Not sure that Lisa Su will give precise info in two weeks, although those numbers are already known by the competition, since there is no doubt that some OEM(s) evaluating the platform leaked the details.

Recent statements from Lenovo and HP about switching to AMD are an indication that it's not the current EPYC that is the focus, as they wouldn't have said so if they weren't sure that the follow-on will be significantly better; no one in their right mind would commit to a one-time product just to be forced to go back to Intel within two years.
 

krumme

Diamond Member
Oct 9, 2009
I think we will learn how it is going to be in a couple of weeks.

Lisa Su on Zen 2 / Rome

TL;DR:
Significant changes in architecture from Zen 1, better density, and better performance; overall competitive vs. strong upcoming parts from the competition in 2019. More details coming on Nov 6.

WSA with Glofo


https://seekingalpha.com/article/42...-results-earnings-call-transcript?part=single
I notice she mentions more cores and power before performance.
It makes a lot of sense to make Zen 2 server-oriented. They used the first gen to mature the platform; now comes the lean stuff.
Better to expect 48 tuned cores at a 180W TDP than an 8c 220W desktop part :)
I don't think they are in a hurry to get the desktop versions to market. It doesn't take a rocket scientist to expect Zen 2 to be extremely competitive at many server loads, and that will take most of the dies. I guess that's what is reflected in the stock price, as it certainly isn't the net profit or earnings per share.
 

Glo.

Diamond Member
Apr 25, 2015
What do you guys think about the possibility that AM4 products will end up with a 16C/32T CPU at the top, and start with an 8C/8T one for Ryzen 3? ;)
 

Abwx

Lifer
Apr 2, 2011
I notice she mentions more cores and power before performance.

The clock-for-clock performance improvement and the competitive landscape can be more or less inferred from the few available pieces of info for both AMD and Intel.

If BC's 13% is accurate, and taking into account past evolution (FP gaining more than INT), then we should expect something like 10/16% in INT/FP respectively.

The competition, on the other side, is expected to bring 5/8% in INT/FP with ICL, and only in 2020.
That is assuming it includes the IPC loss due to the implementation of their version of SME/SEV, which their CPUs are currently lacking, while AMD has already taken the 3.5% (on average) performance hit from the secured domain accesses.
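To see why that assumption matters, here is the arithmetic if the quoted ICL uplift did not yet include an equivalent security hit. All inputs are the speculative figures from the post above, not official numbers:

```python
# Purely speculative arithmetic based on the percentages quoted in the post above.
icl_int_gain = 0.05     # assumed ICL INT uplift
icl_fp_gain  = 0.08     # assumed ICL FP uplift
security_hit = 0.035    # average IPC cost AMD already pays for SME/SEV

# If the quoted uplift does NOT yet include an equivalent security feature,
# the effective uplift once such a feature is added shrinks to:
for name, gain in (("INT", icl_int_gain), ("FP", icl_fp_gain)):
    effective = (1 + gain) * (1 - security_hit) - 1
    print(f"{name}: {gain:.1%} quoted -> {effective:.1%} after a {security_hit:.1%} hit")
# INT: 5.0% quoted -> 1.3% after a 3.5% hit
# FP: 8.0% quoted -> 4.2% after a 3.5% hit
```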
 

jpiniero

Lifer
Oct 1, 2010
I thought we'd agreed on each Zen 2 chip having 8 cores and 1 memory channel. The AM4 CPU package will thus max out at two chips + I/O, while TR4 will have four chips + I/O.

2 channels. That gives them more flexibility and additional options for yield.
 

maddie

Diamond Member
Jul 18, 2010
I thought we'd agreed on each Zen 2 chip having 8 cores and 1 memory channel. The AM4 CPU package will thus max out at two chips + I/O, while TR4 will have four chips + I/O.
Having 1 memory channel feeding 8 cores is a recipe for disaster, especially when IPC and frequency increase simultaneously. Whoever promoted that idea is very misguided.
 

mattiasnyc

Senior member
Mar 30, 2017
Having 1 memory channel feeding 8 cores is a recipe for disaster, especially when IPC and frequency increase simultaneously. Whoever promoted that idea is very misguided.

Well, first of all, it seems to me that AMD's strategy here paid off well. They managed to set their current business/architecture up to manufacture essentially one type of die and simply bin it accordingly into several different market segments. It seems very efficient to me and from what I can tell the market seems to agree. They're doing well.

Secondly, is it really correct that increased IPC and clock frequency make this one channel a bigger issue? It seems to me that if there's a bottleneck, it'd be cores per channel, not speed or IPC relative to channel count. In other words: are we sure that future AMD CPUs, if they use one channel per die, will actually be more bottlenecked by that single channel than they already (?) are?

Lastly, is it even possible for AMD to create the new Zen chips with, for example, 2 memory channels per die while simultaneously having them be backwards socket-compatible, so that the new chips next year will fit and work in AM4/TR4 and SP4?
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
Well, first of all, it seems to me that AMD's strategy here paid off well. They managed to set their current business/architecture up to manufacture essentially one type of die and simply bin it accordingly into several different market segments. It seems very efficient to me and from what I can tell the market seems to agree. They're doing well.

Secondly, is it really correct that increased IPC and clock frequency make this one channel a bigger issue? It seems to me that if there's a bottleneck, it'd be cores per channel, not speed or IPC relative to channel count. In other words: are we sure that future AMD CPUs, if they use one channel per die, will actually be more bottlenecked by that single channel than they already (?) are?

Lastly, is it even possible for AMD to create the new Zen chips with, for example, 2 memory channels per die while simultaneously having them be backwards socket-compatible, so that the new chips next year will fit and work in AM4/TR4 and SP4?
Think 2990WX. I have 4 channels supporting 32 cores, and it does pretty well, clocked at 3700. That would be the same density as 64 cores on 8 channels, and essentially the same socket.
 

maddie

Diamond Member
Jul 18, 2010
Well, first of all, it seems to me that AMD's strategy here paid off well. They managed to set their current business/architecture up to manufacture essentially one type of die and simply bin it accordingly into several different market segments. It seems very efficient to me and from what I can tell the market seems to agree. They're doing well.

Secondly, is it really correct that increased IPC and clock frequency make this one channel a bigger issue? It seems to me that if there's a bottleneck, it'd be cores per channel, not speed or IPC relative to channel count. In other words: are we sure that future AMD CPUs, if they use one channel per die, will actually be more bottlenecked by that single channel than they already (?) are?

Lastly, is it even possible for AMD to create the new Zen chips with, for example, 2 memory channels per die while simultaneously having them be backwards socket-compatible, so that the new chips next year will fit and work in AM4/TR4 and SP4?
Very confused by this post.

At present, we have AM4 with 2 channels for an 8-core Zen die. Do you see an 8-core Zen 2 CPU with increased IPC and frequency being unaffected by dropping to 1 memory channel? This is supposed to be a general-purpose CPU, not just one for compute-bound programs.