64 core EPYC Rome （Zen2）Architecture Overview？

Zapetu · Nov 18, 2018

jpiniero said:
Sure looks like Real ASP's are way way higher on Instinct than even Epyc. That's why it was first.

I guess there is some money to be made there and even if they would just break even, there's valuable experience and market share to gain for a later date. Also Vega 20 is in many ways very similar to Vega 10 and it quite likely has been much much harder to get a novel design like Rome to work (and it still might not be fully working) than it has been with Vega 20. Yields might not be that good but maybe they are stockpiling those partially functioning Vega 20 dies for later use. Both Radeon Instinct MI60 and MI50 seem to be high end parts. I know that those are not good gaming GPUs but there might still be room for a lower spec MI40/MI30 or something similar.

Tuna-Fish said:
The biggest reason to go monolithic on desktop is DRAM latency -- since they are using lines on organic package, they will have to use SerDes, and that always costs a few cycles. For servers, the unified memory space is worth losing a little bit on latency, but on desktop, especially in games, it still hurts.

Gideon studied earlier how much worse Intel Clarkdale's latencies were compared to some other CPUs of that time. It's hard to say how much impact moving Matisse's MCs off-die would have on performance, but it doesn't sound like the best idea unless there are no other options. I guess you could always hide some latency by adding some cache on the smaller IO die and keeping the chiplet's full 32MB cache for higher performance parts. Still, a monolithic die would be the safest bet, but it might not be that viable of a solution.
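To put the "hide some latency with cache on the IO die" idea into numbers, here is a minimal sketch using the usual average-access-time formula. Every latency value below is a made-up placeholder, not a measurement of any real part, and the function names are mine.

```python
def average_latency_ns(hit_rate: float, hit_ns: float, miss_ns: float) -> float:
    """Average access time for one cache level sitting in front of DRAM."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_ns + (1.0 - hit_rate) * miss_ns


# Hypothetical placeholder latencies (assumptions, not measurements).
ON_DIE_DRAM_NS = 70.0       # DRAM access with the memory controller on the CPU die
OFF_DIE_PENALTY_NS = 10.0   # extra hop through SerDes to an IO die
IO_DIE_CACHE_HIT_NS = 40.0  # hit in a cache placed on the IO die


def off_die_latency_ns(io_cache_hit_rate: float) -> float:
    """Average latency when the MC is off-die but the IO die carries a cache."""
    return average_latency_ns(
        io_cache_hit_rate,
        IO_DIE_CACHE_HIT_NS,
        ON_DIE_DRAM_NS + OFF_DIE_PENALTY_NS,
    )


def break_even_hit_rate() -> float:
    """Hit rate at which the off-die design matches the on-die MC latency.

    Solves h*H + (1 - h)*(D + P) = D for h, giving h = P / (D + P - H).
    """
    return OFF_DIE_PENALTY_NS / (
        ON_DIE_DRAM_NS + OFF_DIE_PENALTY_NS - IO_DIE_CACHE_HIT_NS
    )


print(f"break-even IO-die cache hit rate: {break_even_hit_rate():.2f}")
print(f"average latency at that hit rate: {off_die_latency_ns(break_even_hit_rate()):.1f} ns")
```

With these placeholder values the IO-die cache would need to catch a quarter of the requests that miss the chiplet's L3 just to break even, which is why the real answer depends entirely on the actual latencies and workloads.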

Tuna-Fish said:
Even given that, I honestly think that it's quite unlikely that there would be 3 separate 7nm Zen2 dies. Aside from the chiplet for EPYC and TR, I think that an APU die is possible, but I honestly think it would be unlikely that there would be a separate one for >8 core Ryzen. Either entire non-HEDT market is served from the APU chip, or there is a separate IO chip that gets paired with 2 cpu chiplets.

Edit: While reading this, please keep in mind that as raghu78 pointed out a little later: AMD will only use TSMC N7 HPC ("7HPC") for all their 2019 CPUs and GPUs.

As we know, there are at least two versions of TSMC's 7 nm process: a cheaper process for mobile SoCs and a more expensive one for HPC. Let's call them 7FF and 7HPC as stated here. My question is: would AMD use solely 7HPC, or would they use 7FF for lower end parts? Like this:

~~Picasso~~ (Edit: Renoir), ~~7FF~~ (Edit: 7HPC), low cost, low power, medium clocks
Matisse, 7HPC, higher cost, low enough/medium power, high clocks
Rome / TR3 chiplet, 7HPC, higher cost, power scalability, medium (good perf/W) or high clocks (absolute performance)
Rome / TR3 IO die, 14HP (???), higher cost, high amount of cache / buffers (eDRAM) for mitigating latency issues

There are other solutions than just having one quarter IO die and two chiplets, but they all would mean only one chiplet and 8 cores max. But it would still be quite strange to add a lot of L4 cache to the IO die just to mitigate the latency of moving memory controllers (MCs) off-die (at least for desktop parts). It seems so much easier to just keep the IMCs (even if it costs some more). Server tasks are different, and Naples already doesn't have very uniform latencies because of the MCM design. Making ~~Picasso~~ (or rather Renoir) 8 cores also doesn't sound that appealing, but you never know. There might not be a solution that can reasonably scale from low end APUs to much higher speed desktop CPUs, but I'm not taking sides here.

msi2 said:
So, reading that, would it mean that a console APU has to be monolithic? Mainly for latency reasons?

If someone has an opinion on whether Sony/MS would prefer the cheaper ~~7FF~~ (Edit: no 7FF for 2019/early 2020 consoles either) or the more expensive but better performing 7HPC, please let us know.

Also, I have been thinking that maybe the next-gen consoles will have a 6C CCX (or 2 x 3C CCX, although the latter might not be that good of a solution, but still much better than 8 Jaguar cores). Then AMD could use that same 6C setup for their APU. Matisse could use either 3 x 4C CCX or 2 x 6C CCX if those were the options, or it could be just 8 cores with a lot of L3 (and/or maybe some kind of chiplet design with a smaller IOD, as many seem to suggest).

Then there is Navi. Is AMD going to use 7HPC here or is ~~7FF~~ (Edit: no 7FF for Navi either) good enough? I'm guessing it depends on many things like yields and performance of 7FF. If 7FF is much cheaper or Samsung will have a cheaper alternative then maybe APUs and even Navi could be manufactured there. I'm guessing, that it won't be a huge chip. If anyone has any better knowledge or even some good guesses, let us know.

Also if AMD would use many different 7 nm processes, it might not be so easy to port different IP and designs between e.g. 7FF and 7HPC.

It's very limiting for consoles to use monolithic designs, and you could always leave the memory controllers on a separate die and connect the GPU to that using a high speed link. The GPU could even have one stack of HBM2 memory as a cache, but that might still be too expensive a solution for console purposes. There are a lot of benefits to MCM-like designs, but HBM would always require either a silicon interposer or an EMIB-like solution.

I'm guessing that Kaby Lake G has some (or most) of the elements we would like to see in a future AMD APU (namely HBM2 memory and MCM design). Even Intel can't currently sell Kaby Lake G as a low end part, though.

jpiniero · Nov 18, 2018

The consoles will be 8 cores at least, if only for compatibility sake and making porting PS4 titles easy.

I had actually toyed with the idea of them coming with 2 GPU dies, although it would be comparable to CF and developers would have to work around that. Give them their own IO die with GDDR6 and make the CPU die and the GPU die(s) interface with the IO die.

Zapetu · Nov 18, 2018

jpiniero said:
The consoles will be 8 cores at least, if only for compatibility sake and making porting PS4 titles easy.

There would be 12 threads with 6 cores, though. And while that's not the same as full cores, Jaguar cores do have really low performance compared to anything Zen based. But we'll see what they come up with. They could have 8 cores but only clock some of them high and have e.g. two helper cores / reserved cores for OS usage. 8 high clocked cores might draw a lot of power, and consoles have strict limitations on such things. I'm not gonna argue about that, but 8 cores might still be too much (for ~~Picasso~~ and even for Renoir) and 4 cores too few. Obviously you can always disable some of them, but you would also lose some die space if only a few of the sold CPUs required 8 cores.

jpiniero said:
I had actually toyed with the idea of them coming with 2 GPU dies, although it would be comparable to CF and developers would have to work around that. Give them their own IO die with GDDR6 and make the CPU die and the GPU die(s) interface with the IO die.

I'm guessing that game developers don't want any quirky designs since PS3. But sure, almost any design has a lot of potential, it's just how easy it is to make use of it. HBM2 memories might be a no go for a while, though. It's probably always the best idea to put MCs on the same die as CPUs for the latency reasons. It would still require a very high bandwidth link between the CPU die and the GPU die which might be a problem. Developers want unified memories and easy programmability where PS4 was initially better than XB1.

JDG1980 · Nov 18, 2018

jpiniero said:
I had actually toyed with the idea of them coming with 2 GPU dies, although it would be comparable to CF and developers would have to work around that.

I'm wondering if the touted "scalability" of Navi is designed to allow 2 or 4 GPU chiplets to act as one big GPU by putting the memory controller and display I/O in the glue chip. We know that anything that requires developers to explicitly support mGPU is a nonstarter, but AMD is going to want to avoid the need for big, monolithic dice going forward due to yield and cost issues on smaller processes.
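The yield argument against big monolithic dice can be illustrated with the simple Poisson yield model, Y = exp(-A * D0). This is only a sketch: the defect density and both die areas below are hypothetical values picked for illustration, not TSMC or AMD figures.

```python
import math


def poisson_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free dies under the Poisson model Y = exp(-A * D0)."""
    area_cm2 = area_mm2 / 100.0
    return math.exp(-area_cm2 * defects_per_cm2)


# Hypothetical inputs (assumptions for illustration only).
D0 = 0.2            # defects per cm^2
BIG_DIE_MM2 = 600   # one large monolithic die
CHIPLET_MM2 = 75    # one of eight chiplets covering the same total area

big = poisson_yield(BIG_DIE_MM2, D0)
small = poisson_yield(CHIPLET_MM2, D0)

print(f"monolithic {BIG_DIE_MM2} mm^2 die yield: {big:.1%}")
print(f"single {CHIPLET_MM2} mm^2 chiplet yield: {small:.1%}")
```

Under these assumed numbers roughly 30% of the big dies come out defect-free versus roughly 86% of the chiplets, and a defect scraps 75 mm² of silicon instead of 600 mm². The model ignores partially working dies that can be sold as cut-down parts, as well as packaging costs.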

Yotsugi · Nov 18, 2018

JDG1980 said:
I'm wondering if the touted "scalability" of Navi is designed to allow 2 or 4 GPU chiplets to act as one big GPU by putting the memory controller and display I/O in the glue chip. We know that anything that requires developers to explicitly support mGPU is a nonstarter, but AMD is going to want to avoid the need for big, monolithic dice going forward due to yield and cost issues on smaller processes.

David Wang confirmed that Navi is no MCM GPU.
Also scalability means just about everything in dGPUs.

raghu78 · Nov 19, 2018

Zapetu said:
I guess there is some money to be made there and even if they would just break even, there's valuable experience and market share to gain for a later date. Also Vega 20 is in many ways very similar to Vega 10 and it quite likely has been much much harder to get a novel design like Rome to work (and it still might not be fully working) than it has been with Vega 20. Yields might not be that good but maybe they are stockpiling those partially functioning Vega 20 dies for later use. Both Radeon Instinct MI60 and MI50 seem to be high end parts. I know that those are not good gaming GPUs but there might still be room for a lower spec MI40/MI30 or something similar.

Gideon studied earlier how much worse Intel Clarkdale's latencies were compared to some other CPUs of that time. It's hard to say how much impact moving Matisse's MCs off-die would have on performance, but it doesn't sound like the best idea unless there are no other options. I guess you could always hide some latency by adding some cache on the smaller IO die and keeping the chiplet's full 32MB cache for higher performance parts. Still, a monolithic die would be the safest bet, but it might not be that viable of a solution.

As we know, there are at least two versions of TSMC's 7 nm process: a cheaper process for mobile SoCs and a more expensive one for HPC. Let's call them 7FF and 7HPC as stated here. My question is: would AMD use solely 7HPC, or would they use 7FF for lower end parts? Like this:

Picasso, 7FF, low cost, low power, medium clocks

Matisse, 7HPC, higher cost, low enough/medium power, high clocks

Rome / TR3 chiplet, 7HPC, higher cost, power scalability, medium (good perf/W) or high clocks (absolute performance)

Rome / TR3 IO die, 14HP (???), higher cost, high amount of cache / buffers (eDRAM) for mitigating latency issues

There are other solutions than just having one quarter IO die and two chiplets, but they all would mean only one chiplet and 8 cores max. But it would still be quite strange to add a lot of L4 cache to the IO die just to mitigate the latency of moving memory controllers (MCs) off-die (at least for desktop parts). It seems so much easier to just keep the IMCs (even if it costs some more). Server tasks are different, and Naples already doesn't have very uniform latencies because of the MCM design. Making Picasso 8 cores also doesn't sound that appealing, but you never know. There might not be a solution that can reasonably scale from low end APUs to much higher speed desktop CPUs, but I'm not taking sides here.

If someone has an opinion on whether Sony/MS would prefer the cheaper 7FF or the more expensive but better performing 7HPC, please let us know.

Also, I have been thinking that maybe the next-gen consoles will have a 6C CCX (or 2 x 3C CCX, although the latter might not be that good of a solution, but still much better than 8 Jaguar cores). Then AMD could use that same 6C setup for their APU. Matisse could use either 3 x 4C CCX or 2 x 6C CCX if those were the options, or it could be just 8 cores with a lot of L3 (and/or maybe some kind of chiplet design with a smaller IOD, as many seem to suggest).

Then there is Navi. Is AMD going to use 7HPC here or is 7FF good enough? I'm guessing it depends on many things like yields and performance of 7FF. If 7FF is much cheaper or Samsung will have a cheaper alternative then maybe APUs and even Navi could be manufactured there. I'm guessing, that it won't be a huge chip. If anyone has any better knowledge or even some good guesses, let us know.

Also if AMD would use many different 7 nm processes, it might not be so easy to port different IP and designs between e.g. 7FF and 7HPC.

It's very limiting for consoles to use monolithic designs, and you could always leave the memory controllers on a separate die and connect the GPU to that using a high speed link. The GPU could even have one stack of HBM2 memory as a cache, but that might still be too expensive a solution for console purposes. There are a lot of benefits to MCM-like designs, but HBM would always require either a silicon interposer or an EMIB-like solution.

I'm guessing that Kaby Lake G has some (or most) of the elements we would like to see in a future AMD APU (namely HBM2 memory and MCM design). Even Intel can't currently sell Kaby Lake G as a low end part, though.

Firstly, you have not understood how AMD CPUs are designed. AMD's CPU physical design is targeted at achieving high clock frequencies, so TSMC N7 is not a choice at all. AMD Zen CPUs had a max turbo boost of 4 GHz. Zen+ CPUs had a max single core turbo of 4.35 GHz. The mobile Ryzen 7 2700U had a max turbo of 3.8 GHz. AMD is likely to target max clock frequencies at least on par with or higher than Zen+ for their 7nm Zen 2. So the only option is N7 HPC. AMD has already confirmed that all their 7nm CPUs and GPUs are using N7 HPC. This was confirmed by Ashraf Eassa of The Motley Fool on Twitter, but Ashraf deactivated his Twitter account a couple of months back.

btw I was one of the earliest people to propose that the Rome IO die could have L4 cache. But I think the chances of that are slim to none. Firstly, for a significant amount of L4 cache (say 256 MB), AMD would need to go with 14HP and eDRAM. I think that process is not suitable for low cost, high volume designs. The Rome IO die needs to be low cost and low complexity, so it's most likely based on the mature GF 14LPP node. Moreover, if you look at the Zeppelin die and move all the IO and memory controller circuitry to a single die, you would end up quite close to the 420 sq mm die size. AMD has probably spent some die area to maintain some cache information about the data stored in the L3 of each chiplet, so that a chiplet can quickly look up that info to see if some data is in the L3 of another chiplet. But that's about it.

AMD's 8 core chiplet die is the basic building block for all of its 7nm products: server CPUs, desktop CPUs, desktop/notebook APUs, and next gen console APUs (PS5/XB2). BTW, AMD's move to chiplets is not only for servers. I expect almost every AMD design at 7nm to incorporate chiplets. AMD's move is very logical, as it's easier to yield smaller dies and you can match chiplets with similar characteristics to build SKUs across the product stack. The modularity and reusability of chiplets dictate that 8 cores is the right choice. Here is how I see the 7nm designs from AMD:

Rome - 8 x 8=64C, 8MC, 128 PCI-E 4.0 lanes
Threadripper - 8 x 8=64C, 8MC, 128 PCI-E 4.0 lanes
Ryzen - 2 x 8=16C, 2MC, 32 PCI-E 4.0 lanes
Ryzen APU - 1 x 8 = 8C + Navi GPU chiplet 20 CU + 4 GB HBM2 cache, 2MC, 32 PCI-E 4.0 lanes
PS5/XB2 - 1 x 8= 8C + Navi GPU chiplet 80 CU, 256 or 384 bit GDDR6.

Here is how I see AMD's Navi product stack

Ryzen 7nm APU - 20CU, 1280 sp.
Navi 12 - 40CU , 2560 sp, 128 bit GDDR6 or 256 bit GDDR5X.
Navi 10 (PS5 GPU) - 80CU, 5120 sp, 256 bit GDDR6.
Navi 20 - 120CU, 7680 sp, 384 bit GDDR6.
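The arithmetic behind the two speculated lists above is simple enough to check: cores are chiplets times 8, and the "sp" figures match 64 stream processors per CU. The 64 SP/CU ratio is the GCN value; that Navi keeps it is an assumption implied by the posted numbers, not something confirmed here.

```python
CORES_PER_CHIPLET = 8  # from the post: one 8 core chiplet as the building block
SP_PER_CU = 64         # GCN ratio; assumed unchanged for Navi in this speculation


def total_cores(chiplets: int) -> int:
    """Core count for a package built from identical 8 core chiplets."""
    return chiplets * CORES_PER_CHIPLET


def stream_processors(compute_units: int) -> int:
    """Stream processor count for a given number of compute units."""
    return compute_units * SP_PER_CU


# Speculated configurations copied from the lists above (not confirmed specs).
cpu_chiplets = {"Rome": 8, "Threadripper": 8, "Ryzen": 2, "Ryzen APU": 1}
gpu_cus = {"Ryzen 7nm APU": 20, "Navi 12": 40, "Navi 10": 80, "Navi 20": 120}

for name, chiplets in cpu_chiplets.items():
    print(f"{name}: {chiplets} chiplet(s) -> {total_cores(chiplets)} cores")

for name, cus in gpu_cus.items():
    print(f"{name}: {cus} CU -> {stream_processors(cus)} sp")
```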

I think Navi will be a good architecture and address long standing problems and drawbacks with GCN like scalability, perf and area efficiency, perf per CU, perf per sp. In fact I am optimistic because Sony is very aggressive with their PS5 graphics performance goals and Navi is heavily influenced by PS5's perf targets and design goals.

HurleyBird · Nov 19, 2018

raghu78 said:
Moreover, if you look at the Zeppelin die and move all the IO and memory controller circuitry to a single die, you would end up quite close to the 420 sq mm die size. AMD has probably spent some die area to maintain some cache information about the data stored in the L3 of each chiplet, so that a chiplet can quickly look up that info to see if some data is in the L3 of another chiplet. But that's about it.

Keep in mind that there's a fair amount of redundancy there, but I think your assessment is probably the most likely.

Veradun · Nov 19, 2018

Zapetu said:
Exactly. And with a ~150mm² CCD there would not have been much of a point in making Matisse a chiplet design anyway. I was only speculating that if both Matisse and Picasso were monolithic designs, then Rome could as well have had a 16C CCD. And I'm not saying that it would have been a better design than an 8C CCD, but that it would have been one usable design choice.

My guess is they wanted to get rid of the CCX design, and going to 16c would have added too many interconnections. And then there are yields and the ability to assemble EPYC SKUs with 2, 4 or 8 CCDs (possibly also 6).

jpiniero said:
The consoles will be 8 cores at least, if only for compatibility sake and making porting PS4 titles easy.

Since current consoles have 8 threads CPUs (8x Jaguar) you can have full thread compatibility with 4 Zen cores (since SMT makes them 8 threads)
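The thread-count argument can be written out as a quick check. Jaguar has no SMT (one thread per core) and Zen has 2-way SMT; the candidate console configurations below are just the ones floated in this thread, and the class and function names are mine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuConfig:
    name: str
    cores: int
    threads_per_core: int

    @property
    def threads(self) -> int:
        """Hardware threads exposed to software."""
        return self.cores * self.threads_per_core


def covers_thread_count(candidate: CpuConfig, baseline: CpuConfig) -> bool:
    """True if the candidate exposes at least as many threads as the baseline."""
    return candidate.threads >= baseline.threads


jaguar = CpuConfig("8x Jaguar (current consoles)", cores=8, threads_per_core=1)
candidates = [
    CpuConfig("4C Zen with SMT", cores=4, threads_per_core=2),
    CpuConfig("6C Zen with SMT", cores=6, threads_per_core=2),
    CpuConfig("8C Zen with SMT", cores=8, threads_per_core=2),
]

for cfg in candidates:
    verdict = "covers" if covers_thread_count(cfg, jaguar) else "falls short of"
    print(f"{cfg.name}: {cfg.threads} threads, {verdict} the {jaguar.threads} Jaguar threads")
```

This only counts threads. Two SMT threads sharing one core are not equivalent to two full cores, so matching the count says nothing about whether per-thread performance is sufficient.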

beginner99 · Nov 19, 2018

jpiniero said:
The consoles will be 8 cores at least, if only for compatibility sake and making porting PS4 titles easy.

I have my doubts about 8-core zen consoles as already mentioned previously and/or in other threads. Current consoles use 8-core Jaguar at 1.3 ghz. They are dog slow. Even a 14nm zen1 4-core at 2.5 Ghz (like raven ridge) will run circles around it. Plus it has HT so you get your 8-threads for compatibility. 8-core zen even in 2021 would imho be too expensive for a console even more so on 7nm. So 14nm cpu + 7nm gpu for next gen consoles wouldn't surprise me at all.

Asterox · Nov 19, 2018

beginner99 said:
I have my doubts about 8-core zen consoles as already mentioned previously and/or in other threads. Current consoles use 8-core Jaguar at 1.3 ghz. They are dog slow. Even a 14nm zen1 4-core at 2.5 Ghz (like raven ridge) will run circles around it. Plus it has HT so you get your 8-threads for compatibility. 8-core zen even in 2021 would imho be too expensive for a console even more so on 7nm. So 14nm cpu + 7nm gpu for next gen consoles wouldn't surprise me at all.

Not even close, but as for the PS5, yes, an 8 core CPU is a very logical choice.

Sony PS4, Jaguar CPU 1.6ghz

Sony PS4 Pro, Jaguar CPU 2.1ghz

Xbox One, Jaguar CPU 1.75ghz

Xbox One X, Jaguar CPU 2.3ghz

For the PS5, a 7nm Zen 2 CPU does not have to be high clocked. 3GHz is more than enough CPU performance, and that would fit under a 30W TDP for the CPU only.

A 2.3GHz Jaguar CPU vs a 3GHz Zen 2: the CPU performance difference is absurd, like comparing a mouse and an elephant.

Zapetu · Nov 19, 2018

raghu78 said:
Firstly, you have not understood how AMD CPUs are designed. AMD's CPU physical design is targeted at achieving high clock frequencies, so TSMC N7 is not a choice at all.

It's not so much that I don't understand that AMD CPUs are designed for high clock frequencies, I do understand that very well, but I'll admit that I don't have much knowledge on how much TSMC's N7 (7 nm SoC, 7FF) and N7 HPC (7 nm HPC, 7HPC) differ in terms of metal layers, clock speeds, cost per wafer, cost per unit, etc. I was under the impression that N7 was significantly denser than N7 HPC, correct me if I'm wrong. If manufacturing costs wouldn't be that much higher for N7 HPC than N7, then sure, AMD should only go with it.

And I was only suggesting N7 as an alternative monolithic design for ~~Picasso~~ ~~(successor of Raven Ridge)~~ Renoir (successor of Picasso) as it might make the die size much smaller. (Okay, I also mentioned Navi just to get some opinions on that) But it makes a lot of sense for AMD to just use N7 HPC for now as that also makes your IP and designs portable between different products.

If we're talking about differences between N7 (mobile phone SoCs) and N7 HPC (high performance computing) then here it says:

N7 HPC track provides 13% speed over N7 mobile (7.5T vs 6T), while it has passed the yield and qual tests (SRAM, FEOL, MEOL, BEOL) and MP-ready D0. Part of the contributing factor is TSMC successful leveraged learning from N10 D0 and it is targeted for Fab15.
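On the density question, the 7.5T vs 6T figures in that quote allow a rough first-order estimate. This sketch assumes standard cell height scales with track count at the same metal pitch and that cell width is unchanged; real libraries and real designs will differ, so treat the result as a ballpark only.

```python
def relative_cell_area(tracks: float, baseline_tracks: float) -> float:
    """Cell area relative to the baseline library, assuming equal pitch and cell width."""
    return tracks / baseline_tracks


HPC_TRACKS = 7.5     # N7 HPC library, from the quoted text
MOBILE_TRACKS = 6.0  # N7 mobile library, from the quoted text
HPC_SPEED_GAIN = 0.13  # "13% speed over N7 mobile", from the quoted text

area_ratio = relative_cell_area(HPC_TRACKS, MOBILE_TRACKS)
density_ratio = 1.0 / area_ratio

print(f"7.5T cells are about {area_ratio:.2f}x the area of 6T cells")
print(f"logic density is about {density_ratio:.0%} of the 6T library")
print(f"in exchange for roughly {HPC_SPEED_GAIN:.0%} more speed")
```

Under those assumptions the 6T mobile library would be about 25% denser for standard cell logic, which only applies to the logic portion of a die, not to SRAM, analog or IO.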

There is still one problem, though: the capacity of 7 HPC when AMD, Nvidia and IBM (Power10) are all going to use that node, and I'm sure there are many more. That might not be that big of a problem in 2019 alone, but in the future it may be.

So the only option is N7 HPC. AMD has already confirmed that all their 7nm CPUs and GPUs are using N7 HPC. This was confirmed by Ashraf Eassa of The Motley Fool on Twitter, but Ashraf deactivated his Twitter account a couple of months back.

Ian Cutress asked Mark Papermaster about Samsung:

IC: AMD has had a strong relationship with TSMC for many years which is only getting stronger with the next generation products on 7nm, however now you are more sensitive to TSMC’s ability to drive the next manufacturing generation. Will the move to smaller chiplets help overcome potential issues with larger or dies, or does this now open cooperation with Samsung given that the chip sizes are more along the lines of what they are used to?

MP: First off, the march for high performance has brought us to Zen 2 and the ability to leverage multiple technology nodes. What we’re showing with Rome is a solution with two foundries with two different technology nodes. It gives you an idea of the flexibility in our supply chain that we’ve built in, and gives you explicit example of how we can work with different partners to achieve a unified product goal. On the topic of Samsung, we know Samsung very well and have done work with them.

I guess it's all about flexibility, and I don't think it would be wise for AMD to lock themselves into using only TSMC N7 HPC for the foreseeable future (I'm also not saying that you're suggesting that). All previous Zen designs have used Samsung-derived lithography processes (namely 14LPP and 12LP), although GlobalFoundries has likely made some changes to them. Samsung currently has only 7LPE (low power early) listed for 2019, and that's not a process AMD wants to use. Still, if Samsung has a more performance oriented process coming, then maybe AMD could use it (in late 2019/2020?).

You seem very well educated on this stuff and thank you for bringing it up. I'm not trying to argue with you at all about this. Based on Ashraf Eassa's tweet you brought up, I'm willing to believe that AMD will solely use N7 HPC in 2018/2019 for any 7 nm design. It will also make it much easier to share IP and designs between different products. So no N7 (SoC) for AMD CPUs and GPUs.

moinmoin · Nov 19, 2018

raghu78 said:
I think Navi will be a good architecture and address long standing problems and drawbacks with GCN like scalability, perf and area efficiency, perf per CU, perf per sp. In fact I am optimistic because Sony is very aggressive with their PS5 graphics performance goals and Navi is heavily influenced by PS5's perf targets and design goals.

This. In general I think AMD's semi custom business is underrated wrt its influence on AMD's other product designs. AMD GPUs especially have been highly modularized design wise (a fitting discussion in AT's current article), and a lot of that has been advanced through R&D for semi custom customers like Sony and Microsoft. With the development of Zen, AMD CPUs are likely becoming similarly modularized as well.

Zapetu · Nov 19, 2018

raghu78 said:
btw I was one of the earliest people to propose that the Rome IO die could have L4 cache. But I think the chances of that are slim to none. Firstly, for a significant amount of L4 cache (say 256 MB), AMD would need to go with 14HP and eDRAM. I think that process is not suitable for low cost, high volume designs. The Rome IO die needs to be low cost and low complexity, so it's most likely based on the mature GF 14LPP node. Moreover, if you look at the Zeppelin die and move all the IO and memory controller circuitry to a single die, you would end up quite close to the 420 sq mm die size.

Kokhua also originally predicted that there could be a large eDRAM L4 cache on IO die but now he has come to the same conclusion as you have:
https://twitter.com/chiakokhua/status/1062233351371681792

It would certainly be much easier from the manufacturing point of view to use 14LPP instead of the IBM-based 14HP. I don't know how high volume POWER9 or z14 are, or how much capacity would be left for AMD to use if they wanted to. EPYC has certainly established AMD as a serious player in the server market, and Rome will only make the situation better. I wouldn't be that surprised either if the IO die used 14LPP, but there's still a slight chance that it could use 14HP. There are benefits and drawbacks to both options.

raghu78 said:
AMD has probably spent some die area to maintain some cache information about the data stored in the L3 of each chiplet, so that a chiplet can quickly look up that info to see if some data is in the L3 of another chiplet. But that's about it.

I'm no expert on cache coherency protocols but true, there are still many ways AMD can optimize memory access latencies other than just adding a large L4. Current Zen architecture might have some hints what they possibly could have done with Rome:

https://fuse.wikichip.org/news/1177/amds-zen-cpu-complex-cache-and-smu/2/

The L3 is filled with L2 victims. There are also special shadow tags found in each slice which duplicate the L2 state/tag entries for indexes in that slice. On an L2 miss or an external CCX probe in the case of multiple CCX configurations, those shadow tags are checked in parallel to the L3 checks in order to alleviate actual L2 bandwidth. The CCX itself was designed such that the L3 acts as a crossbar for each of the four cores.

So my best guess, as others have also pointed out earlier, would be that the IO die has similar shadow tags for every chiplet's L3 cache (CCMs on the chiplets and corresponding CCM shadows on the IO die). They might also have shadow tags for all L2 caches on the IO die if they haven't made the L3 inclusive of the L2. The worst case scenario would be that every single memory access requires snooping every chiplet's L3 cache (through SerDes / IF links), and AMD surely wants to avoid that as much as possible. There are quite likely many other (smaller) tricks and buffering options they can use to further hide latencies even without a large L4.
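To make the shadow tag idea concrete, here is a toy sketch of a directory on the IO die that records which chiplets hold a given cache line, so a request only probes those chiplets instead of broadcasting to all of them. It illustrates the general snoop filter concept only; it is not AMD's actual protocol, and the class name, method names and 64 byte line size are my assumptions.

```python
from collections import defaultdict

LINE_BYTES = 64  # assumed cache line size


class ShadowTagDirectory:
    """Toy directory kept on the IO die: cache line -> chiplets caching it."""

    def __init__(self, num_chiplets: int) -> None:
        self.num_chiplets = num_chiplets
        self._holders = defaultdict(set)

    @staticmethod
    def _line(addr: int) -> int:
        """Map a byte address to its cache line index."""
        return addr // LINE_BYTES

    def _check(self, chiplet: int) -> None:
        if not 0 <= chiplet < self.num_chiplets:
            raise ValueError(f"unknown chiplet {chiplet}")

    def record_fill(self, chiplet: int, addr: int) -> None:
        """Note that a chiplet's L3 now holds the line containing addr."""
        self._check(chiplet)
        self._holders[self._line(addr)].add(chiplet)

    def record_evict(self, chiplet: int, addr: int) -> None:
        """Note that a chiplet's L3 dropped the line containing addr."""
        line = self._line(addr)
        holders = self._holders.get(line)
        if holders is None:
            return
        holders.discard(chiplet)
        if not holders:
            del self._holders[line]

    def chiplets_to_probe(self, requester: int, addr: int) -> list:
        """Chiplets that must be probed; empty means go straight to DRAM."""
        self._check(requester)
        holders = self._holders.get(self._line(addr), set())
        return sorted(holders - {requester})

    def broadcast_targets(self, requester: int) -> list:
        """What a filter-less design would probe: every other chiplet."""
        self._check(requester)
        return [c for c in range(self.num_chiplets) if c != requester]


directory = ShadowTagDirectory(num_chiplets=8)
directory.record_fill(chiplet=3, addr=0x1000)
print(directory.chiplets_to_probe(requester=0, addr=0x1020))  # same line as 0x1000
print(directory.chiplets_to_probe(requester=0, addr=0x2000))  # nobody holds it
print(len(directory.broadcast_targets(requester=0)))
```

The cost of this approach is the die area and update traffic needed to keep the directory in sync with every fill and eviction, which is the "some die area" trade-off mentioned above.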

Their main weapon to combat latencies would be the large 32MB L3 cache for each 8 core chiplet. While there is still no confirmation of the amount of L3 cache other than what Canard PC reported, it would make a lot of sense.

Yotsugi · Nov 19, 2018

raghu78 said:
Ryzen 7nm APU - 20CU, 1280 sp.

That's not how it works.
More SPs won't do anything without more memory b/w.

raghu78 said:
Navi 10 (PS5 GPU) - 80CU, 5120 sp, 256 bit GDDR6.

That's not how it works either, unless you're into 500mm^2 console SoCs.
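The memory bandwidth objection can be put into rough numbers with the standard peak bandwidth formula (bus width in bytes times transfer rate). The data rates below are my assumptions for illustration (DDR4-3200 for a dual channel "2MC" APU, 14 Gbps GDDR6 for the console guess); the thread itself does not specify them.

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gtps: float) -> float:
    """Peak bandwidth in GB/s: bytes per transfer times billions of transfers per second."""
    return bus_width_bits / 8 * data_rate_gtps


# (label, stream processors, bus width in bits, assumed data rate in GT/s)
configs = [
    ("Ryzen APU guess, 20 CU, dual channel DDR4-3200", 1280, 128, 3.2),
    ("PS5 guess, 80 CU, 256 bit GDDR6 at 14 Gbps", 5120, 256, 14.0),
]

for label, sps, width, rate in configs:
    bandwidth = peak_bandwidth_gbs(width, rate)
    per_sp = bandwidth * 1000 / sps  # MB/s available per stream processor
    print(f"{label}: {bandwidth:.1f} GB/s peak, {per_sp:.1f} MB/s per sp")
```

With these assumed rates the DDR4 APU has 51.2 GB/s to feed its GPU (shared with the CPU cores), which is presumably why the original list paired it with an HBM2 cache.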

jpiniero · Nov 19, 2018

Bondrewd said:
That's not how it works either, unless you're into 500mm^2 console SoCs.

It is possible, if it is multi chiplet and not monolithic. I would say it's more likely to be decently less tho.

Yotsugi · Nov 19, 2018

jpiniero said:
It is possible, if it is multi chiplet and not monolithic. I would say it's more likely to be decently less tho.

3 dies and 2 memory pools is less than ideal for a console.
Also I clearly said SoC, not SiP anyway.

Zapetu · Nov 19, 2018

Take this one with a pinch of salt but rumors about Picasso are getting stronger and if they're correct, Picasso would feature the same setup as Raven Ridge and just uses 12LP for minor improvements in power efficiency. There would still only be 4 cores (8 threads) (Zen+) and 11 CUs (Vega).

https://videocardz.com/79070/amd-ryzen-7-3700u-spotted-features-picasso-gpu
https://www.tomshardware.com/news/amd-ryzen-7-3700u-picasso-apu-specs,38075.html

If that's indeed true then Picasso would be just a 12LP version of Raven Ridge, just like Pinnacle Ridge and Polaris 30 were 12LP versions of the previous 14LPP ones. Sure, this 14LPP to 12LP transition would quite likely be very cheap and keeps the WSA rolling. So possibly no Zen2 APUs for a while (maybe 2019Q4 or 2020?).

There's also this leaked road map and only Matisse should clearly be a Zen2 part.

Source: https://informaticacero.com/amd-zen-2-llegara-2019-nombre-matisse-se-apoyara-aun-socket-am4/

I realized that I was writing about Picasso as a 7 nm product when clearly I should have been speculating about the one after it, Renoir. I have fixed most of my errors in previous posts, though.

New Version of Raven Ridge sure buys them time to develop their chiplets-APU designs.

beginner99 · Nov 20, 2018

Zapetu said:
If that's indeed true then Picasso would be just a 12LP version of Raven Ridge

Isn't this very old news? Like half a year or even more?

Topweasel · Nov 20, 2018

beginner99 said:
Isn't this very old news? Like half a year or even more?

Yeah we knew that probably around the launch of Threadripper last year.

Zapetu · Nov 20, 2018

beginner99 said:
Isn't this very old news? Like half a year or even more?

Well, it's not exactly news even now but rather some leaks and rumors. But right, that part in the spoiler tags (AMD leaked road map) was from over a year ago 25-Sep-2017. The leaked road map did state the following (and these are not yet 100% confirmed by AMD either):

Matisse: Zen 2 cores, Socket AM4
Picasso: Raven Ridge architecture, power/performance uplift, Socket AM4 desktop, Socket FP5 notebook

That UserBenchmark leak was about 4 months ago, if I'm not mistaken, so half a year is about right. There were still some rumors/speculation (just recently) that Picasso might be Zen 2 based but quite likely, it doesn't seem to be the case.

12LP UserBenchmark rumour (23-Jul-2018):
https://www.notebookcheck.net/Upcoming-12-nm-AMD-Picasso-APU-spotted-online.317706.0.html

Posted this one already:
Speculation of Zen 2 / 7 nm (15-Nov-2018):
https://videocardz.com/79070/amd-ryzen-7-3700u-spotted-features-picasso-gpu

I hope AMD will finally release this one but sure, it doesn't seem to be anything too exciting with Zen(+) and basically just the same deal as Pinnacle Ridge.

If 12LP turns out to be true, there's one thing I don't like, though, and it's this:

14LPP Summit Ridge -> 12LP Pinnacle Ridge -> N7 HPC (7 nm) Matisse
14LPP Polaris 10 -> 12LP Polaris 30
14LPP Raven Ridge -> 12LP Picasso

Picasso should have been called <Something> Ridge to signify that it would be just a 12LP refresh. I hope that AMD will release this one soon, so all the speculation will end.

And sure, many of you "knew" about this already, but I just mixed them up a few days ago and should have talked about Renoir in relation to a 7 nm monolithic or chiplet based APU. My apologies for that. Just wanted to correct my mistake.

Still, there's a lot to speculate about with Matisse (2019, Zen2 CPU), Renoir (2020?, Zen2 or Zen3 APU) and Vermeer (2020?, Zen2 or Zen3 CPU).

Edit: Removed some pointless speculation. Anyway, Castle Peak will most likely be leftover Rome IO dies (14 nm) and 8C chiplets (7 nm). Only Raven Ridge (Ryzen APU) is still 14LPP instead of 12LP. Had some brain farts here for a while...

Atari2600 · Nov 20, 2018

https://www.servethehome.com/xilinx-alveo-u280-launched-possibly-with-amd-epyc-ccix-support/

So it'd appear that the Rome IOC has CCIX support.

Which surely makes packaging an APU from three separate dies,

chiplet <--> IOC <--> GPU

all on one package the obvious choice. With the IOC obviously being quite restricted in size/functionality compared to Rome.

Zapetu · Nov 20, 2018

Atari2600 said:
https://www.servethehome.com/xilinx-alveo-u280-launched-possibly-with-amd-epyc-ccix-support/

So it'd appear that the Rome IOC has CCIX support.

Nice find. From the article you linked:

This slip is a big deal. If AMD indeed is enabling CCIX support, it will have a coherent interconnect that can bring accelerators such as the Xilinx Alveo U280 onboard using something other than vanilla PCIe Gen4.

If AMD and Xilinx announce CCIX support in 2019, that is going to kickstart the entire server ecosystem and Intel will be forced to respond.

This (CCIX in 2019) should be a big win for AMD's server offerings if it turns out to be true. AMD is really focused on their server side and hopefully with all these new technologies (64 cores (x86-64, HPC) , PCIe 4.0, and now maybe CCIX, etc.) they will succeed and gain a substantial amount of market share. We need more competition in all market segments but since we all know that the server market is a very profitable one, that matters the most for now.

From the AnandTech interview:

IC: Where does Rome sit with CCIX support?

MP: We didn’t announce specifically those attributes beyond PCIe 4.0 today, but I can say we are a member of CCIX as we are with Gen Z. Any further detail there you will have to wait until launch. Any specific details about the speeds, feeds, protocols, are coming in 2019.

ub4ty · Nov 20, 2018

Anyone know when AMD epyc (rome) will be shipping in 2019 and subsequently Zen 2 based on this new architecture? Need to decide whether to grab another 1950x on sale or wait it out.

jpiniero · Nov 20, 2018

ub4ty said:
Anyone know when AMD epyc (rome) will be shipping in 2019 and subsequently Zen 2 based on this new architecture? Need to decide whether to grab another 1950x on sale or wait it out.

Looks like Epyc (for non-Cloud customers) isn't until Q2; and any consumer parts would be 3 months after that.

ub4ty · Nov 20, 2018

jpiniero said:
Looks like Epyc (for non-Cloud customers) isn't until Q2; and any consumer parts would be 3 months after that.

Almost got swept up in the sales...
I think I'll wait until Rome and the new architecture to make my next purchase.
I've been leaning towards consolidation rather than expansion. I'd love to reduce my current systems down to one massively contained solution. So, Q2 2019 and later it is. Hopefully ram and peripheral prices will plummet as well. These thanksgiving sales are pretty good but I'm already loaded down with hw.
