Speculation: Ryzen 3000 series

Shivansps · Dec 13, 2018

somethingclever said:
I can guarantee and will wager a significant amount of money that each ccx is still 4 cores unless they changed the topology to a ring bus or mesh network (they probably haven't because it doesn't scale well). That is why a 6 core CPU with 2 dies is impossible unless it's possible to disable an entire ccx.

Then 6C are not going to be happening unless it is an APU.

somethingclever · Dec 13, 2018

Shivansps said:
Then 6C are not going to be happening unless it is an APU.

It can still happen, just with one chiplet + one IO die. I quote myself:

somethingclever said:
...
With that in mind, and assuming the rumors are "mostly correct", I can see how the lineup would work
Ryzen 3 with 6C/12T is 1 chiplet + an IO die
Ryzen 3G with 6C/12T is 1 cpu chiplet + 1 gpu chiplet + an IO die
Ryzen 5 and 5G is the above but with a fully functioning chiplet of 8C/16T
Ryzen 7 and 9 use two chiplets with 12C/24T and 16C/32T respectively and an IO die.

Zapetu · Dec 13, 2018

somethingclever said:
Ah I see your point now, and we seem to be violently arguing in agreement with each other

Violently arguing may be a little bit of an exaggeration but it's always a good thing to have many points of view.

I have to say that AdoredTV is deep down a very genuine guy and he will always try to correct his mistakes in his latest videos and takes constructive criticism seriously. He does his research very well and thoroughly (most of the time) and uses a lot of time for each video. That doesn't always pan out too well as he discussed in one of his videos. Even as a youtuber, he didn't release that "Ryzen 3000, Radeon 3000 Series LEAKS" video just to get views but because he really believed in it and he's really an advocate of the chiplet design. I'm hoping that we will get more information soon and I'm sure he's really trying to get to the bottom of this. There's still no reason for him to stress too much over this but that video did really get viral (as far as tech videos are concerned).

Edit: In retrospect, even if he made some really bold assumptions that didn't all turn out to be true, he still brings a lot of interesting ideas to speculate on, though they shouldn't be taken as absolute facts. His chiplet videos, especially for EPYC Rome, were very intriguing.

dnavas · Dec 13, 2018

Zapetu said:
What about 6C models then? Would they have just one DDR4 IMC or would they have two dies each? [...]

That's a very bold assumption for a 72mm² chiplet to have 8 cores, 32MB of L3 cache, 1xDDR4 PHY, 12 PCIe, 2xUSB, etc.

I had basically the same response, so although I didn't make the original post on it, I'll defend the idea (mostly because I was coming here to do so :>). I expect each Ryzen chiplet will be *larger* (slightly) than an Epyc with 16 pcie4 and 1xDDR4 PHY.

You'll note that the 6 core chips are ONLY in Ryzen3. There was a bunch of static regarding "but why have SMT enabled on your low-end, how will you differentiate?" Well, you don't do that with cores and threads -- the whole thrust for AMD is to go whole hog into cores. Instead you differentiate in I/O. That's the same as between TR and AM4 and between server parts and TR. The IGP gets another DDR4 PHY on the Navi die, so an IGP setup is one PHY for CPU, one PHY for GPU. You won't have a lot of pcie in such a setup, but it's an IGP setup, what do you need a lot of pcie *for*?

16 pcie4 is exactly the same bandwidth as we have now (slightly more, actually, if it all gets exposed) -- I assume some of those channels will head towards the external motherboard chip, and that the new one will be able to feed out extra 3.0 lanes from the 4.0 inputs. Of course, if you stick a chip like this into an existing board, odd things will happen, but why would you buy a low-end R3 part to put into an existing board? That doesn't make a lot of sense. Spring for a 3600 instead if you want to upgrade.
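The bandwidth comparison can be checked with a quick back-of-the-envelope sketch. It assumes the published signalling rates (8 GT/s for PCIe 3.0, 16 GT/s for PCIe 4.0) and 128b/130b encoding, ignores protocol overhead, and takes the 24-exposed / 32-on-die lane counts from later in this thread. The function names are made up for illustration.

```python
# Rough one-direction PCIe bandwidth; protocol overhead is ignored.
GT_PER_S = {3: 8.0, 4: 16.0}   # giga-transfers per second per lane
ENCODING = 128 / 130           # 128b/130b line-coding efficiency

def lane_gbytes_per_s(gen):
    """Approximate usable GB/s for one lane of the given PCIe generation."""
    return GT_PER_S[gen] * ENCODING / 8

def link_gbytes_per_s(gen, lanes):
    """Approximate usable GB/s for a link of `lanes` lanes."""
    return lane_gbytes_per_s(gen) * lanes

gen4_x16 = link_gbytes_per_s(4, 16)  # speculated single-chiplet part
gen3_x24 = link_gbytes_per_s(3, 24)  # lanes exposed on AM4 today
gen3_x32 = link_gbytes_per_s(3, 32)  # lanes present on the current die

print(f"PCIe 4.0 x16: {gen4_x16:.1f} GB/s")
print(f"PCIe 3.0 x24: {gen3_x24:.1f} GB/s")
print(f"PCIe 3.0 x32: {gen3_x32:.1f} GB/s")
```

Under these assumptions 16 lanes of PCIe 4.0 carry the same raw bandwidth as 32 lanes of PCIe 3.0, which is more than the 24 lanes AM4 exposes.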

The rest of the lineup has full 2xDDR4 and 2x16 pcie4 setup as they all have two chiplets each. That's why the 8-core setup is 4 + 4.

But, what about the memory access across two distinct PHYs you ask? The thing is, how is an I/O die going to make anything better? It's still one hop from CPU to the I/O die to read and write memory. So, yes, it'll be less consistent, but we're going to a setup where the average latency is half a hop, and latency on AM4 is exactly what half of y'all have been complaining about when we talked about I/O dies in the first place! That's also without knowing what they've done to IF....
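The "half a hop" figure follows from a simple expected-value estimate. The sketch below is a toy model, not a latency measurement: it assumes each die owns one memory controller, a local access costs zero IF hops, a remote access costs one, and accesses are spread evenly across controllers unless a different local fraction is given. The function name and the 0.8 locality figure are hypothetical.

```python
IO_DIE_HOPS = 1.0  # with a central I/O die, every memory access crosses one IF link

def avg_hops_distributed(dies, local_fraction=None):
    """Expected IF hops per access when each die has its own memory controller."""
    if local_fraction is None:
        local_fraction = 1.0 / dies      # assumption: uniform interleaving
    return 1.0 - local_fraction          # only remote accesses cost a hop

print(avg_hops_distributed(2))           # two dies, uniform access
print(avg_hops_distributed(2, 0.8))      # hypothetical NUMA-aware scheduling
print(IO_DIE_HOPS)                       # I/O die design for comparison
```

The average improves further if the OS keeps threads near their memory, but the worst case is no better than the I/O die design, which is the "less consistent" part.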

And now let's consider TR3. Will it be an Epyc platform with four epyc cores and one I/O die, or an I/O-less one with four Ryzen dies? Four Ryzen dies under this setup would be four DDR4 PHY with 64 pcie lanes, which is what TR expects. It's a choice that would forego 64 core thread-rippers, but it would allow the epyc server dies to be optimized for low power, and the ryzen/tr dies to be optimized for high frequency. On the other hand, as it stands right now, the Ryzen and Epyc dies only require one IF link each, which has got to be a nice savings in area and power. The latter chip links to the I/O chip and the former links to another Ryzen (or IGP). Unfortunately, for a TR setup, you'd need a lot more than that. It'll be interesting to see what they do here.

Of course, that all assumes that this is what a Ryzen chiplet looks like. It's been an on-going argument whether Epyc and Ryzen will or won't share the same layout, and I would say going I/O-less would be an argument for the "won't" side....

maddie · Dec 13, 2018

Despoiler said:
I don't know about Adored's latest video. I don't see why AMD would go monolithic with consumer and chiplets for enterprise. Granted the monolithic might just be for pure speed, but an IF link to an IO die should be faster than current Ryzen chips. It solves a bunch of issues. Chiplets make high investments in node technology pay off sooner. Adored mentions packaging and shipping for a reason, but the chips are already being made in one place and packaged in another. Container transport via ship is cheap. It's around $5000 for a 40ft as a one off. You can put A LOT of chips in a 40ft container. I just can't see why AMD wouldn't go for the most cost effective/profit generating choice. If anything it's something that no one understands yet.

I found that shipping costs point rather dumb. Intel has fabs even more widespread than AMD and they mix and match very well. Are there any one stop fabs anywhere in the world?

Hitman928 · Dec 13, 2018

maddie said:
I found that shipping costs point rather dumb. Intel has fabs even more widespread than AMD and they mix and match very well. Are there any one stop fabs anywhere in the world?

Lots of fabs have assembly teams "on site" and offer packaging, but the packaging they offer is more "off the shelf" type of options with a relatively tiny amount of leads/pins. For these types of chips, I have no insight.

Mopetar · Dec 13, 2018

maddie said:
I found that shipping costs point rather dumb. Intel has fabs even more widespread than AMD and they mix and match very well. Are there any one stop fabs anywhere in the world?

It does make a certain amount of sense for the low cost parts. With Epyc or Threadripper, the prices are high enough that the additional logistics costs don't amount to much, but really cuts into the margins on anything at the extreme low end. It's not just shipping costs either, but extra assembly costs as well.

However, it's not as though TSMC doesn't have an older, less expensive node that can be used to manufacture IO dies, so just because GF isn't doing it doesn't mean it can't exist. If they're going to use chiplets for the consoles, it makes a certain amount of sense for TSMC to fab all of the parts so that they can be shipped together. The IO die for Ryzen would be much smaller as well, and since AMD changed the WSA with GF to be on a per-wafer basis, it won't hurt them too much.

Zapetu · Dec 13, 2018

dnavas said:
I expect each Ryzen chiplet will be *larger* (slightly) than an Epyc with 16 pcie4 and 1xDDR4 PHY.

I read your post and it's certainly possible to do but would still require a new 7nm die (as you said) likely just used for desktop products. Also it's a ccNUMA design but you are right that about half of the time you would use the local memory controller. And if you optimize for it, you could get some possible performance gains. Still it seems a rather peculiar design but it might work, it's hard to predict. Sure the two "half dies" would be the same size and have equal amounts of IO. You would still have to duplicate some IO things, though, but that's not too much of a die space.

I'm still a fan of the pure chiplet design (because of yields, binning and total flexibility) and if that's not the case for Ryzen 3k then it's interesting to see what AMD has done. And your idea is an interesting departure from the general way of thinking.

maddie said:
I found that shipping costs point rather dumb. Intel has fabs even more widespread than AMD and they mix and match very well. Are there any one stop fabs anywhere in the world?

I'm sure TSMC would be a so-called one-stop fab, at least for some products. We are not shipping bulldozers all over the world here, so I wouldn't be too concerned about logistics costs even for low-cost SKUs. I'm sure you can send your average dies in the same shipment as your best-quality silicon, at least if you're shipping them overseas in a shipping container.

Mopetar said:
However, it's not as though TSMC doesn't have an older, less expensive node that can be used to manufacture IO dies, so just because GF isn't doing it doesn't mean it can't exist. If they're going to use chiplets for the consoles, it makes a certain amount of sense for TSMC to fab all of the parts so that they can be shipped together. The IO die for Ryzen would be much smaller as well, and since AMD changed the WSA with GF to be on a per-wafer basis, it won't hurt them too much.

WSA will hurt AMD though if they can't fulfill their wafer target:
https://www.anandtech.com/show/10631/amd-amends-globalfoundries-wafer-supply-agreement-through-2020

AMD could end up paying GlobalFoundries two different types of penalties: one for making a chip at another fab, and a second penalty if AMD doesn’t make their wafer target for the year with GlobalFoundries.

I read somewhere that the penalty per wafer isn't really that high but the penalty for not meeting the wafer target is much higher. If someone can verify this or shed more light on the subject, that would be great. Because of that it would make more sense to manufacture most (if not all) 14nm/12nm parts at GloFo.

somethingclever · Dec 13, 2018

Zapetu said:
That wasn't the point. If you would get a Rome CPU on your hands with no previous knowledge what it contains, you could guess that the IO die is using a different process than the smaller dies but just by looking at it, you have no way to know for sure. You could make educated guesses based on size but that's about it.

If AdoredTV's source has seen what Ryzen 3000 with 3 small dies looks like he/she could have assumed that all of them are so small that they must be 7nm. Still the IO die could be 14nm even if it's about the same size as the other chiplets. There's room for a lot of misinterpretation for these leaks where someone saw something. Even if one of his sources says that Ryzen 3000 will be all 7nm that's not a fact and there still could be a 14nm small IO die. AdoredTV should also get a second source for his Ryzen 3000 will only be 7nm rumor.

If there is no (14nm) IO die for desktop then I'm still assuming, as I did previously, that it's a monolithic 8C die with one IF-link to connect to one Rome chiplet (12C and 16C models only).

I watched his most recent videos on Ryzen and Navi after reading your comment and I agree with you. It made me realize I've been basically coming to the same conclusions he had based on the leaks with similar technical reasoning. I don't really care much about marketing or pricing so I can't argue anything on that front...

Anyways another thought I've had is I would believe the likelihood of AMD having multiple unique IO dies on 14nm is a lot higher than on 7nm because the entire point of the IO die is to use a more mature and cheaper process. But if adoredTV's source is reliable when he says he only sees one 14nm die at GF then that's out the window.

But, it doesn't discount the possibility of using a quartered Rome IO die if the quartering and packaging is done at a different facility. It even fits with your theory of someone mistaking it for a 7nm die because visually they should be about the same size, depending on how much dead silicon there is between quarters for chopping the chip.

Edit: also I think the argument over shipping cost is ridiculous because, as mentioned earlier in this thread, it's cheap. And even if it isn't negligible, if AMD is packaging all the chips in the same facility no matter what segment, then the cost will be so split up that it does become negligible or worth eating for the lowest margin products. I'm not a business student but breaking even or a slight loss while gaining market share sounds like a fair deal even if not ideal.

Edit 2: Could ryzen io die be on GF 12nm?? Purely out my ass with no argument as to why it makes sense

Zapetu · Dec 13, 2018

somethingclever said:
Anyways another thought I've had is I would believe the likelihood of AMD having multiple unique IO dies on 14nm is a lot higher than on 7nm because the entire point of the IO die is to use a more mature and cheaper process. But if adoredTV's source is reliable when he says he only sees one 14nm die at GF then that's out the window.

That's exactly right. It would cost much less for them to differentiate their products using different 14nm IO dies (i.e. main dies) than having many different 7nm designs. Also 7nm would be best utilized having large caches and tightly packed cores (execution units) rather than more IO that doesn't scale so well to smaller nodes.

I think that some of AdoredTV's sources are not that certain about what they are talking about but they sure have seen some things. It's easy to spot a very large IO die that's almost the same size as Vega 10. A small IO die doesn't really look special in any way, and GloFo does have customers other than AMD, so there are different kinds of wafers being produced. Sure, AMD is their main customer and they would have to make a lot of those IO dies for the Ryzen 3k launch. Still, I'm waiting for new information and I'm not taking AdoredTV's videos as the truth; his sources might be wrong and have been wrong in the past. Still, his 8+1 leak for Rome turned out to be true.

somethingclever said:
But, it doesn't discount the possibility of using a quartered Rome IO die if the quartering and packaging is done at a different facility. It even fits with your theory of someone mistaking it for a 7nm die because visually they should be about the same size, depending on how much dead silicon there is between quarters for chopping the chip.

Wafer cutting might also be done at a different facility (300mm wafers would be shipped as whole) and I'm not really that aware of the IC production chain. We should really wait for new information since now everything seems rather confusing.

maddie · Dec 13, 2018

dnavas said:
I had basically the same response, so although I didn't make the original post on it, I'll defend the idea (mostly because I was coming here to do so :>). I expect each Ryzen chiplet will be *larger* (slightly) than an Epyc with 16 pcie4 and 1xDDR4 PHY.

You'll note that the 6 core chips are ONLY in Ryzen3. There was a bunch of static regarding "but why have SMT enabled on your low-end, how will you differentiate?" Well, you don't do that with cores and threads -- the whole thrust for AMD is to go whole hog into cores. Instead you differentiate in I/O. That's the same as between TR and AM4 and between server parts and TR. The IGP gets another DDR4 PHY on the Navi die, so an IGP setup is one PHY for CPU, one PHY for GPU. You won't have a lot of pcie in such a setup, but it's an IGP setup, what do you need a lot of pcie *for*?

16 pcie4 is exactly the same bandwidth as we have now (slightly more, actually, if it all gets exposed) -- I assume some of those channels will head towards the external motherboard chip, and that the new one will be able to feed out extra 3.0 lanes from the 4.0 inputs. Of course, if you stick a chip like this into an existing board, odd things will happen, but why would you buy a low-end R3 part to put into an existing board? That doesn't make a lot of sense. Spring for a 3600 instead if you want to upgrade.

The rest of the lineup has full 2xDDR4 and 2x16 pcie4 setup as they all have two chiplets each. That's why the 8-core setup is 4 + 4.

But, what about the memory access across two distinct PHYs you ask? The thing is, how is an I/O die going to make anything better? It's still one hop from CPU to the I/O die to read and write memory. So, yes, it'll be less consistent, but we're going to a setup where the average latency is half a hop, and latency on AM4 is exactly what half of y'all have been complaining about when we talked about I/O dies in the first place! That's also without knowing what they've done to IF....

And now let's consider TR3. Will it be an Epyc platform with four epyc cores and one I/O die, or an I/O-less one with four Ryzen dies? Four Ryzen dies under this setup would be four DDR4 PHY with 64 pcie lanes, which is what TR expects. It's a choice that would forego 64 core thread-rippers, but it would allow the epyc server dies to be optimized for low power, and the ryzen/tr dies to be optimized for high frequency. On the other hand, as it stands right now, the Ryzen and Epyc dies only require one IF link each, which has got to be a nice savings in area and power. The latter chip links to the I/O chip and the former links to another Ryzen (or IGP). Unfortunately, for a TR setup, you'd need a lot more than that. It'll be interesting to see what they do here.

Of course, that all assumes that this is what a Ryzen chiplet looks like. It's been an on-going argument whether Epyc and Ryzen will or won't share the same layout, and I would say going I/O-less would be an argument for the "won't" side....

If the chiplet was to have PCIe lanes and a DDR4 controller, then what is the IO in IO die?

Zapetu · Dec 13, 2018

maddie said:
If the chiplet was to have PCIe lanes and a DDR4 controller, then what is the IO in IO die?

As I understood it, there is no IO die, just two equal-sized CPU dies, each containing half of the IO. There's only one IF-link between them so that's one IF hop. And that's basically all it is. The 6C model would only have one (single channel) DDR4 controller while other models would have two. I'm making an illustration of that design to better understand it.

maddie · Dec 13, 2018

somethingclever said:
I watched his most recent videos on Ryzen and Navi after reading your comment and I agree with you. It made me realize I've been basically coming to the same conclusions he had based on the leaks with similar technical reasoning. I don't really care much about marketing or pricing so I can't argue anything on that front...

Anyways another thought I've had is I would believe the likelihood of AMD having multiple unique IO dies on 14nm is a lot higher than on 7nm because the entire point of the IO die is to use a more mature and cheaper process. But if adoredTV's source is reliable when he says he only sees one 14nm die at GF then that's out the window.

But, it doesn't discount the possibility of using a quartered Rome IO die if the quartering and packaging is done at a different facility. It even fits with your theory of someone mistaking it for a 7nm die because visually they should be about the same size, depending on how much dead silicon there is between quarters for chopping the chip.

Edit: also I think the argument over shipping cost is ridiculous because, as mentioned earlier in this thread, it's cheap. And even if it isn't negligible, if AMD is packaging all the chips in the same facility no matter what segment, then the cost will be so split up that it does become negligible or worth eating for the lowest margin products. I'm not a business student but breaking even or a slight loss while gaining market share sounds like a fair deal even if not ideal.

Edit 2: Could ryzen io die be on GF 12nm?? Purely out my ass with no argument as to why it makes sense

But if adoredTV's source is reliable when he says he only sees one 14nm die at GF then that's out the window.

Not necessarily. I assume that Rome has started production with Ryzen 3xxx to follow by several months. GloFo might simply not have started fabbing the IO die for Ryzen yet. If the source is on the production area, they might not know what's coming.

Shivansps · Dec 13, 2018

maddie said:
If the chiplet was to have PCIe lanes and a DDR4 controller, then what is the IO in IO die?

General I/O so no SB is needed, L4 (probably), a NUMA killer for Epyc, and the only way they had to keep adding cores, since Zen 1 already hit the wall of what was possible; you can't go ahead and make something that is 16CH memory.
It is mostly oriented at TR and servers.

If the "no I/O chip on desktop" rumor ends up being true, there are only 3 options:

1) the desktop-notebook CPUs are monolithic.

2) GPU and CPU chiplets attach to each other and they have 1xDDR4 each.

3) There is a desktop/notebook-only Navi chiplet with 2xDDR4 and sufficient I/O to match the Zen 1 SoC. This is basically a northbridge.

#2 is the ONLY OPTION that doesn't require desktop/notebook-only designs.

But the presence of a smaller Navi in #3 that we don't know about is also a possibility, and they could also sell it as a basic dGPU to replace ancient things like all those VLIW-based dGPUs that they are still selling.

Zapetu · Dec 13, 2018

Ok, here's an illustration of the speculative 1xDDR4 and half the IO per die design (based on what moinmoin and dnavas suggested):

As you can see there would likely be the following core configurations:

Ryzen 3 (6C): (3 + 3) + (0 + 0), 1Ch DDR4
Ryzen 5 (8C): (2 + 2) + (2 + 2), 2Ch DDR4
Ryzen 7 (12C): (3 + 3) + (3 + 3), 2Ch DDR4
Ryzen 9 (16C): (4 + 4) + (4 + 4), 2Ch DDR4

Ryzen 3 would be a budget model with just half the IO and 1 memory channel enabled. The other die would be a dummy die. Ryzen 5-9 would have both memory channels enabled but only Ryzen 9 would have full CCXs. There's bound to be some kind of latency penalty accessing the other die's memory channel, and there is no IO die with centralized (directory-based) cache coherency protocols and likely none of that server "magic". There's always one local memory controller for each working die, and this ccNUMA design could be optimized with that locality in mind. It should still work well enough in UMA mode too. Obviously PCIe4 is backwards compatible with PCIe3 and would work in existing motherboards. The 16C model could require a new MB with beefier VRM, as there have been rumors about that.

You could always make the dies smaller by removing half of the L3 cache and that would make it the same as current 16MB for 8C. It's hard to predict how much cache R5, R7 and R9 models would have but with 32MB/die that's obviously 64MB max and with 16MB/die 32MB max. R3 would only be able to have either 32MB or 16MB of cache.
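To make the bookkeeping explicit, here is a small sketch that derives core counts, memory channels and maximum L3 from the per-CCX layouts listed above. It only encodes this post's speculation (one DDR4 channel per working die, either 16MB or 32MB of L3 per die); the names `SKUS`, `total_cores` and so on are made up for illustration.

```python
# One (ccx0, ccx1) tuple of active cores per die, as in the list above.
SKUS = {
    "Ryzen 3 (6C)":  [(3, 3), (0, 0)],   # second die is a dummy
    "Ryzen 5 (8C)":  [(2, 2), (2, 2)],
    "Ryzen 7 (12C)": [(3, 3), (3, 3)],
    "Ryzen 9 (16C)": [(4, 4), (4, 4)],
}

def total_cores(dies):
    return sum(a + b for a, b in dies)

def active_dies(dies):
    return sum(1 for a, b in dies if a + b > 0)

def memory_channels(dies):
    # Assumption from the post: one DDR4 channel per working die.
    return active_dies(dies)

def max_l3_mb(dies, l3_per_die_mb):
    return active_dies(dies) * l3_per_die_mb

for name, dies in SKUS.items():
    print(f"{name}: {total_cores(dies)} cores, "
          f"{memory_channels(dies)}Ch DDR4, "
          f"L3 max {max_l3_mb(dies, 16)}MB or {max_l3_mb(dies, 32)}MB")
```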

Now as dnavas stated, the Navi die would have an additional DDR4 channel and that may or may not be enabled for R3G (should be for an iGPU). R5G would use full 8C CPU die (a low clocking one) and full 20CU Navi die. Here's the illustration:

That Navi die may not have any additional PCIe links and it may or may not have any other additional IO. I don't really know how much space a Navi GPU with 20CUs and only one DDR4 memory controller would take, but the smaller the better would apply here. I'm not making it any smaller in my illustrations because it's already hard enough to add all that text.

There would be the following configurations:

Ryzen 3 G (6C + 15CU): (3 + 3) + (15 CU), (1-)2Ch DDR4
Ryzen 5 G (8C + 20CU): (4 + 4) + (20 CU), 2Ch DDR4

None of this really conflicts with the AdoredTV leak other than no chiplets+IO die for desktop. My previous attempt would have been better for lower end SKUs but this is more balanced and when you pay more, you get more. If this is actually what AMD has chosen then it would be interesting to see how it would perform.

Here's a wafer map for 95mm² dies (they could be even smaller, especially with half the L3):

I might add cost per die calculation here later but this is obviously cheaper than a 120mm² die.
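As a rough stand-in for that calculation, the sketch below uses the common dies-per-wafer approximation (gross area divided by die area, minus an edge-loss term). It assumes a 300mm wafer, ignores scribe lines, edge exclusion and defects, and reports only a relative cost since no wafer price is given here. Treat it as an estimate under those assumptions, not AMD's numbers.

```python
import math

WAFER_DIAMETER_MM = 300.0  # assumption: standard 300mm wafer

def dies_per_wafer(die_area_mm2, wafer_diameter_mm=WAFER_DIAMETER_MM):
    """Approximate gross dies per wafer (no defects, no scribe lines)."""
    radius = wafer_diameter_mm / 2
    gross = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(gross - edge_loss)

def relative_cost_per_die(die_area_mm2, reference_area_mm2):
    """Cost per die relative to a reference die, for the same wafer price."""
    return dies_per_wafer(reference_area_mm2) / dies_per_wafer(die_area_mm2)

for area in (95, 120):
    print(f"{area}mm²: about {dies_per_wafer(area)} candidate dies per wafer")
print(f"95mm² vs 120mm² cost ratio: {relative_cost_per_die(95, 120):.2f}")
```

The gap widens once defects are counted, because a smaller die is also less likely to contain one.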

Now that I studied this design some more, it doesn't look all that bad. It's not perfect but it might be a good compromise for yields, binning and latencies if AMD decided to ditch the IO die for desktop. Sure it has some NUMA properties that some of you will not like but it's still something between a full monolithic die and an IO die design. I could accept this if the performance is there.

It's a lot of work to make these kinds of posts so I might take a little pause from here for a while. I still have some Rome-related speculation that I should draw illustrations for, though. You could add some speculation on what IO each SKU should contain and how small you think AMD could make that Navi die. How would you feel if the L3 size were cut in half for the desktop models?

You could also speculate on how different SKUs could have different amounts of cache. And the overall question is: how would you feel if this was the actual design for Ryzen 3k? It should at least have good yields since the monolithic die is basically cut in half and the other half could be replaced with a Navi die. This design is actually more reminiscent of the current Naples or TR design, only with much less IO. Hopefully the latencies would be a bit lower, though.

Addition: I almost forgot but here's a clean AM4 package for your own speculations:

darkswordsman17 · Dec 13, 2018

maddie said:
But if adoredTV's source is reliable when he says he only sees one 14nm die at GF then that's out the window.

Not necessarily. I assume that Rome has started production with Ryzen 3xxx to follow by several months. GloFo might simply not have started fabbing the IO die for Ryzen yet. If the source is on the production area, they might not know what's coming.

What if AMD were to use the same I/O die on everything, and then disable what they don't need? That'd eat up more wafers, and it'd provide a glut of them so that they could bin them (not sure if that would have much benefit) or simply overproduce for their most profitable market, so that they know they won't be limited in the number of chips for that market; they can always use the extras elsewhere. There might even be some markets where they would utilize the full I/O with a lower number of CPU modules.

With regards to the consoles, assuming they use say a GPU module and a CPU module, and then GDDR6. Would the GPU manage the memory, would that be in the I/O? Would the CPU and GPU connect directly? Or would it need PCIe lanes? Would it need I/O die for other stuff? Or would they have a separate custom I/O die (with GDDR6 memory controllers, and whatever I/O a console would have)?

moinmoin · Dec 13, 2018

Zapetu said:
That's a very bold assumption for a 72mm² chiplet to have 8 cores, 32MB of L3 cache, 1xDDR4 PHY, 12 PCIe, 2xUSB, etc

Just a small note: AM4 is 24 PCIe lanes as I/O, period. Everything else, be it USB, SATA, NVMe etc., takes from the 24 PCIe lanes through a "chipset" (more like a breakout box). There is no AM4 with 32 PCIe lanes.

And for clarification, I personally think there is no way Ryzen 3xxx is an MCM but doesn't have an IOC. The chiplets containing sufficient IO like Zeppelin does is a theoretical possibility, but that makes them chips, not chiplets, which wouldn't match the Zen 2 approach. And I expect AMD to try again to get away with as few masks and as few different dies as possible. Which made AdoredTV's previous suggestion that the consumer IOC could use the same mask as the Rome IOC but split in 4 more charming than the latest contradictions.

I think I'll stop speculating for the holidays and wait for CES.

dnavas · Dec 13, 2018

moinmoin said:
Just a small note: AM4 is 24 PCIe lanes as I/O, period. Everything else, be it USB, SATA, NVMe etc., takes from the 24 PCIe lanes through a "chipset" (more like a breakout box). There is no AM4 with 32 PCIe lanes.

Correct, though the Ryzen chip has 32 lanes (cref: TR and Epyc), only 24 of them (for whatever reason) are available via AM4. However, in contemplating distribution of 12 pcie lanes, it made me really want to stick a plx in there somewhere, and that doesn't square with the low-cost nature of R3. It was easier to imagine a die with 16 and just watch 8 of the lanes wasted on the two die solution, given that's the situation we have today, but it's just as possible to save on area and go 12 each I suppose.

Of course, as you say, it's completely possible that this is all nonsense, and there's an I/O die that no one knows about, because it isn't in production yet, or isn't being produced at Glofo, or whatever. We should know in four weeks. :shrug:

beginner99 · Dec 14, 2018

moinmoin said:
Just a small note: AM4 is 24 PCIe lanes as I/O, period. Everything else, be it USB, SATA, NVMe etc., takes from the 24 PCIe lanes through a "chipset" (more like a breakout box). There is no AM4 with 32 PCIe lanes.

Which would be more than enough if Ryzen3 has pcie4 as well.

moinmoin said:
The chiplets containing sufficient IO like Zeppelin does is a theoretical possibility, but that makes them chips not chiplets which wouldn't match the Zen 2 approach.

Fully agree. The AMD event started with Zen 2 and the chiplet design. It was clear to me that all Zen 2 = chiplets. Only later was a product (Rome) based on Zen 2 introduced. It was clear these were 2 different presentations which are not directly linked. Which means Zen 2 chiplets will be used in products other than Rome.

I agree with the poster that GF simply hasn't started producing the Ryzen 3 IO die yet. It will probably be 4 times smaller so 4 times faster to produce, and in reality, with yields, even more than that. So they will most likely invest about half of the time on the Ryzen IO die compared to the Rome IO die.
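The "even more than that with yields" point can be illustrated with a simple Poisson yield model, Y = exp(-area × defect density). This is a toy sketch: the 400mm² die area and the 0.001 defects/mm² density are placeholder values chosen only to show the trend, not figures for the Rome IO die or for GloFo's process, and wafer edge effects are ignored.

```python
import math

def poisson_yield(area_mm2, defects_per_mm2):
    """Fraction of defect-free dies under a simple Poisson defect model."""
    return math.exp(-area_mm2 * defects_per_mm2)

def good_dies_ratio(full_area_mm2, defects_per_mm2, split=4):
    """Good small dies per wafer relative to good full-size dies.

    The small die has 1/split of the area, so `split` times as many
    candidates fit per wafer and each one yields better.
    """
    small_area = full_area_mm2 / split
    return (split * poisson_yield(small_area, defects_per_mm2)
            / poisson_yield(full_area_mm2, defects_per_mm2))

# Hypothetical numbers for illustration only.
print(f"{good_dies_ratio(400.0, 0.001):.2f}x good dies per wafer")
print(f"{good_dies_ratio(400.0, 0.0):.2f}x with a defect-free process")
```

With any non-zero defect density the ratio comes out above 4, which is the effect described above.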

Shivansps · Dec 14, 2018

Zapetu said:
Ok, here's an illustration of the speculative 1xDDR4 and half the IO per die design (based on what moinmoin and dnavas suggested):

As you can see there would likely be the following core configurations:

Ryzen 3 (6C): (3 + 3) + (0 + 0), 1Ch DDR4

Ryzen 5 (8C): (2 + 2) + (2 + 2), 2Ch DDR4

Ryzen 7 (12C): (3 + 3) + (3 + 3), 2Ch DDR4

Ryzen 9 (16C): (4 + 4) + (4 + 4), 2Ch DDR4

Ryzen 3 would be a budget model with just half the IO and 1 memory channel enabled. The other die would be a dummy die. Ryzen 5-9 would have both memory channels enabled but only Ryzen 9 would have full CCXs. There's bound to be some kind of latency penalty accessing the other die's memory channel, and there is no IO die with centralized (directory-based) cache coherency protocols and likely none of that server "magic". There's always one local memory controller for each working die, and this ccNUMA design could be optimized with that locality in mind. It should still work well enough in UMA mode too. Obviously PCIe4 is backwards compatible with PCIe3 and would work in existing motherboards. The 16C model could require a new MB with beefier VRM, as there have been rumors about that.

You could always make the dies smaller by removing half of the L3 cache and that would make it the same as current 16MB for 8C. It's hard to predict how much cache R5, R7 and R9 models would have but with 32MB/die that's obviously 64MB max and with 16MB/die 32MB max. R3 would only be able to have either 32MB or 16MB of cache.

Now as dnavas stated, the Navi die would have an additional DDR4 channel and that may or may not be enabled for R3G (should be for an iGPU). R5G would use full 8C CPU die (a low clocking one) and full 20CU Navi die. Here's the illustration:

That Navi die may not have any additional PCIe links and it may or may not have any other additional IO. I don't really know how much space that Navi GPU with 20CUs and only one DDR4 memory controller would take, but the smaller the better would apply here. I'm not making it any smaller in my illustrations because it's already hard enough to add all that text.

There would be the following configurations:

Ryzen 3 G (6C + 15CU): (3 + 3) + (15 CU), (1-)2Ch DDR4

Ryzen 5 G (8C + 20CU): (4 + 4) + (20 CU), 2Ch DDR4

None of this really conflicts with the AdoredTV leak other than no chiplets+IO die for desktop. My previous attempt would have been better for lower end SKUs but this is more balanced and when you pay more, you get more. If this is actually what AMD has chosen then it would be interesting to see how it would perform.

Here's a wafer map for 95mm² dies (they could be even smaller especially with half the L3):

I might add cost per die calculation here later but this is obviously cheaper than a 120mm² die.
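For a rough feel of the difference, here is a sketch using a common first-order dies-per-wafer approximation (wafer area over die area, minus an edge-loss term). It is not the calculation behind the wafer map; the 300mm wafer is an assumption, and scribe lines, edge exclusion and defects are all ignored. The function name `dies_per_wafer` is hypothetical.

```python
import math

def dies_per_wafer(die_area_mm2, wafer_diameter_mm=300.0):
    """First-order estimate: gross dies by area minus dies lost along the wafer edge."""
    radius = wafer_diameter_mm / 2
    gross = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(gross - edge_loss)

# Die sizes taken from the post: the 95mm² design vs. a 120mm² die.
for area in (95, 120):
    print(f"{area} mm²: roughly {dies_per_wafer(area)} candidate dies per 300 mm wafer")
```

The estimate counts candidate dies only. A cost-per-die figure would also need a wafer price and a defect density, neither of which is given in the thread.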

Now that I studied this design some more, it doesn't look all that bad. It's not perfect but it might be a good compromise for yields, binning and latencies if AMD decided to ditch the IO die for desktop. Sure it has some NUMA properties that some of you will not like but it's still something between a full monolithic die and an IO die design. I could accept this if the performance is there.

It's a lot of work to make these kinds of posts so I might take a little pause from here for a while. I still have some Rome related speculation that I should draw illustrations for, though. You could add some speculation on what IO each SKU should contain and how small you think AMD could make that Navi die. How would you feel if the L3 size were cut in half for the desktop models?

You could also speculate on how different SKUs could have different amounts of cache. And the overall question is: how would you feel if this was the actual design for Ryzen 3k? It should at least have good yields since the monolithic die is basically cut in half and the other half could be replaced with a Navi die. This design actually reminds me more of the current Naples or TR design, only with much less IO. Hopefully the latencies would be a bit lower, though.

Addition: I almost forgot but here's a clean AM4 package for your own speculations:

But the CPU chiplets don't need general I/O. I'm sure the CPU chiplets only need the IF link, PCI-E x16 and the 1xDDR4; worst-case scenario, the rest of the I/O is provided by the chipset on AM4 and notebooks. But this may cause compatibility problems on some motherboards with CPU-only chiplets.

The GPU chiplet (to me it is the full Navi 12 that will be used in dGPUs) I'm 100% sure will have a USB controller, since it is a requirement to provide USB-C video out. It may end up having SATA as well for a few reasons if they designed it like this; it's not like it is a hard thing to add.

Now, and remember this, I'm always talking about using the same Navi 12 used in dGPUs. If AMD designed a "special" Navi chiplet that could only be used on AM4 and notebooks, we are once again talking about making something that can only be used there. It's the same reason why they would also need a special Navi for consoles.

Yotsugi · Dec 14, 2018

TSMC N7 maybe uses Co for contacts (and something else) so yey for high clocks.

Insert_Nickname · Dec 14, 2018

moinmoin said:
Just a small note: AM4 is 24 PCIe lanes as I/O, period. Everything else, be it USB, SATA, NVMe etc., takes from the 24 PCIe lanes through a "chipset" (more like a breakout box). There is no AM4 with 32 PCIe lanes.

dnavas said:
Correct, though the Ryzen chip has 32 lanes (cref: TR and Epyc), only 24 of them (for whatever reason) are available via AM4.

Yes, and no. The die has 32 lanes. 24 are immediately available. 16 for GPU, 4 for NVMe and 4 for the southbridge. The "missing" lanes are configured for the SoC's 4 USB3 and 4 SATA ports. Somewhat like Intel's Flex IO.
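The lane accounting in this reply comes down to simple arithmetic, sketched below. The numbers are the ones quoted in the post; the variable names are made up for illustration, and the split of the remaining lanes between USB3 and SATA is the poster's reading, not something confirmed here.

```python
TOTAL_LANES = 32  # lanes on the die, per the post

# Lanes exposed as PCIe on AM4, per the post.
AM4_PCIE = {"GPU slot": 16, "NVMe": 4, "southbridge link": 4}

exposed = sum(AM4_PCIE.values())
# Lanes the post says are configured for the SoC's USB3 and SATA ports.
repurposed = TOTAL_LANES - exposed

print(f"exposed as PCIe: {exposed}, repurposed for USB3/SATA: {repurposed}")
```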

Shivansps · Dec 14, 2018

The Navis may have a PCI host as well, as this supports the Radeon Pro with M.2 low-latency slots.

Zapetu · Dec 15, 2018

moinmoin said:
If there's indeed no IOC in Ryzen I'd expect the chiplet to contain enough IO that can be repurposed for AM4 platform interfacing necessity.

Zapetu said:
Would a small 72mm² chiplet be able to contain 8 cores, 2xDDR4 memory controllers, 24-32xPCIe4 and all other IO? Seems like the chiplet is too small to have all that. Let's hope that we get clearer info a little bit later.

At this point I was asking if you really thought that the current Rome chiplet (72mm²) has enough room for all the IO required for a full SoC comparable to current Ryzen models.

moinmoin said:
I'm going with 16c for Ryzen 3xxx so two chiplets which have 1x DDR4 IMC and 12x PCIe4 lanes each, plus the interface for direct communication. Sure a close call.

Then you clarified that you meant that each Rome chiplet might have enough room for half the IO.

Zapetu said:
That's a very bold assumption for a 72mm² chiplet to have 8 cores, 32MB of L3 cache, 1xDDR4 PHY, 12 PCIe, 2xUSB, etc.

I still think that it looks like there is no room for a full or even a "half" SoC in the current Rome chiplet (72mm²). The current Zen 4C-CCX with 8MB L3 takes about 48mm² of die space on 14nm, and if we assume 2x scaling then two of those would take 48mm² on 7nm. That's only 16MB of L3 though, and another 16MB (32MB total) of L3 would add, according to my calculations, about 14mm² (for a total of 62mm²). That's a little less than the 18mm² you would get just by directly dividing 36mm² (2x8MB L3 on the Zeppelin die) by two.

I have linked to this before but it's a good illustration made by kokhua:

https://twitter.com/x/status/1062233354118955014

So in total 8C worth of CCXs would take about 62mm² and that's not even accounting for the beefier and wider cores of Zen 2. L3 cache should scale a little better than execution units, but still let's assume that all of that balances itself out and it still takes about 62mm² for the CCXs. There's only about 10mm² left, and even a single DDR4 PHY would take about 8mm² on 14nm and the same for 16xPCIe3 (8mm² on 14nm). And we can't assume 2x scaling here. There's still 4xUSB3.1Gen1, the security processor, and all the other uncore stuff and logic including crossbar switches and at least one IFOP 2.0 link to make a full SoC. That all adds up really quickly.
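The area budget above can be written out as a quick sketch. Every number is an estimate from this post, not a measurement; the 3mm² USB figure is the one given further down, and the uniform IO scaling factors tried at the end are purely illustrative assumptions.

```python
# All figures are the post's estimates in mm²; none are measured values.
CHIPLET_AREA = 72.0      # Rome chiplet size quoted in the thread
CCX_14NM = 48.0          # one 4C CCX with 8MB L3 on 14nm
ASSUMED_SCALING = 2.0    # assumed 14nm -> 7nm density gain for cores and cache
EXTRA_L3_7NM = 14.0      # estimate for another 16MB of L3 on 7nm

ccx_7nm = 2 * CCX_14NM / ASSUMED_SCALING + EXTRA_L3_7NM
remaining = CHIPLET_AREA - ccx_7nm

# IO block sizes on 14nm, as quoted in the thread.
IO_14NM = {"1x DDR4 PHY": 8.0, "16x PCIe3": 8.0, "4x USB3.1 Gen1": 3.0}

def io_area(scaling):
    """Total IO area if every block shrank by the given (hypothetical) factor."""
    return sum(IO_14NM.values()) / scaling

print(f"CCXs: {ccx_7nm} mm², left over: {remaining} mm²")
for s in (1.0, 1.5, 2.0):
    verdict = "fits" if io_area(s) <= remaining else "does not fit"
    print(f"IO at {s}x scaling: {io_area(s):.1f} mm² -> {verdict}")
```

Even in the most optimistic case tried here, the sum leaves out the security processor, crossbar switches and the IFOP link, which is the point of the argument.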

moinmoin said:
Just a small note: AM4 is 24 PCIe lanes as I/O, period. Everything else, be it USB, SATA, NVMe etc., takes from the 24 PCIe lanes through a "chipset" (more like a breakout box). There is no AM4 with 32 PCIe lanes.

Each Zeppelin has 4xUSB3.1Gen1 (page 1 of the following link) and those take about 3mm² on 14nm. SATA is muxed with PCIe and uses up to 8 PCIe links (PCIe link bifurcation on page 2):
https://fuse.wikichip.org/news/1064/isscc-2018-amds-zeppelin-multi-chip-routing-and-packaging/

On AM4 though, the configuration would be the following:

Insert_Nickname said:
Yes, and no. The die has 32 lanes. 24 are immediately available. 16 for GPU, 4 for NVMe and 4 for the southbridge. The "missing" lanes are configured for the SoC's 4 USB3 and 4 SATA ports. Somewhat like Intel's Flex IO.

From the die shots it looks like 4xUSB3.1Gen1 might have its own solder bumps. I'm not that sure though, but otherwise I agree with what Insert_Nickname wrote.

moinmoin said:
And for clarification, I personally think there is no way Ryzen 3xxx is an MCM but doesn't have an IOC.

Still, what dnavas suggested is possible, if there really were another, slightly larger "chiplet" ("half" SoC) just for AM4. It's good to have different suggestions, though, to speculate on this stuff from all angles.

But as you mentioned yourself, it would be nice if the current Rome chiplet could contain all that stuff to make a full SoC, but the IOC approach would make more sense all things considered. That's also still my general opinion. I'm hoping that AdoredTV digs up a little bit more information soon.

moinmoin said:
I think I'll stop speculating for the holidays and wait for CES.

That's likely a good decision. CES should reveal at least something important. Happy holidays!

Zapetu · Dec 15, 2018

Bondrewd said:
TSMC N7 maybe uses Co for contacts (and something else) so yey for high clocks.

Since raghu78 should not self-promote his Twitter account, I will do it for him

(even though I don't have any connections to him or to chiakokhua for that matter):

https://twitter.com/x/status/1073356067596251142

What will Ryzen 3000 for AM4 look like?

16-core monolithic CPU (four 4-core CCXs on SoC die)

12-core monolithic CPU (three 4-core CCXs on SoC die)

8-core monolithic CPU (two 4-core CCXs on SoC die)

8-core monolithic APU (two 4-core CCXs and an iGPU on SoC die)

4-core monolithic APU (one 4-core CCX and an iGPU on SoC die)

16-core modular CPU (two 8-core CPU chiplets + IO chiplet)

8-core modular CPU (one 8-core CPU chiplet + IO chiplet)

8-core modular APU (one 8-core CPU chiplet + IO chiplet + GPU chiplet)

8-core modular APU (one 8-core CPU chiplet + IO chiplet with iGPU)

Something else (specify below)
