Speculation: Ryzen 4000 series/Zen 3

Page 129

jamescox

Senior member
Nov 11, 2009
637
1,103
136
The whole N5P rumor was about AMD jumping in to fill N5 capacity at TSMC freed by Huawei/HiSilicon no longer being allowed to order more there. Nobody denied that AMD is working with 5nm. Much of the discussion was about the timing and how realistic it was for AMD to do what the rumor claims as early as this autumn, with some even suggesting Zen 3 would be duplicated on that node.

And unlike mere refreshes, work on new nodes is a huge financial undertaking, so it matters very much whether investors are being informed about it happening.

If the next generation is going to be at 5 nm, then they would almost certainly already be working on it, which could have led to the rumors. I think several past rumors have actually been confusion between different generations that were being worked on at the same time, at different stages of development.

It is unclear what the next generation will actually be called. I am expecting Zen 3 as Ryzen 4000 desktop parts to use roughly the same IO die as current Ryzen 3000 parts. For Ryzen 5000 (Zen 3+ or Zen 4?), I expect almost the same CPU chips, or possibly a die shrink, combined with completely new IO. Zen 3 is a new architecture; they may tweak it, but I wouldn’t expect many changes right away.

With the release of the Zen 2-based XT parts, I am wondering if the initial Zen 3 parts will actually be large-cache (>32 MB) HPC parts on 7 nm (server first). The desktop parts may come later, and I would expect them to still be on 7 nm as well.

AMD has been making the APU based on the previous generation; Ryzen 4000 APUs are Zen 2 based. I could see them wanting to change that, so perhaps they are working on getting a Zen 3 / RDNA2 APU out on 5 nm as early as possible. They would want both for better power efficiency.
 

moinmoin

Diamond Member
Jun 1, 2017
4,933
7,619
136
I don’t actually like calling AMD’s multi-chip designs “chiplets”. They are really just MCMs: multi-chip modules. MCMs are not new; I assume that there have been many types of MCMs before this. IBM Power 5 with, if I am remembering correctly, 4 CPU chips and 4 cache chips was around a long time ago. I kind of think that “chiplet” should specifically refer to dies made to be mounted on a silicon interposer, not just chips in an MCM.
I think you are mixing package technology and chip design approach. Having cache or memory as separate chips has been done for decades. HBM does that. With the 386, cache was still external before it was moved onto the chip with the 486. And so on.

AMD has so far only used MCM packaging for its chips, first only with Epyc/Threadripper for Zen/Zen+, now with both Epyc/Threadripper and desktop Ryzen for Zen 2. They talked about using other packaging technologies; an interposer was considered for Zen 2, but MCM was more flexible (see slide 8 of 27).

In AMD's case “chiplets” specifically refers to the Zen 2 chips where the dies are no longer self-sufficient (like Zeppelin was) but require a counterpart (CCD + IOD) to form a complete chip.

If the next generation is going to be at 5 nm, then they would almost certainly already be working on it, which could have led to the rumors.
Indeed, which is why part of the discussion here was about how much of a lead time would be necessary for the rumor to be true.
 

Shivansps

Diamond Member
Sep 11, 2013
3,835
1,514
136
Gigabyte B550 Aorus Master specs seem to reveal more info about Renoir.

1 x PCI Express x16 slot (PCIEX16), integrated in the CPU:
  1. 3rd Generation AMD Ryzen™ processors support PCIe 4.0 x16 mode
  2. New Generation AMD Ryzen™ with Radeon™ Graphics processors support PCIe 3.0 x16 mode
    * For optimum performance, if only one PCI Express graphics card is to be installed, be sure to install it in the PCIEX16 slot.
    * The PCIEX16 slot shares bandwidth with the M2B_CPU and M2C_CPU connectors. The PCIEX16 slot operates at up to x8 mode when a device is installed in the M2B_CPU or M2C_CPU connector.

Integrated in the New Generation AMD Ryzen™ with Radeon™ Graphics processors:
  1. 1 x HDMI port, supporting a maximum resolution of 4096x2160@60 Hz
    * Support for HDMI 2.1 version, HDCP 2.3, and HDR.
Maximum shared memory of 16 GB

Renoir's main slot wiring is x16 3.0; they doubled that from x8 on Picasso/Raven, but there is no PCI-E 4.0 support. Since it was intended for notebooks, this is expected, I believe. Not sure how they did this; I thought it was a pin limitation.
Renoir now supports up to 16GB shared memory, up from 2GB on Raven/Picasso.
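
For a sense of scale, the link widths in the spec excerpt above translate into rough bandwidth figures as follows. This is only a minimal sketch: it assumes the standard per-lane rates (8 GT/s for PCIe 3.0, 16 GT/s for PCIe 4.0) and 128b/130b line encoding, ignores packet and protocol overhead (so real throughput will be lower), and the function name is made up for illustration.

```python
# Approximate one-direction PCIe link bandwidth from lane count and generation.
# Only line encoding is accounted for; protocol overhead is ignored.

GT_PER_S = {"3.0": 8.0, "4.0": 16.0}  # raw transfer rate per lane, in GT/s
ENCODING = 128 / 130                  # 128b/130b encoding used by PCIe 3.0 and 4.0


def link_bandwidth_gb_s(gen: str, lanes: int) -> float:
    """Return the approximate one-direction bandwidth in GB/s (hypothetical helper)."""
    return GT_PER_S[gen] * ENCODING / 8 * lanes


if __name__ == "__main__":
    configs = [
        ("Picasso/Raven main slot", "3.0", 8),
        ("Renoir main slot", "3.0", 16),
        ("Renoir main slot with M2B/M2C populated", "3.0", 8),
        ("Ryzen 3000 main slot", "4.0", 16),
    ]
    for label, gen, lanes in configs:
        print(f"{label}: PCIe {gen} x{lanes} = {link_bandwidth_gb_s(gen, lanes):.2f} GB/s")
```

In other words, going from x8 to x16 on PCIe 3.0 doubles the slot bandwidth, but it still lands at about half of what a PCIe 4.0 x16 link offers.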
 

beginner99

Diamond Member
Jun 2, 2009
5,208
1,580
136
Renoir 4500U without any cooling.
The performance is remarkable having zero cooling, except ambient air.

Holy smokes. For those not wanting to click the link, no cooling means no heatsink at all. Just the heatspreader. And it runs just fine, albeit I can't judge how fast from that video.
Additionally they point a heat sensor at it and you can actually see from the heat how the load gets moved around between the cores. Pretty cool.
 

french toast

Senior member
Feb 22, 2017
988
825
136
What if AMD is going to take a two-step approach to APUs? Renoir is a solid, well-balanced APU that at ~150mm² is cost-effective on N7, so they refresh that, as speculated on here, on N7P in Q2 2021 with slightly higher clocks, perhaps LPDDR5, whilst simultaneously introducing Cezanne on N6/N5(?) in Q2 2021: a much higher-performing and more expensive chip with Zen 3, RDNA 1.5, etc., which is a direct competitor for Tiger Lake/Alder Lake in premium laptops?

Renoir refresh for budget... Cezanne for premium.
Renoir is too good all round to replace; doing so would limit Cezanne's possibilities imo.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
And it runs just fine albeit I can't judge how fast from that video.

The base performance is 50% faster in ST and about 3x the performance in MT for Cinebench. ST is comparable to a -Y series chip and MT about Pentium Silver N5000. I can not see exactly but I think the CPU is averaging at 1.3-1.4GHz.

GPU performance in Time Spy is about that of UHD 620 graphics (Gen 9, U-series) lol.
 

Hitman928

Diamond Member
Apr 15, 2012
5,177
7,629
136
Holy smokes. For those not wanting to click the link, no cooling means no heatsink at all. Just the heatspreader. And it runs just fine, albeit I can't judge how fast from that video.
Additionally they point a heat sensor at it and you can actually see from the heat how the load gets moved around between the cores. Pretty cool.

Mobile Renoir doesn't have a heatspreader AFAIK so it's just bare die. With that said, he is also pointing a desktop case fan right at it so there is significant air flow over the die. Still impressive but not exactly accurate to say no cooling is being applied.
 

Valantar

Golden Member
Aug 26, 2014
1,792
508
136
Renoir's main slot wiring is x16 3.0; they doubled that from x8 on Picasso/Raven, but there is no PCI-E 4.0 support. Since it was intended for notebooks, this is expected, I believe. Not sure how they did this; I thought it was a pin limitation.
Renoir now supports up to 16GB shared memory, up from 2GB on Raven/Picasso.
Not a pin limitation, a die PCIe controller limitation. Picasso/RR had x16 + x4 PCIe, of which the former was divided into x8 for the iGPU and x8 for PEG/GP PCIe. Desktop APUs don't populate the pins for the last 8 PEG lanes; desktop CPUs don't populate the iGPU graphics output pins. Socket pins have fixed functionality. Renoir simply adds more internal PCIe (it also adds a second m.2 slot), allowing it to fully utilize the PEG pins on the AM4 package.
Zen+ did not involve any physical design changes. To quote from the Ryzen 7 2700X review:

"Ultimately, the new processors are almost carbon copies of the old ones, both in terms of design and microarchitecture. AMD is calling the design of the cores as ‘Zen+’ to differentiate them to the previous generation ‘Zen’ design, and it mostly comes down to how the microarchitecture features are laid out on the silicon. When discussing with AMD, the best way to explain it is that some of the design of the key features has not moved – they just take up less area, leaving more dark silicon between other features. "
I don't know what you read from that quote, but they are very clearly saying that this is new silicon with obvious - if small - physical design changes. Yes, the layout is the same, and yes, the architecture is very similar with just minimal tweaks, but various parts of the die were shrunk down, with dark silicon left in the space freed up by this shrinkage. In the previous design, these parts were larger, and the silicon now left dark was thus occupied by the larger components. This obviously means that despite the small overall differences, this is indeed a description of a part with physical design changes. No, the changes were not on the level of architecture or layout, but those are bird's-eye-view features, and it is obviously possible to make physical changes to smaller scale features despite this - which is what they did.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Mobile Renoir doesn't have a heatspreader AFAIK so it's just bare die. With that said, he is also pointing a desktop case fan right at it so there is significant air flow over the die.

Are you talking about the black fan on the left of the video?

While it will accelerate airflow and it's an open case, it's still an impressive demonstration.

It must be running at something like 5W. The most impressive part is that the performance scales quite well. Certain processors drop in performance dramatically at such low TDP envelopes. The IR camera isn't showing it running very warm either.
 

Valantar

Golden Member
Aug 26, 2014
1,792
508
136
Are you talking about the black fan on the left of the video?

While it will accelerate airflow and it's an open case, it's still an impressive demonstration.

It must be running at something like 5W. The most impressive part is that the performance scales quite well. Certain processors drop in performance dramatically at such low TDP envelopes. The IR camera isn't showing it running very warm either.
Even with some added airflow, being able to dissipate 5W from the silicon die alone is downright stunning, even if it's bouncing off TjMax and throttling constantly. That's the kind of thermal envelope that makes phones overheat, after all, and they are quite a lot bigger than that die. Of course they also rely on skin temperatures that need to stay much lower than 100°C, but still... Wow.
 

Hitman928

Diamond Member
Apr 15, 2012
5,177
7,629
136
Are you talking about the black fan on the left of the video?

While it will accelerate airflow and it's an open case, it's still an impressive demonstration.

It must be running at something like 5W. The most impressive part is that the performance scales quite well. Certain processors drop in performance dramatically at such low TDP envelopes. The IR camera isn't showing it running very warm either.

Yes, the black fan is a case fan clearly positioned to provide air flow over the die. Like I said, it's definitely still impressive, just wanted to point out that there is some level of cooling being applied which is not what was said in the original post.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Of course they also rely on skin temperatures that need to stay much lower than 100°C, but still... Wow.

It probably helps a ton that it's open air with fresh forced air going over it, and having a die that's twice the size exposed.

I'd like to see exact power consumption, temperature, and frequency graphs. But it's still very impressive.
 

HurleyBird

Platinum Member
Apr 22, 2003
2,670
1,250
136
The base performance is 50% faster in ST and about 3x the performance in MT for Cinebench. ST is comparable to a -Y series chip and MT about Pentium Silver N5000. I can not see exactly but I think the CPU is averaging at 1.3-1.4GHz.

GPU performance in Time Spy is about that of UHD 620 graphics (Gen 9, U-series) lol.

What's interesting is that you can see which cores are being loaded by watching the IR camera and observing the hot areas, and this shows some possible inefficiencies.

Specifically, only one CCX is being used in the Cinebench ST test, so the load is bouncing between those four cores. But in such a thermally limited scenario it might be better to split the load between both CCXes and take the hit to data locality. Zen 3 APUs, with their 8-core CCX, will naturally have twice the cores to share the thermal load.

In Crysis the CCX nearest the GPU appears to be most active. I would think this should be reversed to split the thermal load more evenly across the die. It's especially important without a heatsink, but I imagine this should be the case generally as well. Of course, this again isn't an issue come Zen 3.

Lastly... I wonder if there's some floor limit to how often a task moves to a different CPU within the same CCX that is not ideal in this scenario? Eg. switching has some performance cost, so in normal usage you wouldn't want to switch too excessively. But in this extremely thermally limited context, extremely rapid switching to spread the thermal load sounds like a good idea. This is just speculation though, and it's possible no such floor limit exists.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Switching is probably done by the OS anyways.

It's a very niche scenario that I suspect neither AMD nor Microsoft will seriously work on.
 

Veradun

Senior member
Jul 29, 2016
564
780
136
That is what I meant. An APU (no separate chipset) plus optional gpu chip and optional HBM chip.

I don’t actually like calling AMD’s multi-chip designs “chiplets”. They are really just MCMs: multi-chip modules. MCMs are not new; I assume that there have been many types of MCMs before this. IBM Power 5 with, if I am remembering correctly, 4 CPU chips and 4 cache chips was around a long time ago. I kind of think that “chiplet” should specifically refer to dies made to be mounted on a silicon interposer, not just chips in an MCM.
Starchips and Motherchip

or

Stones and Gauntlet (**points to infinity fabric**)

:>
 

Valantar

Golden Member
Aug 26, 2014
1,792
508
136
What's interesting is that you can see which cores are being loaded by watching the IR camera and observing the hot areas, and this shows some possible inefficiencies.

Specifically, only one CCX is being used in the Cinebench ST test, so the load is bouncing between those four cores. But in such a thermally limited scenario it might be better to split the load between both CCXes and take the hit to data locality. Zen 3 APUs, with their 8-core CCX, will naturally have twice the cores to share the thermal load.

In Crysis the CCX nearest the GPU appears to be most active. I would think this should be reversed to split the thermal load more evenly across the die. It's especially important without a heatsink, but I imagine this should be the case generally as well. Of course, this again isn't an issue come Zen 3.

Lastly... I wonder if there's some floor limit to how often a task moves to a different CPU within the same CCX that is not ideal in this scenario? Eg. switching has some performance cost, so in normal usage you wouldn't want to switch too excessively. But in this extremely thermally limited context, extremely rapid switching to spread the thermal load sounds like a good idea. This is just speculation though, and it's possible no such floor limit exists.
Doesn't the 4300U have cores enabled on just the one CCX anyhow? This would likely look very different if it was a 4700U or 4800U. Or even a 4500U or 4600U.
 

Valantar

Golden Member
Aug 26, 2014
1,792
508
136
It probably helps a ton that it's open air with fresh forced air going over it, and having a die that's twice the size exposed.

I'd like to see exact power consumption, temperature, and frequency graphs. But it's still very impressive.
Yeah, that would be very interesting to see. Though remember that AT does their phone power testing with a fan blowing onto the back of the phone, and while thermal transfer between a phone SoC and its back is anything but perfect, there is nonetheless a dramatic increase in surface area there.
 

uzzi38

Platinum Member
Oct 16, 2019
2,565
5,574
146
Workloads jumping between cores in a CCX has been a thing since the Zen 2 launch. It's done to take maximum advantage of the boost algorithm Ryzen CPUs have, whereby jumping between cores lets you work on one that's cooler, which the chip then boosts to a higher degree than the first core.

Usually workloads cycle between the two fastest cores in a single CCX to minimise latency and maximise 1T output. Sort of gaming the boost algorithm by always trying to operate on a cooler and so - even if for a microburst - higher clocking core.

EDIT: I'd imagine this same behaviour will be noted on other CPUs and most notably chips that boost similarly to Ryzen or use TVB from Intel.
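
To make the core-hopping idea above concrete, here is a toy model of it. It is only a sketch under loud assumptions: this is not AMD's boost algorithm or any OS scheduler, and the function name, heating/cooling steps and ambient temperature are arbitrary numbers chosen for illustration. Each step, the single-threaded load is placed on whichever candidate core is currently coolest.

```python
# Toy model: a single-threaded load hops to the coolest candidate core each step.
# All numbers are arbitrary; this is NOT AMD's actual boost or scheduling logic.

def run_hopping(cores, steps, heat=3.0, cool=1.0, ambient=40.0):
    """Return (history of chosen cores, final temperature per core)."""
    temps = {core: ambient for core in cores}
    history = []
    for _ in range(steps):
        # Pick the coolest core; on a tie, the first one in the list wins.
        target = min(cores, key=lambda core: temps[core])
        history.append(target)
        for core in cores:
            if core == target:
                temps[core] += heat  # the loaded core warms up
            else:
                temps[core] = max(ambient, temps[core] - cool)  # idle cores cool down
    return history, temps


if __name__ == "__main__":
    hop_history, hop_temps = run_hopping(cores=[0, 1], steps=4)
    _, pinned_temps = run_hopping(cores=[0], steps=4)
    print("hopping order:", hop_history)
    print("hottest core when hopping:", max(hop_temps.values()))
    print("hottest core when pinned: ", max(pinned_temps.values()))
```

In this toy setup the load alternates between the two cores and the hottest core ends up cooler than when the load stays pinned to one core, which is the effect described above; how much that buys in real boost clocks is not something this sketch can say.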
 

eek2121

Platinum Member
Aug 2, 2005
2,904
3,906
136
Not a pin limitation, a die PCIe controller limitation. Picasso/RR had x16 + x4 PCIe, of which the former was divided into x8 for the iGPU and x8 for PEG/GP PCIe. Desktop APUs don't populate the pins for the last 8 PEG lanes; desktop CPUs don't populate the iGPU graphics output pins. Socket pins have fixed functionality. Renoir simply adds more internal PCIe (it also adds a second m.2 slot), allowing it to fully utilize the PEG pins on the AM4 package.

I don't know what you read from that quote, but they are very clearly saying that this is new silicon with obvious - if small - physical design changes. Yes, the layout is the same, and yes, the architecture is very similar with just minimal tweaks, but various parts of the die were shrunk down, with dark silicon left in the space freed up by this shrinkage. In the previous design, these parts were larger, and the silicon now left dark was thus occupied by the larger components. This obviously means that despite the small overall differences, this is indeed a description of a part with physical design changes. No, the changes were not on the level of architecture or layout, but those are bird's-eye-view features, and it is obviously possible to make physical changes to smaller scale features despite this - which is what they did.

Zen+ dies have more in common with Zen 1 Threadripper dies than Zen+. The latency for both chips is nearly identical.
 

HurleyBird

Platinum Member
Apr 22, 2003
2,670
1,250
136
Doesn't the 4300U have cores enabled on just the one CCX anyhow? This would likely look very different if it was a 4700U or 4800U. Or even a 4500U or 4600U.

I got confused since it was stated earlier as a 4500U. Although even that wouldn't make sense, since that chip has 3 cores enabled per CCX and you can clearly see all the cores in the CCX firing.
 

LightningZ71

Golden Member
Mar 10, 2017
1,627
1,898
136
@Valantar on your earlier post about Renoir PCIe resources...

Has the actual package pin-out for Mobile Renoir changed? I don't recall reading that it has new pins, and seeing that the 4000 series APUs are destined for desktop work shortly on AM4, it wouldn't seem like there has been any reconfiguration of the PCIe output (active pins) of the processor itself. At best, it looks, at least to me, like AMD has made some UEFI and controller changes to the platform to allow the package pins that would normally go to the PCH/South-Bridge to instead drive a second m.2 slot on mobile configurations, leaving the rest of the PCIe output broadly the same, save for enabling unused lanes from Raven Ridge. Since the desktop AM4 socket isn't changing, it would seem that, at the most, they have made the needed PCIe controller changes internally to allow the output of a full x16 on the channel that drives the first (and/or second) full length PCIe slot on most motherboards. Of course, that does leave the question: does it support any native bifurcation on the socket driven x16?
 

Shivansps

Diamond Member
Sep 11, 2013
3,835
1,514
136
@Valantar on your earlier post about Renoir PCIe resources...

Has the actual package pin-out for Mobile Renoir changed? I don't recall reading that it has new pins, and seeing that the 4000 series APUs are destined for desktop work shortly on AM4, it wouldn't seem like there has been any reconfiguration of the PCIe output (active pins) of the processor itself. At best, it looks, at least to me, like AMD has made some UEFI and controller changes to the platform to allow the package pins that would normally go to the PCH/South-Bridge to instead drive a second m.2 slot on mobile configurations, leaving the rest of the PCIe output broadly the same, save for enabling unused lanes from Raven Ridge. Since the desktop AM4 socket isn't changing, it would seem that, at the most, they have made the needed PCIe controller changes internally to allow the output of a full x16 on the channel that drives the first (and/or second) full length PCIe slot on most motherboards. Of course, that does leave the question: does it support any native bifurcation on the socket driven x16?


It does; the B550 Aorus Master confirms that.

1 x M.2 connector (M2A_CPU), integrated in the CPU, supporting Socket 3, M key, type 2242/2280/22110 SSDs:
  1. 3rd Generation AMD Ryzen™ processors support SATA and PCIe 4.0 x4/x2 SSDs
  2. New Generation AMD Ryzen™ with Radeon™ Graphics processors support SATA and PCIe 3.0 x4/x2 SSDs
2 x M.2 connectors (M2B_CPU/M2C_CPU), integrated in the CPU, supporting Socket 3, M key, type 2242/2280/22110 SSDs:
  1. 3rd Generation AMD Ryzen™ processors support PCIe 4.0 x4/x2 SSDs
  2. New Generation AMD Ryzen™ with Radeon™ Graphics processors support PCIe 3.0 x4/x2 SSDs
 