Discussion AMD Cezanne/Zen 3 APU Speculation and Discussion


dr1337

Senior member
May 25, 2020
329
547
106
Fresh leak out today. Not much is known, but at least 8 CUs are confirmed. It's probably an engineering sample, so the core count is unknown and clocks may not be final.

This is very interesting to me because Cezanne is seemingly 8 CUs only, and it seems unlikely to me that AMD could squeeze any more performance out of Vega. A CPU-only upgrade of Renoir may be lackluster compared to Tiger Lake's quite large GPU.

What do you guys think? Will Zen 3 be a large enough improvement in APU form? Will it have the full cache? Are there more than 8 CUs? Has AMD truly evolved Vega yet again, or is it more like RDNA?
 

LightningZ71

Golden Member
Mar 10, 2017
1,627
1,898
136
The Vega iGPU has been showing all the signs of bandwidth starvation in the higher SKUs since it was introduced in Raven Ridge. Performance scaling with extra clock speed has been poor, especially in the higher-CU SKUs, as they can't stay fed. It's one of the main reasons that performance didn't take a hit when they reduced the CU count in Renoir. The extra clock speed does help in the few spots where the work isn't RAM-bandwidth limited.
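A rough napkin check of the starvation argument (my numbers, assuming standard retail clocks and dual-channel memory, not taken from the post): cutting CUs in Renoir while raising the memory speed actually left each FLOP with more bandwidth, which fits the claim that the wider configs couldn't stay fed.

```python
# Napkin math (assumed retail specs): peak FP32 compute vs. peak DRAM
# bandwidth for two Vega iGPU configurations.

def igpu_tflops(cus: int, mhz: int) -> float:
    """Peak FP32 TFLOPS: 64 shaders per CU * 2 ops per FMA * clock."""
    return cus * 64 * 2 * mhz * 1e6 / 1e12

def ddr4_gbs(mts: int, channels: int = 2, bits: int = 64) -> float:
    """Peak bandwidth in GB/s for a dual-channel DDR4 setup."""
    return mts * channels * bits / 8 / 1e3

for name, cus, mhz, ram in [("Raven Ridge 2700U", 10, 1300, 2400),
                            ("Renoir 4800U", 8, 1750, 3200)]:
    tf, bw = igpu_tflops(cus, mhz), ddr4_gbs(ram)
    print(f"{name}: {tf:.2f} TFLOPS, {bw:.1f} GB/s, {bw / tf:.1f} GB/s per TFLOP")
# Raven Ridge: ~23 GB/s per TFLOP; Renoir: ~29 GB/s per TFLOP.
```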

I suspect that what little improvement we are seeing is more related to the larger L3 cache reducing RAM bandwidth contention somewhat, and possibly a bit of tweaking in the DRAM controller to reduce latency, though that is purely speculation on my part.

Also, add me to the list of people who think there's some sort of gross oversimplification or factual error in the claim that the iGPU was on the same voltage plane as the CPU in Renoir. I think they may be referring to a power management strategy that coupled the total power usage of the CPU cores and the iGPU under one shared limit. Now the two have isolated limits that together total more than the previous shared limit and are governed more by system thermal and power delivery capacity, allowing better-designed systems to excel. With Zen 3 getting more from each clock, the CPU can sustain the needed performance at even lower frequencies, giving the iGPU more room to run fast and hot.
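To make that speculation concrete, here is a toy sketch (my reading of it, with invented wattages) of the difference between one shared package limit and isolated per-domain limits:

```python
# Toy illustration of the speculated change; all numbers are made up.

def shared_budget(cpu_demand, gpu_demand, total=15.0):
    """Old scheme (as speculated): CPU and iGPU fight over one package limit."""
    cpu = min(cpu_demand, total)
    gpu = min(gpu_demand, total - cpu)
    return cpu, gpu

def isolated_budgets(cpu_demand, gpu_demand, cpu_cap=12.0, gpu_cap=10.0):
    """New scheme (as speculated): separate caps totalling more than the old
    shared limit, bounded in practice by platform cooling and power delivery."""
    return min(cpu_demand, cpu_cap), min(gpu_demand, gpu_cap)

print(shared_budget(10, 10))     # (10, 5)  -> the iGPU gets starved
print(isolated_budgets(10, 10))  # (10, 10) -> both fed, if the chassis allows
```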

I guess the big question is still: is it enough to catch up to the Intel Xe iGPU in Tiger Lake? The 80 EU G7 implementation in the i5 SKUs is able to hang with the 4700U in most situations, and the higher-clocked 96 EU G7 in the i7 SKUs can often beat the 4800U by noticeable amounts. A 10% improvement across the board from better cores and a larger L3 should make things much more even on average.
 

misuspita

Senior member
Jul 15, 2006
398
432
136
I don't think GPU performance can get much better than it is right now until more bandwidth is available, for Intel and AMD alike. So the best either can do is trade punches in roughly the same performance plane. When DDR5 comes, and with the aid of some kind of inexpensive integrated memory, the best SKUs may reach the mid-range discrete GPU performance of the day.
 

gdansk

Platinum Member
Feb 8, 2011
2,078
2,559
136
Unless they wanted to use a new architecture famous for using a large cache to increase effective bandwidth. I think it was called RDNA2...

Obviously there are timetables, die size and all that, but I would hope it performs much better than Vega at the same power and bandwidth.
 

LightningZ71

Golden Member
Mar 10, 2017
1,627
1,898
136
From the roadmaps, it looks like we won't get RDNA2 APUs in laptops until either later this year or early next year. I also wouldn't expect them to take the die area hit for any sort of Infinity Cache for at least another node.
 

moinmoin

Diamond Member
Jun 1, 2017
4,944
7,656
136
I don't think GPU performance can get much better than it is right now until more bandwidth is available.
While it's true that more bandwidth would very likely lead to a significant performance jump, it should be clear by now that AMD designed the iGPU to stay competitive with Intel's offerings (= good enough). RDNA2 would already have offered a bigger jump at the same bandwidth, but AMD kept that for later to offer a steady cadence of nice performance improvements with its APUs.

The only problem now is that doing so allowed Apple to jump ahead in iGPU performance with the M1. It will be interesting to see how AMD adapts to that, in addition to having converted OEMs/ODMs earlier than it apparently planned for.

From the roadmaps, it looks like we won't get RDNA2 APUs in laptops until either later this year or early next year. I also wouldn't expect them to take the die area hit for any sort of Infinity Cache for at least another node.
I expect early next year for Rembrandt. I also expect the Infinity Cache for the iGPU to be shared with the CPU's L3$, with the split sized dynamically depending on the RAM allocated to the iGPU. That shared last-level cache should be at least 32MB then.
 

Mopetar

Diamond Member
Jan 31, 2011
7,826
5,969
136
Remember that it is not only about performance, but availability as well.

Or just having signed a contract with Intel for some kind of exclusivity. Apple stayed on Intel across their product line even when it was clear that an AMD CPU would have been significantly better, particularly in their computers targeting professionals who would need ECC, and therefore a Xeon. Dell may have had a similar deal with Intel, which may have looked like a good one when they signed it, but it could be holding them back now that they won't have top-of-the-line products to compete with other manufacturers for a while.

The Vega iGPU has been showing all the signs of bandwidth starvation in the higher SKUs since it was introduced in Raven Ridge. Performance scaling with extra clock speed has been poor, especially in the higher-CU SKUs, as they can't stay fed. It's one of the main reasons that performance didn't take a hit when they reduced the CU count in Renoir. The extra clock speed does help in the few spots where the work isn't RAM-bandwidth limited.

I suspect that what little improvement we are seeing is more related to the larger L3 cache reducing RAM bandwidth contention somewhat, and possibly a bit of tweaking in the DRAM controller to reduce latency, though that is purely speculation on my part.

It's a little unfortunate that AMD hasn't been as aggressive about including its newest GPU technology in its APUs, but they seem to have made some pretty significant overhauls to Vega and gotten a lot more out of that architecture than I had thought possible. The problems that Infinity Cache was designed to solve seem like they'd apply to the APUs as well, and if your assessment that the additional L3 cache is the biggest driver of performance is correct, then an APU with an even larger amount of Infinity Cache should really help alleviate that bottleneck.

However, the other explanation for the reduced CU count is that iGPU clock speeds have increased 40%+ since Raven Ridge. All else being equal, you don't need as many shaders if you can run a smaller number of them faster. If you do that, the 704 shaders at 1300 MHz in Raven Ridge have basically the same theoretical performance as the 512 shaders at 1750 MHz in Renoir, assuming no architectural improvements of any other kind. Of course, you wouldn't stray far from that target if some other bottleneck is in place. Cezanne clocks even higher, but I wonder how neatly those gains line up with the additional memory bandwidth from supporting faster RAM. Napkin math puts the top-end Cezanne at 10% above Raven Ridge, assuming no changes beyond clock speeds.

I think the more interesting parts are at the bottom end, where Cezanne and Lucienne have twice as many shaders as the low-end Raven Ridge parts and the clock speed has increased by 25% to 60%, depending on what you're comparing. Even without architectural improvements, that puts the low-end performance of these parts really close to the top-end performance that Raven Ridge offered. That's a pretty good jump considering Raven Ridge is only around 3 years old.
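Here's that napkin math written out (peak FP32 = shaders x 2 ops per FMA x clock); the ~2000 MHz figure for the top-end Cezanne iGPU is my assumption, not something stated above:

```python
# Minimal sketch of the shader-count vs. clock-speed trade-off above.

def peak_tflops(shaders: int, mhz: int) -> float:
    return shaders * 2 * mhz * 1e6 / 1e12  # 2 FP32 ops per FMA per clock

raven_ridge = peak_tflops(704, 1300)  # 11 CU Raven Ridge -> ~1.83 TFLOPS
renoir      = peak_tflops(512, 1750)  # 8 CU Renoir       -> ~1.79 TFLOPS (a wash)
cezanne     = peak_tflops(512, 2000)  # 8 CU Cezanne, assumed ~2000 MHz clock

print(f"Cezanne vs Raven Ridge: +{100 * (cezanne / raven_ridge - 1):.0f}%")
# ~+12%, in the same ballpark as the ~10% napkin figure above.
```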
 

LightningZ71

Golden Member
Mar 10, 2017
1,627
1,898
136
Comparing the 8 CU 3500U to the 6 CU 5300U/5400U, both four-core parts with SMT, one with the same L3, it's not a night-and-day difference. Yes, the CPU cores themselves are markedly better, but the effective iGPU throughput is very similar, with a slight nod to the 3500U in some areas. Go out on YouTube and look at some gaming benchmark runs for the 3500U and the lowest-end Renoir products out there, which will be very similar to Lucienne, and you'll see how close they are.

If you are referencing the dual-core/3 CU Raven Ridge 2 based products, those are really a whole different class of chip.

As for us seeing a large Infinity Cache on APUs, remember that these are still value products. L3 SRAM does not scale well as you move down from N7, and going for a giant L3 on an APU, which already has a large fraction of its die dedicated to poorly scaling I/O blocks, in a highly price-competitive market, on a node that is very expensive per square millimeter, is not exactly a great idea.

I personally believe they would do better moving to a two-layer PoP package with an HBM die and a CCD mounted on a common I/O substrate die with an external DDR5 connection. The HBM could be active for the iGPU when selected or when the system senses wall power, and inactive for mobile use. Lower SKUs could omit the HBM. The lower die could have a generous L3 or L4 cache if they wanted. They could also use an N5 CCD on an N7 I/O die that has most of its area taken up by Infinity Cache, and just rely on a dual-channel (quad sub-channel) DDR5 implementation. That would give them nearly the RAM throughput of the RX 560, but with the Infinity Cache it would perform more like an RX 570/5500M. But that's just wishful thinking; it would still be far too expensive to do in the near term.
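For what it's worth, the RX 560 comparison roughly checks out; a quick sketch, where DDR5-6400 is my assumption for a plausible future dual-channel config:

```python
# Peak bandwidth comparison (assumed speeds, not from the post).

def bus_gbs(bus_bits: int, gbps_per_pin: float) -> float:
    return bus_bits * gbps_per_pin / 8  # GB/s

rx560 = bus_gbs(128, 7.0)  # RX 560: 128-bit GDDR5 @ 7 Gbps -> 112.0 GB/s
ddr5  = bus_gbs(128, 6.4)  # dual-channel DDR5-6400 (128-bit) -> 102.4 GB/s
print(f"RX 560: {rx560:.1f} GB/s, DDR5-6400 dual channel: {ddr5:.1f} GB/s")
```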
 

dr1337

Senior member
May 25, 2020
329
547
106
As for us seeing a large Infinity Cache on APUs, remember that these are still value products. L3 SRAM does not scale well as you move down from N7, and going for a giant L3 on an APU, which already has a large fraction of its die dedicated to poorly scaling I/O blocks, in a highly price-competitive market, on a node that is very expensive per square millimeter, is not exactly a great idea.
I mean, Cezanne is only 15% bigger than Renoir despite having double the L3 cache and slightly bigger cores. I don't think tacking on another 16MB for the iGPU would be that big of a deal, especially should they move to a smaller node, and especially if it can provide a substantial increase in performance per watt. Granted, it's not as simple as just "tacking on cache", but with how well the LLC has worked for RDNA2, it just seems like something to expect from a next-generation APU.
 

gdansk

Platinum Member
Feb 8, 2011
2,078
2,559
136
I'd be very surprised if they do any HBM + RDNA2 APU. Such a configuration seems to be in conflict with the RDNA2 design goals. If they were going to do such a thing, I would have expected it in the consoles (which may not be using Infinity Cache? do we know?). That neither did suggests it is too costly.
 

uzzi38

Platinum Member
Oct 16, 2019
2,607
5,821
146
As for us seeing a large Infinity Cache on APUs, remember that these are still value products. L3 SRAM does not scale well as you move down from N7, and going for a giant L3 on an APU, which already has a large fraction of its die dedicated to poorly scaling I/O blocks, in a highly price-competitive market, on a node that is very expensive per square millimeter, is not exactly a great idea.

A giant L3? Even 16MB would drastically improve effective memory bandwidth for an iGPU. Compared to Navi22 (most likely 384GB/s), for example, your standard iGPU with dual-channel DDR4-3200 is what, like 50GB/s?

Combine that with RDNA's improvements to colour compression and some LPDDR5-5500 and you have the makings of a very competent iGPU without drastically shooting up the cost with HBM2.

You're effectively looking at ~85GB/s of memory bandwidth (~2/9 of Navi22) with either 1/6 or 1/4 of the Infinity Cache, depending on what N22 carries (64/96MB). Let's just say N22 ends up at 3060 Ti performance; you're then effectively looking at 1/4 of that (especially with 12 CUs on Rembrandt). That's GTX 1050 Ti tier performance, roughly 60-70% over the MX450 (based on this review that shows the MX450 at 75% of a 1050).
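Those bandwidth figures check out under the usual assumptions (128-bit total bus for the iGPU configs, 192-bit GDDR6 at 16 Gbps for Navi22, all assumed rather than stated above):

```python
# Verifying the figures above; bus widths and GDDR6 speed are assumed.

def bw_gbs(mts: int, bus_bits: int) -> float:
    return mts * bus_bits / 8 / 1e3  # GB/s

ddr4   = bw_gbs(3200, 128)   # ~51.2 GB/s ("like 50GB/s")
lpddr5 = bw_gbs(5500, 128)   # ~88.0 GB/s ("~85GB/s")
navi22 = bw_gbs(16000, 192)  # 384.0 GB/s (assumed 16 Gbps GDDR6)
print(f"LPDDR5-5500 / Navi22 = {lpddr5 / navi22:.2f}")  # ~0.23, i.e. ~2/9
```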

 

LightningZ71

Golden Member
Mar 10, 2017
1,627
1,898
136
We've been seeing the MT performance issues in the various leaks for weeks now. I don't know if they are more related to thermals, memory bandwidth starvation, the new 8 core CCX with a smaller L3 than desktop, or something else. I'm more inclined to agree with the above post that it's thermals.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,763
3,585
136
We've been seeing the MT performance issues in the various leaks for weeks now. I don't know if they are more related to thermals, memory bandwidth starvation, the new 8 core CCX with a smaller L3 than desktop, or something else. I'm more inclined to agree with the above post that it's thermals.
It's not thermals, it's the power limit. 35W is simply not enough for 8 cores.
 

coercitiv

Diamond Member
Jan 24, 2014
6,176
11,808
136
Also, I would not be quick to attribute the improved battery life to the Zen 3 silicon alone. Laptop CPUs can get a nice bump in battery life from multiple sources, not all of them directly linked to core performance. LCN-U may still surprise us, as far as a Zen 2 based SKU can, anyway.
As already suspected from the AMD slides, Lucienne is not simply a Renoir rebadge, as it does come with power management improvements.

So while yes, it is the same silicon layout and floorplan, some of these features weren't possible in Renoir. AMD built in these features perhaps knowing that they couldn't be enabled in Renoir, but sufficient changes and improvements at the manufacturing and firmware stages were made such that these features were enabled in Lucienne. More often than not, these ideas have very strict time windows to implement, and even if they are designed into the hardware, there is a strict cut-off point by which, if a feature doesn't work as intended, it doesn't get enabled. Obviously the best result is to have everything work on time, but building CPUs is harder than we realize.
 

amrnuke

Golden Member
Apr 24, 2019
1,181
1,772
136
It's not thermals, it's the power limit. 35W is simply not enough for 8 cores.
And even then, it's only a hair off the stock 5600X in rendering and many other MT workloads.
And importantly, in ST workloads, it's highly competitive with the top-of-the-line 1185G7.

I'm in the market for a new laptop, but my inclination is to wait for reviews of one of the 5600 U/H/HS chips. Most of my laptop work is going to be lightly threaded, though, and the single-core boost deficit might not be great. Lopping 15% of performance right off the top... :(
 

tamz_msc

Diamond Member
Jan 5, 2017
3,763
3,585
136
And even then, it's only a hair off the stock 5600X in rendering and many other MT workloads.
And importantly, in ST workloads, it's highly competitive with the top-of-the-line 1185G7.

I'm in the market for a new laptop, but my inclination is to wait for reviews of one of the 5600 U/H/HS chips. Most of my laptop work is going to be lightly threaded, though, and the single-core boost deficit might not be great. Lopping 15% of performance right off the top... :(
Well, it depends on the workload. A core in the 5600X would, after all, consume about 2.5x the power of a core in a 5980HS for a difference of 1.2GHz in clock speed, but in the end six faster-clocked cores will win out over eight slower-clocked cores, especially when the clock speed difference is that large.
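As a crude illustration of that per-core gap, dividing the rated package power by the core count (my approximation; it ignores uncore/SoC power, so take it loosely) lands right around 2.5x:

```python
# Rough per-core power estimate from rated package power (an approximation).

desktop_w, desktop_cores = 65, 6  # 5600X TDP
mobile_w,  mobile_cores  = 35, 8  # 5980HS configured TDP

per_core_desktop = desktop_w / desktop_cores  # ~10.8 W
per_core_mobile  = mobile_w / mobile_cores    # ~4.4 W
print(f"~{per_core_desktop / per_core_mobile:.1f}x per-core power")  # ~2.5x
```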
 

uzzi38

Platinum Member
Oct 16, 2019
2,607
5,821
146
Nevermind, we'll soon see mobile RDNA2 in action on Van Gogh.

Hopefully RDNA2 doesn't persist and we will see a more regular GPU uArch cadence for APUs in the future.
If we ever see Van Gogh. AFAIK MS aren't using it any more, for whatever reason.

At this rate, sadly, we may be waiting for Rembrandt for the full platform upgrade (except the CPU) all in one go.
 

Hitman928

Diamond Member
Apr 15, 2012
5,232
7,773
136
The 15W results look more impressive to me than the 35W results. The 5980HS at 15W is in many instances able to match or beat the 4900H at 35W. There are some anomalous results where the 5980HS shows very low performance, but I'm guessing that's down to the laptop configuration. It will be interesting to see what the U variants can do in laptops tuned for 15W performance.