Discussion AMD Cezanne/Zen 3 APU Speculation and Discussion

dr1337 · Aug 10, 2020

https://twitter.com/x/status/1292844896374718470

Fresh leak out today, not much is known but at least 8cu's is confirmed. Probably an engineering sample, core count is unknown and clocks may not be final.

This is very interesting to me because cezanne is seemingly 8cu only, and it seems unlikely to me that AMD could squeeze any more performance out of vega. A cpu only upgrade of renoir may be lackluster compared to tigerlake's quite large GPU.

What do you guys think? Will zen 3 be a large enough improvement in APU form? Will it have full cache? Are there more than 8cus? Has AMD truly evolved vega yet again or is it more like rdna?

LightningZ71 · Jan 25, 2021

Vega iGPU has been showing all the signs of bandwidth starvation in the higher SKUs since it was introduced in Raven Ridge. Performance scaling with extra clock speed has been poor, especially in the higher CU SKUs, as they can't stay fed. It's one of the main reasons that performance didn't take a hit when they reduced the CU count in Renoir. The extra clock speed does help in a few spots where there are non-ram-bandwidth limited tasks that need to be performed.

I suspect that, what little improvement we are seeing is more related to the larger L3 cache reducing RAM bandwidth contention somewhat, and possibly a bit of tweaking with the dram controller to reduce latency somewhere, though, that is purely speculation on my part.

Also, add me to the list of people that think that there's some sort of gross oversimplification or factual error with the iGPU being on the same voltage plane with the CPU in Renoir. I think that they may be referring to some sort of power management strategy that coupled the total power usage of the CPU cores and the iGPU under one total limit, and now, they have isolated limits, that total more than the previous total limit, that is governed more by system thermal and power delivery capacity, allowing better designed systems more ability to excell in performance. With Zen3 managing to get more from each clock, it can sustain needed performance at even lower frequencies, better allowing the iGPU to run faster and hotter.

I guess that the big question is still, is it enough to catch up to the Intel Xe iGPU in Tiger Lake? The 80EU G7 implementation in the i5 SKUs is able to hang with the 4700u in most situations, and the higher clocked 96eu G7 in the i7 SKUs can often beat the 4800u by noticeable amounts. A 10% improvement across the board with better cores and larger L3 should make things much more even on average.

misuspita · Jan 25, 2021

I don't think the GPU performance can be much better than what they got right now until more bandwidth will be available. Neither Intel nor AMD. So, best they all can do is trade punches in the approx. same perf plane. When DDR5 will come, and with the aid of some kind of integrated non expensive memory , the best of the SKUs may go into mid discrete GPU performance of the day.

gdansk · Jan 25, 2021

Unless they wanted use a new architecture famous for using a large cache to increase effective bandwidth. I think it was called RDNA2...

Obviously time tables, die size and all that but I would hope it performs much better than Vega with the same power and bandwidth.

LightningZ71 · Jan 25, 2021

From the roadmaps, it looks like we won't get RDNA2 APUs in laptops until either later this year or early next year. I also wouldn't expect them to take the die area hit for any sort of Infinity Cache for at least another node.

moinmoin · Jan 25, 2021

misuspita said:
I don't think the GPU performance can be much better than what they got right now until more bandwidth will be available.

While true that more bandwidth very likely leads to a significant performance jump, it should be clear now that AMD designed the iGPU to stay competitive with Intel's offerings (= good enough). RDNA2 would have already offered a bigger jump with the same bandwidth but AMD kept that for later to offer a steady cadence of nice performance improvements with its APUs.

Only problem now is that doing so allowed Apple to jump ahead with iGPU performance in M1. Will be interesting how AMD adapts to that, in addition to having converted OEM/ODMs earlier than it itself apparently planned for.

LightningZ71 said:
From the roadmaps, it looks like we won't get RDNA2 APUs in laptops until either later this year or early next year. I also wouldn't expect them to take the die area hit for any sort of Infinity Cache for at least another node.

I expect early next year for Rembrandt. Also I expect the Infinity Cache for the iGPU to be shared with the L3$ of the CPU, with the size being dynamical depending on the RAM size allocated to the iGPU. That shared last level cache should be at least 32MB then.

Mopetar · Jan 25, 2021

Shivansps said:
Remember that is not only about performance, but availity as well.

Or just having signed a contract with Intel for some kind of exclusivity. Apple stayed on Intel across their product line even when it was clear that an AMD CPU would have been significantly better, particularly in the cases of their computers targeting professionals who would need ECC, and therefor a Xeon. Dell may have had a similar deal with Intel, which may have looked like a good one when they signed it, but could be holding them back now that they won't have top of the line products to compete with other manufacturers for a while.

LightningZ71 said:
Vega iGPU has been showing all the signs of bandwidth starvation in the higher SKUs since it was introduced in Raven Ridge. Performance scaling with extra clock speed has been poor, especially in the higher CU SKUs, as they can't stay fed. It's one of the main reasons that performance didn't take a hit when they reduced the CU count in Renoir. The extra clock speed does help in a few spots where there are non-ram-bandwidth limited tasks that need to be performed.

I suspect that, what little improvement we are seeing is more related to the larger L3 cache reducing RAM bandwidth contention somewhat, and possibly a bit of tweaking with the dram controller to reduce latency somewhere, though, that is purely speculation on my part.

It's a little unfortunate that AMD hasn't been as aggressive with including their newest GPU technology in their APUs, but they seemed to have made some pretty significant overhauls to Vega and gotten a lot more out of that architecture than I had thought possible. The problems that infinity cache was designed to solve seem like they'd apply in the case of their APUs and if your assessment about the additional L3 cache being the biggest driver in performance is correct, then an APU with an even larger amount of infinity cache should really help to alleviate that bottleneck.

However, the other explanation for reduced core count is that the clock speeds on the cores have increased 40%+ since Raven Ridge. All else equal you don't need as many cores if you can run a smaller number of them faster. If you do that, the 704 shaders at 1300 MHz in Raven Ridge have basically the same theoretical performance as the 512 shaders at 1750 MHz in Renoir assuming no architectural improvements of any other kind. Of course you wouldn't stray far from that target if there's some other bottleneck in place. Cezanne has even higher clock speeds, but I wonder how neatly those line up with additional memory bandwidth due to supporting faster RAM. Napkin math puts the top-end Cezanne at 10% above Raven Ridge assuming no changes beyond clock speeds.

I think the more interesting parts are at the bottom end, where Cezanne and Lucienne have twice as many shaders as the low-end of Raven Ridge and the clock speed has increased by 25% to 60% depending on what you're comparing. Even without architectural improvements that puts the low-end performance that these parts will offer really close to top-end performance that Raven Ridge parts offered. That's a pretty good jump considering Raven Ridge is only around 3 years old.

LightningZ71 · Jan 25, 2021

Comparing the 8CU having 3500u to the 6CU having 5300/5400u, both with four cores with hyper threading, one with the same L3, it's not a night and day difference. Yes, the CPU cores themselves are markedly better, but the iGPU effective throughput is very similar, with a slight node to the 3500u in some areas. Go out on YouTube and look at some gaming benchmark runs for both the 3500u and the lowest end Renoir products out there, which will be very similar to Lucienne, and you'll see how close they are.

If you are referencing the dual core/3CU Raven Ridge 2 based products, you're really in a whole different class of chip.

As for us seeing a large Infinity Cache on APUs, remember, these are still value products. L3 SRAM does not scale well as you move down from N7, and going for a giant L3 on an APU, which already has a large fraction of its die dedicated to poorly scaling I/O modules, in a highly price competitive market on a very expensive per square mm node is not exactly a great idea.

I personally believe that they would do better with a move to a 2 layer POP package with an HBM die and a CCD mounted on the common I/O substrate die with an external DDR5 connection. The HBM could be active for the iGPU when selected or it senses wall power, and it could be inactive for mobile use. Lower SKUs could not include the HBM. The lower die could have a generous L3 or L4 cache if they wanted. They could also use an N5 CCD on an N7 I/O die that has most of its area taken up with an infinity Cache as well and just concentrate on having to rely on a dual (quad sub channel) channel DDR5 implementation. It would give them nearly the ram throughput of the rx560, but with the infinity Cache, it would perform more like a 570/5500m. But, that's just wishful thinking. It would still be far too expensive to do that in the near term.

dr1337 · Jan 25, 2021

LightningZ71 said:
As for us seeing a large Infinity Cache on APUs, remember, these are still value products. L3 SRAM does not scale well as you move down from N7, and going for a giant L3 on an APU, which already has a large fraction of its die dedicated to poorly scaling I/O modules, in a highly price competitive market on a very expensive per square mm node is not exactly a great idea.

I mean cezanne is only 15% bigger than renoir despite it having double the l3 cache and slightly bigger cores. I don't think tacking on another 16mb for the igpu will be that big of a deal especially should they move to a smaller node. Especially if it can provide a substantial increase to performance/watt. Granted its not just as simple as "tacking on cache", but with how well LLC has worked for rdna2, it just seems to be expected to me for a next generation APU

gdansk · Jan 25, 2021

I'd be very surprised if they do any HBM + RDNA2 APU. Such a configuration seems to be in conflict with the RDNA2 design goals. If they were going to do such a thing, I would have expected it for the consoles (which may not be using infinity cache? do we know?) That neither did suggest it is too costly.

uzzi38 · Jan 26, 2021

LightningZ71 said:
As for us seeing a large Infinity Cache on APUs, remember, these are still value products. L3 SRAM does not scale well as you move down from N7, and going for a giant L3 on an APU, which already has a large fraction of its die dedicated to poorly scaling I/O modules, in a highly price competitive market on a very expensive per square mm node is not exactly a great idea.

Giant L3? Even 16MB would drastically improve mem bw for an iGPU. Compared to Navi22 (most likely 384GB/s) for example, your standard iGPU with DDR4-3200 dual channel is what, like 50GB/s?

Combine that with RDNA's improvements to colour compression and some LPDDR5-5500 you have the makings for a very competent iGPU without the need of drastically shooting up cost with HBM2.

You're effectively looking at ~85GB/s memory bandwidth (~2/9 Navi22) with either 1/6 or 1/4 the Infinity Cache depending on what N22 is (64/96MB). Lets just say N22 ends up at 3060Ti perf, you're effectively looking at 1/4 that (especially with 12CUs on Rembrandt). That's GTX1050Ti tier performance. Roughly 60-70% over MX450 (based on this review that shows MX450 as 75% of a 1050)

Gaming Test GeForce MX450: 10 Game di HP Envy 13 • Jagat Review

Kali ini tim Jagat Review kembali kedatangan laptop terbaru dari HP yaitu Envy 13. Laptop tipis ini hadir dengan spesifikasi cukup tinggi yaitu berbekal

www.jagatreview.com

jpiniero · Jan 26, 2021

AnandTech Forums: Technology, Hardware, Software, and Deals

Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

www.anandtech.com

AT has a review of a Cezanne laptop. It's decently better in ST compared to Renoir and even ever so slightly faster than Tiger Lake; but the MT is not that much better than Renoir.

leoneazzurro · Jan 26, 2021

MT is heavily thermal dependent.

LightningZ71 · Jan 26, 2021

We've been seeing the MT performance issues in the various leaks for weeks now. I don't know if they are more related to thermals, memory bandwidth starvation, the new 8 core CCX with a smaller L3 than desktop, or something else. I'm more inclined to agree with the above post that it's thermals.

tamz_msc · Jan 26, 2021

LightningZ71 said:
We've been seeing the MT performance issues in the various leaks for weeks now. I don't know if they are more related to thermals, memory bandwidth starvation, the new 8 core CCX with a smaller L3 than desktop, or something else. I'm more inclined to agree with the above post that it's thermals.

It's not thermals, it's the power limit. 35W is simply not enough for 8 cores.

Panino Manino · Jan 26, 2021

A bit of a disappointment, and looks like Vega really reached it's limit.

coercitiv · Jan 26, 2021

coercitiv said:
Also, I would not be quick to attribute improved battery life to the Zen 3 silicon alone. Laptop CPUs can get a nice bump in battery life from multiple sources, not all of them being directly linked to core performance. LCN-U may still surprise us, as far as a Zen 2 based SKU can anyway.

As already suspected from the AMD slides, Lucienne is not simply a Renoir rebadge, as it does come with power management improvements.

So while yes it is the same silicon layout and floorplan, some of these features weren’t possible in Renoir. AMD built in these features perhaps knowing that they couldn’t be enabled in Renoir, but sufficient changes and improvements at the manufacturing stage and firmware stage were made such that these features were enabled in Lucienne. More often than not these ideas often have very strict time windows to implement, and even if they are designed in the hardware, there is a strict cut-off point by which time if it doesn’t work as intended, it doesn’t get enabled. Obviously the best result is to have everything work on time, but building CPUs is harder than we realize.

amrnuke · Jan 26, 2021

tamz_msc said:
It's not thermals, it's the power limit. 35W is simply not enough for 8 cores.

And even then, it's only a hair off the stock 5600X in rendering any many other MT workloads.
And importantly, in ST workloads, it's highly competitive with the top of the line 1185G7.

I'm in the market for a new laptop, but my inclination is to wait for evaluation of one of the 5600 U/H/HS chips. Most of my laptop work is going to be lightly threaded, though, and the single-core boost gimp might not be great. Lopping 15% of performance right off the top...

soresu · Jan 26, 2021

Panino Manino said:
A bit of a disappointment, and looks like Vega really reached it's limit.

Nevermind, we'll soon see mobile RDNA2 in action on Van Gogh.

Hopefully RDNA2 doesn't persist and we will see a more regular GPU uArch cadence for APUs in the future.

tamz_msc · Jan 26, 2021

amrnuke said:
And even then, it's only a hair off the stock 5600X in rendering any many other MT workloads.
And importantly, in ST workloads, it's highly competitive with the top of the line 1185G7.

I'm in the market for a new laptop, but my inclination is to wait for evaluation of one of the 5600 U/H/HS chips. Most of my laptop work is going to be lightly threaded, though, and the single-core boost gimp might not be great. Lopping 15% of performance right off the top...

Well it depends on the workload. A core in the 5600X would after all consume 2.5x the power of a core in a 5980HS for a difference of 1.2GHz in clock speed, but in the end six faster clocked cores will win out over eight slower clocked cores especially when the difference in clock speed is so huge.

yuri69 · Jan 26, 2021

I guess Milan will be a similar story to Cezanne - high low thread gains, low full 64c. The power/thermals and IO limits are here to stay.

uzzi38 · Jan 26, 2021

soresu said:
Nevermind, we'll soon see mobile RDNA2 in action on Van Gogh.

Hopefully RDNA2 doesn't persist and we will see a more regular GPU uArch cadence for APUs in the future.

If we ever see Van Gogh. Afaik MS aren't using it any more for whatever reason.

We may be waiting for Rembrandt for the full platform upgrade (except CPU) all in one go at this rate sadly.

jpiniero · Jan 26, 2021

I thought Van Gogh was more for embedded.

moinmoin · Jan 26, 2021

jpiniero said:
I thought Van Gogh was more for embedded.

If it's a semi custom design financed by Microsoft and Microsoft ended up not wanting to mass produce it, AMD may have to pay back some of the cost Microsoft covered if it were to sell it instead.

Hitman928 · Jan 26, 2021

The 15W results look to me more impressive than the 35W results. The 5980HS at 15W in many instances is able to match or beat the 4900H at 35W. There are some anomalous results for the 5980HS where it shows very low performance, but I'm guessing that's down to the laptop configuration. It will be interesting to see what the 'u' variants can do on laptops tuned for 15W performance.

HurleyBird · Jan 26, 2021

Hitman928 said:
There are some anomalous results for the 5980HS where it shows very low performance.

While the Zephrus was impressively thin, the Flow X13 is even more impressively thin.

Discussion AMD Cezanne/Zen 3 APU Speculation and Discussion

Senior member

Platinum Member

Senior member

Diamond Member

Platinum Member

Diamond Member

Diamond Member

Platinum Member

Senior member

Diamond Member

Platinum Member

Lifer

Golden Member

Platinum Member

Diamond Member

Golden Member

Diamond Member

Golden Member

Diamond Member

Diamond Member

Senior member

Platinum Member

Lifer

Diamond Member

Diamond Member

Platinum Member