Question AMD Rembrandt/Zen 3+ APU Speculation and Discussion


izaic3

Member
Nov 19, 2019
61
96
61
Alright, so we've had some leaks so far. I don't know if any of it's been confirmed yet, as it's pretty early, but here is what I've surmised so far (massive grain of salt of course):

If it turns out to have RDNA 2 and 12 CUs, I could see iGPU performance potentially almost doubling over Cezanne.

If I've made any mistakes or gotten anything wrong, please let me know. I'd also love to hear more knowledgeable people weigh in on their expectations.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,329
2,811
106
I was actually comparing it to the 6800/6800XT.

My point is they still had to cut down the CUs to get that area. Vega 20 is at 330mm2 but grows to 520mm2 with Navi 21. 1.5x the SPs of Vega 20 would push it into the 600mm2 range.

Going from Vega 8 to Navi 12 will double the iGPU size as well.
A 90CU Navi (1.5x the SPs of Vega 20) wouldn't be in the 600mm2 range; 10 extra CUs don't take up an extra 80mm2. :)
Navi 21 has 128MB of IC and a 256-bit GDDR6 interface, which take up a lot more space than HBM2. That's a big part of why Navi 21 is so much bigger, though of course not the only reason.
An RDNA1 CU is ~2.1 mm2 and I don't expect an RDNA2 CU to be more than 2.5 mm2 in the worst case.
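
To put rough numbers on the CU area argument, here is a minimal back-of-the-envelope sketch. It only uses the per-CU figures quoted above (~2.1 mm2 for RDNA1, 2.5 mm2 as a worst case for RDNA2); the function name is made up for illustration, and the result covers the CU array alone, not cache, memory PHYs, or the front end.

```python
# Per-CU area figures as quoted in the post (estimates, not measurements).
RDNA1_CU_MM2 = 2.1  # approximate RDNA1 CU area
RDNA2_CU_MM2 = 2.5  # assumed worst case for an RDNA2 CU

def cu_area_mm2(num_cus, mm2_per_cu):
    """Area of the CU array alone (hypothetical helper, ignores everything else)."""
    return num_cus * mm2_per_cu

# Going from 80 CUs (Navi 21) to 90 CUs adds 10 CUs.
extra_10_cu = cu_area_mm2(10, RDNA2_CU_MM2)
# A speculated 12 CU RDNA2 IGP, CU array only.
igp_12_cu = cu_area_mm2(12, RDNA2_CU_MM2)

print(f"+10 CUs: ~{extra_10_cu:.0f} mm2, far short of an extra 80 mm2")
print(f"12 RDNA2 CUs: ~{igp_12_cu:.0f} mm2 for the CU array alone")
```

Even with the worst-case per-CU figure, ten extra CUs come to roughly a third of the 80mm2 gap between 520mm2 and 600mm2.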

It's hard to tell if the Rembrandt IGP will really be 2x bigger than the one in Cezanne. I haven't even seen anyone measure just the IGP in Renoir or Cezanne to begin with.

BTW, does it even matter if it's really 2x bigger? 12CU RDNA2 would be better than, for example, 16CU Vega, which would need lower clocks to fit into the same TDP as 8CU Vega. RDNA2 is far more power efficient than Vega.
 

Mopetar

Diamond Member
Jan 31, 2011
7,797
5,899
136
Going from Vega 8 to Navi 12 will double the iGPU size as well.

It really depends on how they adapt the technology for mobile. I suspect any hardware that accelerates ray tracing gets thrown out to save space. There's no point in adding it to an APU. Similarly, the Infinity Cache found on RDNA2 chips may or may not show up, and that represents an even larger number of transistors for something that might get cut.

AMD may not even go to 12 CUs because unless there's enough bandwidth to feed all of them, there's no point in making a wider design outside of redundancy.
 

Shivansps

Diamond Member
Sep 11, 2013
3,835
1,514
136
AMD may not even go to 12 CUs because unless there's enough bandwidth to feed all of them, there's no point in making a wider design outside of redundancy.

If AMD chooses to go 12CU, it is because they can feed them. Vega 11 runs well at DDR4-3200, but it becomes heavily memory limited when OCed to something like 1700 MHz.

You can also see why Renoir never went to DIY: try to sell that at $160 and reviews would have backfired badly on AMD. Renoir is just not up to the task. This is why I don't believe it was due to supply.

It really pisses me off to think that we could have had RX 560 perf in an IGP right now without that downgrade.

Anyway, I'm confident that if it is 12CU, it is because they expect it to work well with the available bandwidth.
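
The bandwidth point can be sketched roughly as bytes of memory bandwidth per unit of shader throughput. This is only a rough illustration: it assumes a 128-bit (dual-channel) DDR4 bus, 64 FP32 lanes per CU, a 1250 MHz stock clock for Vega 11, and a purely hypothetical 2000 MHz clock for a 12 CU part. It also ignores that the CPU cores share the same memory bandwidth.

```python
def peak_bandwidth_gbs(mt_per_s, bus_bits=128):
    """Theoretical peak DRAM bandwidth in GB/s (128-bit bus = dual-channel DDR4)."""
    return mt_per_s * 1e6 * (bus_bits / 8) / 1e9

def fp32_tflops(cus, clock_mhz, lanes_per_cu=64):
    """Peak FP32 throughput: lanes per CU x 2 FLOPs per clock (FMA) x clock."""
    return cus * lanes_per_cu * 2 * clock_mhz * 1e6 / 1e12

bw = peak_bandwidth_gbs(3200)  # DDR4-3200, dual channel

configs = [
    ("Vega 11 stock (assumed 1250 MHz)", 11, 1250),
    ("Vega 11 OC (1700 MHz)", 11, 1700),
    ("12 CU (hypothetical 2000 MHz)", 12, 2000),
]
for label, cus, mhz in configs:
    tflops = fp32_tflops(cus, mhz)
    # Lower GB/s per TFLOP means the GPU is more likely to be bandwidth starved.
    print(f"{label}: {tflops:.2f} TFLOPS, {bw / tflops:.1f} GB/s per TFLOP")
```

On the same DDR4-3200 the ratio keeps dropping as clocks or CU count go up, which is why a wider iGPU would only make sense alongside faster memory or some form of cache.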
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,329
2,811
106
They wouldn't sacrifice die space and transistors for a bigger IGP if the performance improvement would end up subpar because they can't properly feed it.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,590
5,722
136
I read somewhere that the truly next gen APUs will have SLC instead of IC. The SLC is common between CPU/GPU and big enough to provide the BW amplification for the small number of CUs
 

uzzi38

Platinum Member
Oct 16, 2019
2,565
5,571
146
I read somewhere that the truly next gen APUs will have SLC instead of IC. The SLC is common between CPU/GPU and big enough to provide the BW amplification for the small number of CUs
Hmm, how would that work? A separate cache that hangs off the SDF on its own, with the CPU and GPU cores accessing it via the SDF?
 

moinmoin

Diamond Member
Jun 1, 2017
4,933
7,619
136
I read somewhere that the truly next gen APUs will have SLC instead of IC. The SLC is common between CPU/GPU and big enough to provide the BW amplification for the small number of CUs
I can imagine the Zen cores' L3$ and RDNA2's IC amalgamating into something like an SLC situated on interposers. Though I'm not sure if the packaging cost is economical for relatively low-budget mainstream chips yet.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,590
5,722
136
Hmm, how would that work? A separate cache that hangs off the SDF on its own, with the CPU and GPU cores accessing it via the SDF?
I can imagine the Zen cores' L3$ and RDNA2's IC amalgamating into something like an SLC situated on interposers. Though I'm not sure if the packaging cost is economical for relatively low-budget mainstream chips yet.
Could also be more mundane, like an L4 where the CPU L3 and GPU L2 can probe the SLC. Zen3 L3 can probe other L3s, like in the 5950X for example, and RDNA2 IC is basically an LLC.
The SLC sits before the IMC. They need to bring the two together now since the pieces are there.
Using SDF in a mobile part might not be great from a power perspective, so for this part maybe a monolithic part without the SerDes.
Also, Trento with HBM is using it as an LLC I suppose, backed up by main memory. So this is something they already have in some form or another.
 

moinmoin

Diamond Member
Jun 1, 2017
4,933
7,619
136
Using SDF in a mobile part might not be great from a power perspective, so for this part maybe a monolithic part without the SerDes.
I'm pretty sure SDF and SCF are used even on monolithic APUs and don't depend on the package being an MCM or some such. Power usage is usually an issue with high bandwidth low latency interfaces as well as connection wires, so IMC and SerDes etc.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,590
5,722
136
I'm pretty sure SDF and SCF are used even on monolithic APUs and don't depend on the package being an MCM or some such. Power usage is usually an issue with high bandwidth low latency interfaces as well as connection wires, so IMC and SerDes etc.
Within the CCD, IFOP/SDF is not used for probing caches as far as I can tell, at least for Zen3. In the 5950X, IFOP is used for inter-L3 probes across dies. But I am curious to know if this is mentioned somewhere.
 

uzzi38

Platinum Member
Oct 16, 2019
2,565
5,571
146
Within the CCD, IFOP/SDF is not used for probing caches as far as I can tell, at least for Zen3. In the 5950X, IFOP is used for inter-L3 probes across dies. But I am curious to know if this is mentioned somewhere.
Here's the diagram for Raven Ridge:

1618325315096.png

And here's the one for Ryzen 1k and 2k.

1618325357087.png



Inter-CCX communications all go through the SDF, regardless of chiplet layout. I can't see a way of an SLC working unless it also hangs off the SDF, but it would come at a significant cost to latency.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,683
1,218
136
System Level Infinity Cache is technically possible. It has been theorized before with AMD's older L3 directory caches.

HSC.png
Heterogeneous System Coherence for Integrated CPU-GPU Systems

HSC2.png
Software Assisted Hardware Cache Coherence for Heterogeneous Processors; AMD Research - Advanced Micro Devices, Inc.

The CPU and GPU would have to operate within the same CCM, which basically means the CPU L3 and GPU L3 would have to sit in the same area, with the current CCX evolving from 8-core to 8-core+GPU. That would basically match the previous research papers and would be a large jump from what we currently have.

1-core => 2 MB L3
3-compute units => 2 MB L3
8+4 set => 16 MB CPU-side CCX + 8 MB GPU-side CCX. // Lopsided integration of L3.
or
3-compute units => 2x2 MB L3(added bandwidth and capacity)
8+8 set => 16 MB CPU-side CCX + 16 MB GPU-side CCX. // Equal integration of L3.
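
The slice arithmetic above can be sketched as follows. This is just an illustration of the ratios in the list (one 2 MB slice per core, one or two 2 MB slices per group of 3 CUs); the 12 CU count is the speculated figure from this thread, and the function and parameter names are made up.

```python
def l3_split_mb(cores, cus, slices_per_cu_group=1, cus_per_group=3, slice_mb=2):
    """Return (CPU-side L3, GPU-side L3) in MB for the slice ratios in the post.

    Hypothetical helper: one slice per CPU core, and `slices_per_cu_group`
    slices for every group of `cus_per_group` compute units.
    """
    cpu_mb = cores * slice_mb
    gpu_groups = cus // cus_per_group
    gpu_mb = gpu_groups * slices_per_cu_group * slice_mb
    return cpu_mb, gpu_mb

# "8+4 set": 8 core slices plus 4 GPU slices (12 CUs, one slice per 3 CUs).
print(l3_split_mb(8, 12))                         # lopsided split
# "8+8 set": 8 core slices plus 8 GPU slices (two slices per 3 CUs).
print(l3_split_mb(8, 12, slices_per_cu_group=2))  # equal split
```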

Just to show the awkward-ness of such a design:
awkwardconfig.png

More accurate-style:
awkwardconfig.png

Forgot the dual-CU-ness of the WGP... UGH!!!:
awkwardconfig.png

This one is easier to swallow. In design, it might work out?! It at least matches the design aesthetic of Renoir/Cezanne, as well as sharing L3 with the CPU cores like the research papers above.

Looking through the patents, the above would require two coherency managers: "Exclusive" for the CPU and "Main" for CPU+GPU, where Main is the original manager and Exclusive separates exclusive CPU coherency from system GPU coherency. That allows CPU coherency to maintain its bandwidth even with the extra GPU part in the main CCM.

Exclusive CCM => 8x2 MB cache coherency (Data processed only on CPUs)
Main CCM => 16x2 MB cache coherency (Data processed on both)

The above is also needed for CPU+GPU chiplets within the same package. CPUs have main and exclusive cache coherency managers, GPUs only have main cache coherency managers. This is also a requirement for 3rd generation infinity fabric architecture, ex; 2x EPYC CPUs(Exclusive(CPU) and Main(CPU+GPU))+8x RI GPUs(Main-only)
 

Shivansps

Diamond Member
Sep 11, 2013
3,835
1,514
136
My personal opinion is that AMD is going the GPU chiplet way, which likely means some IC on the chiplet, as it would be the GPU "L3". They may end up doing monolithic APUs only for notebooks, embedded and the low end.

Things will become more clear as we get more info on RDNA3.
 

Mopetar

Diamond Member
Jan 31, 2011
7,797
5,899
136
It really depends on the process and packaging techniques. If the APU is small enough, it makes sense to stay with a monolithic die. There are plenty of ways to bin them and a single die can cover the entire product stack.

The chiplet approach gets expensive if you can't use the same chiplets that other products get or if the cost of connecting it all using an interposer or some other technique is expensive. Doing something like that for a server chip is easy to justify, but not so much when it's for a low-end laptop part. Even more so if a lot of the cost is largely fixed, because the server parts are going to all use multiple chiplets, but the APU probably just uses one of each type.
 

LightningZ71

Golden Member
Mar 10, 2017
1,627
1,898
136
There is a hole in AMD's roadmap for a quad core, 4 CU, small desktop CPU. A well clocked 4/8 Zen2 CPU CCX with 4 MB L3 and 4 Vega or RDNA2 CUs would do just fine on the market while giving excellent volumes of dies per wafer.
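
As a rough illustration of the dies-per-wafer point, here is a sketch using the common gross-dies-minus-edge-loss approximation on a 300 mm wafer. The die areas are assumptions for illustration only: 100 mm2 is a made-up figure for a small quad-core APU, and 156 mm2 is the commonly reported Renoir die size. Yield and scribe lines are ignored.

```python
import math

def dies_per_wafer(die_area_mm2, wafer_diameter_mm=300.0):
    """Approximate gross dies per wafer: wafer area / die area minus an edge-loss term."""
    radius = wafer_diameter_mm / 2
    gross = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(gross - edge_loss)

small_apu = dies_per_wafer(100.0)    # hypothetical small quad-core, 4 CU die
renoir_like = dies_per_wafer(156.0)  # assumed Renoir-sized die

print(f"~{small_apu} candidate dies at 100 mm2 vs ~{renoir_like} at 156 mm2")
print(f"ratio: {small_apu / renoir_like:.2f}x")
```

The gain is larger than the plain area ratio because smaller dies also waste less of the wafer edge.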
 

soresu

Platinum Member
Dec 19, 2014
2,617
1,812
136
There is a hole in AMD's roadmap for a quad core, 4 CU, small desktop CPU. A well clocked 4/8 Zen2 CPU CCX with 4 MB L3 and 4 Vega or RDNA2 CUs would do just fine on the market while giving excellent volumes of dies per wafer.
I feel like the Samsung RDNA partnership might precede a return to lower power chips.

Perhaps not Cat/*mont style smol cores, but as you say an SoC with a lower core count and a small GPU fit for embedded/fanless environs.

We still don't know much about Van Gogh. It could be targeted at this niche, though I doubt it.
 

dr1337

Senior member
May 25, 2020
309
503
106
If that is accurate for Van Gogh, perhaps it is intended to be the successor to Dali/Pollock.
That's been my assumption. AMD has had a gaping hole in their lineup for almost 3 years now. SFF and NUCs are over 25% of all PC market share, and while Renoir and Cezanne are good, they're really overkill for most SFF applications/uses. And if MILD is right, and it's only quad-core Zen 2, the dies should be tiny. Something like Van Gogh would finally mean top-to-bottom coverage in the market, and that'd be awesome, especially from an investor POV ;)
 

zir_blazer

Golden Member
Jun 6, 2013
1,160
400
136
That's been my assumption. AMD has had a gaping hole in their lineup for almost 3 years now. SFF and NUCs are over 25% of all PC market share, and while Renoir and Cezanne are good, they're really overkill for most SFF applications/uses. And if MILD is right, and it's only quad-core Zen 2, the dies should be tiny. Something like Van Gogh would finally mean top-to-bottom coverage in the market, and that'd be awesome, especially from an investor POV ;)
I see absolutely no reason for Ryzen Embedded to not fit into that segment. Actually, many of the APU dies have more features than what is left once their I/O is castrated to fit AM4, so they have a better feature set in their embedded form.
 

dr1337

Senior member
May 25, 2020
309
503
106
I see absolutely no reason for Ryzen Embedded to not fit into that segment. Actually, many of the APU dies have more features than what is left once their I/O is castrated to fit AM4, so they have a better feature set in their embedded form.
V2000 would be nice if they had any volume. It's just Renoir rebranded for a different segment, and availability is very low. Van Gogh should be much smaller, especially if it's designed for embedded applications and not just a repurposed laptop chip. I really want to be buying AMD NUCs and thin clients, but they're all more expensive or slower than their Intel counterparts.