Speculation: Ryzen 4000 series/Zen 3


itsmydamnation

Platinum Member
Feb 6, 2011
2,004
1,276
136
15% IPC uplift for Zen3 is way too small for brand new uarch.
Alder Lake in H1 2021 with 40%IPC jump (Golden Cove) can bring rough ride for AMD.
If it's 15% higher IPC but consumes less energy while also clocking higher, then all of a sudden 15% higher IPC looks amazing. See, this is your problem: you pick the numbers you think are important and then hyperfocus on them at the expense of the bigger picture.

By your logic, both Cannon Lake and Ice Lake should be a problem for AMD right now, and yet...
 

DisEnchantment

Senior member
Mar 3, 2017
531
936
106
From TSMC's earnings report.

N5 in volume production with good yield. Full node jump: +80% logic density, +20% speed. Extensive EUV; expect a fast, smooth ramp in 2H20 driven by Mobile+HPC. Reiterate N5 will be 10% of wafer revenue in 2020.


Bodes well for AMD. If they can get a pipe cleaner asap they will be fine with the node migration.
 

Thunder 57

Golden Member
Aug 19, 2007
1,409
1,184
136
...I've been on mobile all morning, I'd completely forgotten the forums do what they do with Reddit. I'll fix it up now
I could've been nicer but it was early and I was tired. Thank you though, certainly cleaner now.

15% IPC uplift for Zen3 is way too small for brand new uarch.
Alder Lake in H1 2021 with 40%IPC jump (Golden Cove) can bring rough ride for AMD.
Who let you back in? It's not a brand new uarch. It is very much Zen based. But who cares, an iPhone will beat it anyway, right?
 

Hans Gruber

Senior member
Dec 23, 2006
893
236
116
I will write it one more time. We can come back in 4 or 5 months and see if the leaks were correct. The enhanced 7nm process didn't give AMD the boost they were hoping for, like the one they got going from 12nm to 7nm. But Zen 3 is a new architecture over Zen 2. A 15% IPC gain is on the low end of predictions. I heard 20%. I am guessing a 100 MHz increase on boost clocks vs. Zen 2. I think that will be due to mature 7nm node production improvements rather than the 7nm+ they were hyping. Think about Intel on the 14nm node. They got really good at enhancing 14nm to take massive amounts of voltage for improvements.

Something people have not talked about here: the Infinity Fabric on the memory controller for Zen 3. As most know, 3800 MHz on the Zen 2 platform is sketchy at best no matter the motherboard. If Zen 3 scales up to 4000 MHz or beyond with coupled memory and fabric clocks, that will provide a significant improvement in performance.
 

Veradun

Senior member
Jul 29, 2016
407
321
106
Holy heck. AMD already gearing up for N5??? The only N5 part on their roadmap is Zen 4... this would put them way ahead of their roadmap, right?

I thought for sure AMD would wait out the iPhone launch to begin N5. Maybe Huawei cutting orders opened the door for them to be a little more aggressive?

EDIT: Here's this for everyone else who doesn't speak Chinese. I don't know if their translation is accurate either. I think they're implying AMD has a special N5 process this year, different from N5P next year, that Apple is also using.
What if AMD is using the same process as Apple because it is the same silicon we are talking about? :>

(Also I learned that Arcturus apparently is not an architecture, like Navi/Vega/Polaris, but rather is just a specific chip?)
What I heard is Arcturus is a SKU. May be wrong ofc.
 

inf64

Platinum Member
Mar 11, 2011
2,970
1,446
136
15% IPC uplift for Zen3 is way too small for brand new uarch.
Alder Lake in H1 2021 with 40%IPC jump (Golden Cove) can bring rough ride for AMD.
First of all, you are missing the + sign.
Second of all, good luck seeing AL in 2021. Maybe H2 2022 if all goes well.
Third, a 40% (???) IPC jump is a pipe dream. If it is 10-15% faster than Willow Cove then it is awesome, but let's see about that.
 

Richie Rich

Senior member
Jul 28, 2019
438
200
76
I will write it one more time. We can come back in 4 or 5 months and see if the leaks were correct. The enhanced 7nm process didn't give AMD the boost they were hoping for, like the one they got going from 12nm to 7nm. But Zen 3 is a new architecture over Zen 2. A 15% IPC gain is on the low end of predictions. I heard 20%. I am guessing a 100 MHz increase on boost clocks vs. Zen 2. I think that will be due to mature 7nm node production improvements rather than the 7nm+ they were hyping. Think about Intel on the 14nm node. They got really good at enhancing 14nm to take massive amounts of voltage for improvements.
I agree, a completely new uarch like Zen 3 is expected to have 20%+ over Zen 2. However, for this reason there won't be any higher boost clocks in ST than Zen 2. On the other hand, we could see higher clocks in MT thanks to the higher efficiency of the new uarch and the better N7P node. Overall performance will increase, but desktop clock hunters might be disappointed by Zen 3.

AMD focuses more on the server market, where no CPU clocks over 4 GHz. An additional 300 MHz in all-core boost for EPYC will bring much more profit for AMD than reaching the magical 5 GHz for one desktop SKU.

I see nobody is interested in the 3x larger microcode space for Zen 3. This can mean that Zen 3 is a really big change in architecture. Something like Keller's wider K12 reworked for x86.
 

soresu

Golden Member
Dec 19, 2014
1,218
459
136
What I heard is Arcturus is a SKU. May be wrong ofc.
Arcturus is almost certainly CDNA1 - whatever SKU variants for yield and CU cutdown there may be (and let's face it that is how they split the market and maximise return on design costs) we haven't heard anything to imply more than one specific die design.
 

soresu

Golden Member
Dec 19, 2014
1,218
459
136
N2 which is supposed to be N3 with nanosheets is supposedly 2023. Which will get its own fab in Hsinchu.
I heard TSMC were doing both FinFETs and nanosheet/MBCFETs at N3 - à la N7/N7+ with the non-EUV and EUV variants.
Also, hopefully 52 CU -> 36 CU -> 8 CU with 5nm FinFETs will lead to >2 GHz base GPU clock rate and ~3 GHz boost GPU clock rate.
3 GHz even for boost clock is a pipe dream. It took them over a decade to go from 1 to 2 GHz, and you think they will make it to 3 GHz in a year or two?!

Certainly not at either N5 or N5P processes.

Also what's the 52>36>8 CU thing about?
But, it would be not surprising if Arcturus turns out to be CDNA at some point in the near future. This chip has so many new things compared to Vega 20 with over 8 months of commit history in LLVM and amdgpu. Just a reminder, ROCm compiler is forked from LLVM.
Definitely Arcturus is the CDNA chip, it's the only thing it could be at this point.
For the single Frontier machine, assuming CDNA is over 20 TF/Card and using Vega 20 as reference, at 5nm with 0.1 defects/mm2, if 90% of the compute power comes from GPU that is like ~750 wafers plus another few hundreds for the CPUs. So it is going to be much lower than the total 12k wpm they requested.
Remember they made a point of saying that the gutted rasterization logic was being replaced with tensor/ML acceleration logic - so I would expect TOPS to be at least as important, if not more so than TFLOPS for CDNA.
 

uzzi38

Senior member
Oct 16, 2019
815
962
96
CDNA does debut with Arcturus.

It's called CDNA because Arcturus does a lot more stuff than GCN could. That being said, I won't claim to know specifics (excluding tensors and bfloat16), just that it's a big die that doesn't line up with how large it should actually be given the CU count. Even if you add to the rumoured die size for tensor cores, it's still too large to be honest... plus then the TOPs figure doesn't make sense either.

Arcturus and CDNA are a bit mysterious, to say the least.
 

DisEnchantment

Senior member
Mar 3, 2017
531
936
106
CDNA does debut with Arcturus.

It's called CDNA because Arcturus does a lot more stuff than GCN could. That being said, I won't claim to know specifics (excluding tensors and bfloat16), just that it's a big die that doesn't line up with how large it should actually be given the CU count. Even if you add to the rumoured die size for tensor cores, it's still too large to be honest... plus then the TOPs figure doesn't make sense either.

Arcturus and CDNA are a bit mysterious, to say the least.
Similar to Navi12
- Mixed precision unsigned/int 4/8/16/32 bit operations
- Mixed precision float 16/32 bit operations

Unique to Arcturus
- ECC
- 1/2 DPFP
- XGMI networked GPU workshare
- VCN 2.5 (instead of UVD/VCE)
- Additional RAS features
- Fused operations with matrix operations using the AGPRs (MFMA), something like [ A ] x [ B ] + [ C ]. A whole lot of GPRs were added to support this.

This is what can be found so far from LLVM and amdgpu/amdkfd. Some additional things might pop up, but I have some doubt.
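
For anyone unfamiliar with the [ A ] x [ B ] + [ C ] pattern in the list above, here is a minimal Python sketch of the arithmetic only. The function name mfma_reference is made up for illustration, and nothing here reflects how the hardware schedules the work or uses the AGPRs; it just shows the multiply and the accumulate happening as one operation.

```python
def mfma_reference(a, b, c):
    """Return A x B + C for small dense matrices given as lists of rows.

    Illustrative reference only: the multiply and the add are fused in
    the sense that each output element is produced in a single pass.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("A's column count must match B's row count")
    if len(c) != rows or any(len(row) != cols for row in c):
        raise ValueError("C must have the same shape as A x B")
    return [
        [c[i][j] + sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


result = mfma_reference(
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
    [[1, 1], [1, 1]],
)
print(result)  # [[20, 23], [44, 51]]
```

Feeding the result back in as the next C is what lets several matrix products accumulate into one output.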
 

soresu

Golden Member
Dec 19, 2014
1,218
459
136
CDNA does debut with Arcturus.

It's called CDNA because Arcturus does a lot more stuff than GCN could. That being said, I won't claim to know specifics (excluding tensors and bfloat16), just that it's a big die that doesn't line up with how large it should actually be given the CU count. Even if you add to the rumoured die size for tensor cores, it's still too large to be honest... plus then the TOPs figure doesn't make sense either.

Arcturus and CDNA are a bit mysterious, to say the least.
Remember TOPS can be a product of any possible calculation width variant, i.e. INT2, INT4, INT8, etc.

So the figure could be a lot higher than you might expect compared to the FP32 FLOPS spec, especially considering that with added tensor units it might not even be an exact multiple of the FLOPS count as it has been on AMD cards so far.
 

uzzi38

Senior member
Oct 16, 2019
815
962
96
Remember TOPS can be a product of any possible calculation width variant, i.e. INT2, INT4, INT8, etc.

So the figure could be a lot higher than you might expect compared to the FP32 FLOPS spec, especially considering that with added tensor units it might not even be an exact multiple of the FLOPS count as it has been on AMD cards so far.
If anything, the number is lower than I expected. MI60 was called MI60 because it could output just under 60TOPs in INT8.

Arcturus is MI100, so unless they changed the naming scheme, it should be capable of getting 100 TOPs in INT8 just off shaders alone. If they actually do have tensors, they should be able to go further than that. Unless they either actually don't have tensor cores or aren't including tensor cores in the name for some weird reason.
 

soresu

Golden Member
Dec 19, 2014
1,218
459
136
If anything, the number is lower than I expected. MI60 was called MI60 because it could output just under 60TOPs in INT8.

Arcturus is MI100, so unless they changed the naming scheme, it should be capable of getting 100 TOPs in INT8 just off shaders alone. If they actually do have tensors, they should be able to go further than that. Unless they either actually don't have tensor cores or aren't including tensor cores in the name for some weird reason.
I'd be inclined to say they called it MI100 for marketing continuity rather than TOPS count - the MIxxx nomenclature is still fairly fresh at this point, so they probably don't want to go and do a completely new one, especially given enterprise customers are more averse to change than regular consumers.

Though a 1.66x increase in TOPS on near the same node is nothing to sniff at, like RDNA1 it could just be the pipe cleaner for the new uArch - after all the financial analyst day seemed to make much more of CDNA2 than CDNA1, an odd move even given CDNA2's placement in a coming supercomputer.

edit:
or aren't including tensor cores in the name for some weird reason
Definitely a valid reason if they don't want big green knowing the number ahead of time - like the intersections/second count on RDNA2, it's good to keep some surprises until you are more prepared, at the moment AMD are still playing a perpetual catchup game with their ROCm software stack, which still lacks even Windows support after all this time.

We should probably take this elsewhere if further discussion is warranted.
 

uzzi38

Senior member
Oct 16, 2019
815
962
96
I'd be inclined to say they called it MI100 for marketing continuity rather than TOPS count - the MIxxx nomenclature is still fairly fresh at this point, so they probably don't want to go and do a completely new one, especially given enterprise customers are more averse to change than regular consumers.

Though a 1.66x increase in TOPS on near the same node is nothing to sniff at, like RDNA1 it could just be the pipe cleaner for the new uArch - after all the financial analyst day seemed to make much more of CDNA2 than CDNA1, an odd move even given CDNA2's placement in a coming supercomputer.
Because CDNA2 isn't far away at all, whereas CDNA1 is late and probably hasn't garnered much interest. Also, I imagine they realise they'll probably have the compute ecosystem in a better place next year as well, and if they want people to hop on, it'll be then.

Good point on the first half though. Also, CDNA1 is most certainly a pipe cleaner, as CDNA2 is MI200:
 

Cardyak

Member
Sep 12, 2018
27
18
41
I will write it one more time. We can come back in 4 or 5 months and see if the leaks were correct. The enhanced 7nm process didn't give AMD the boost they were hoping for, like the one they got going from 12nm to 7nm. But Zen 3 is a new architecture over Zen 2. A 15% IPC gain is on the low end of predictions. I heard 20%. I am guessing a 100 MHz increase on boost clocks vs. Zen 2. I think that will be due to mature 7nm node production improvements rather than the 7nm+ they were hyping. Think about Intel on the 14nm node. They got really good at enhancing 14nm to take massive amounts of voltage for improvements.

Something people have not talked about here: the Infinity Fabric on the memory controller for Zen 3. As most know, 3800 MHz on the Zen 2 platform is sketchy at best no matter the motherboard. If Zen 3 scales up to 4000 MHz or beyond with coupled memory and fabric clocks, that will provide a significant improvement in performance.
From information I've garnered IPC increases for Zen 3 could be both within the region of 15% and 20% at the same time.

Single threaded IPC could increase by around 15%, but multi-threaded closer to ~20%

This can happen for a variety of reasons:
  • Merged L3 cache allows greater cross communication between cores
  • Infinity Fabric improvements
  • Having a wider core may result in more SMT utilisation
These changes could result in a greater uplift in IPC when there are multiple threads vying for memory access and competing for execution units in the back end.
 

soresu

Golden Member
Dec 19, 2014
1,218
459
136
From information I've garnered IPC increases for Zen 3 could be both within the region of 15% and 20% at the same time.

Single threaded IPC could increase by around 15%, but multi-threaded closer to ~20%

This can happen for a variety of reasons:
  • Merged L3 cache allows greater cross communication between cores
  • Infinity Fabric improvements
  • Having a wider core may result in more SMT utilisation
These changes could result in a greater uplift in IPC when there are multiple threads vying for memory access and competing for execution units in the back end.
I would also expect packaging improvements allowing for more connections between CCD's, I/O dies and the interposer - this will likely help on the IF side of things, allowing greater width without ramping up frequency too much.
 

DisEnchantment

Senior member
Mar 3, 2017
531
936
106
[Sorry for the bit OT now on the Zen3 thread]

Keep in mind that the key customers LLNL/ORNL are not so interested in TOPS, they want DPFP performance.

“Our workloads are primarily not deep learning models, although we are exploring something we call cognitive simulation, which brings deep learning and other AI models to bear on our workloads by evaluating how they can accelerate our simulations and how they can also improve their accuracy and find where they actually work,” explained de Supinski.
El Capitan, for example, is targeted to have 2 exaflops of DPFP.

The El Capitan system will have in excess of 2 exaflops of peak double precision performance

AMD is lucky to have won the two contracts for Frontier and El Capitan. It allows them a lot of flexibility in designing CDNA. They have a captive market to deliver these products to, with the development paid for and the software development paid for to some extent. On top of that, scientists at the US establishments (LLNL, ORNL, etc.) will contribute actively to ROCm (stated on AMD's own page for Frontier/El Capitan).
The government researchers have made a complete roadmap for the replacement of CUDA with elements proposed by AMD, mainly centering around OpenMP. (But for the life of me I cannot find the link again.)

That said...
I wouldn't assume that CDNA1 is going to be a trivial upgrade over MI60. I doubt that just scaling up TOPS would be such a big challenge for AMD.
Just doing packed int4 will make MI60 go above 100TOPS without doing anything, then consider more CUs. With MFMA they can chain multiple matrix operations in a single wave. If they can pack mixed precision in there too, the gain is really incredible.
The main kernel work for Arcturus has been centered around networking GPUs to achieve the first step in workload sharing, data coherency between the GPUs.
So I think that was always the main focus. 2nd Gen Infinity Architecture.
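
To put rough numbers on the packed int4 remark above, here is a back-of-the-envelope Python sketch. It assumes peak throughput scales inversely with operand width (halve the width, double the ops), which is an idealisation and not a measured figure. The helper name packed_tops is hypothetical, and the 60 TOPS INT8 baseline is rounded up from the "just under 60" mentioned earlier in the thread.

```python
def packed_tops(base_tops: float, base_bits: int, new_bits: int) -> float:
    """Scale a peak TOPS figure to a different operand width.

    Assumes throughput is inversely proportional to operand width,
    i.e. twice as many int4 values fit where one int8 value did.
    """
    if base_bits <= 0 or new_bits <= 0:
        raise ValueError("bit widths must be positive")
    return base_tops * base_bits / new_bits


MI60_INT8_TOPS = 60.0  # assumption: rounded from "just under 60" in the thread

int4_estimate = packed_tops(MI60_INT8_TOPS, base_bits=8, new_bits=4)
print(int4_estimate)  # 120.0, already above the 100 implied by the MI100 name

# If "MI100" really did mean 100 INT8 TOPS, the step up from MI60 would be:
naming_ratio = 100.0 / MI60_INT8_TOPS
print(round(naming_ratio, 2))  # 1.67
```

Under that assumption the MI60 shaders alone would clear 100 TOPS at int4, which is why a 100 TOPS headline by itself says little about whether tensor units are present.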

[Attached image: 1587137662119.png]
 

Veradun

Senior member
Jul 29, 2016
407
321
106
I would also expect packaging improvements allowing for more connections between CCD's, I/O dies and the interposer - this will likely help on the IF side of things, allowing greater width without ramping up frequency too much.
You mean wider IF?
 

eek2121

Senior member
Aug 2, 2005
437
315
136
With AMD’s current trends, I can pretty much guarantee you that there will be significant clock gains across the board. The 15-20% IPC increase has no basis in reality from what I understand, but it is a safe assumption.

Regarding clocks: I don’t know if we’ll see the magical 5 GHz number. However, many Ryzen 3000 chips could hit 4.3-4.4 GHz when overclocked. The top 0.1% could go even higher. I imagine that number will be pushed to around 4.6 GHz on Zen 3. Non-overclocked base clocks should see a similar 200 MHz jump, and single/low-core boost should see a significant jump upwards.

An extra 200 MHz with a 15% IPC boost means Zen 3 will be extremely potent.
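
As a rough illustration of how those two numbers combine, here is a small Python sketch. It uses the usual first-order model that single-thread performance scales with IPC times clock, and takes the 4.4 GHz and 4.6 GHz overclock figures from the post above as inputs. The function name relative_performance is made up, and real workloads will not scale this cleanly.

```python
def relative_performance(ipc_gain: float, old_clock_ghz: float, new_clock_ghz: float) -> float:
    """First-order speedup estimate: performance ~ IPC x clock.

    ipc_gain is fractional, so 0.15 means +15% IPC.
    """
    if old_clock_ghz <= 0 or new_clock_ghz <= 0:
        raise ValueError("clocks must be positive")
    return (1.0 + ipc_gain) * (new_clock_ghz / old_clock_ghz)


# +15% IPC combined with a 4.4 GHz -> 4.6 GHz clock bump
speedup = relative_performance(0.15, 4.4, 4.6)
print(f"{speedup:.3f}")  # 1.202
```

So under this simple model a 15% IPC gain plus 200 MHz lands at roughly a 20% uplift overall.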

The L3 cache change alone will provide drastic performance improvements. I would be surprised if the difference is only “+15%”.
 

uzzi38

Senior member
Oct 16, 2019
815
962
96
With AMD’s current trends, I can pretty much guarantee you that there will be significant clock gains across the board. The 15-20% IPC increase has no basis in reality from what I understand, but it is a safe assumption.

Regarding clocks: I don’t know if we’ll see the magical 5 GHz number. However, many Ryzen 3000 chips could hit 4.3-4.4 GHz when overclocked. The top 0.1% could go even higher. I imagine that number will be pushed to around 4.6 GHz on Zen 3. Non-overclocked base clocks should see a similar 200 MHz jump, and single/low-core boost should see a significant jump upwards.

An extra 200 MHz with a 15% IPC boost means Zen 3 will be extremely potent.

The L3 cache change alone will provide drastic performance improvements. I would be surprised if the difference is only “+15%”.
No, yes, no, probably, no, depends, probably in that order would be my reply to that.
 

eek2121

Senior member
Aug 2, 2005
437
315
136
No, yes, no, probably, no, depends, probably in that order would be my reply to that.
This page will be the one to use for reference as Zen 3 leaks start to roll out: https://www.anandtech.com/show/15708/amds-mobile-revival-redefining-the-notebook-business-with-the-ryzen-9-4900hs-a-review/2

Regarding L3 cache, Zen is particularly sensitive to cache increases due to high latency. Moving to a unified cache means that the L3 each core can reach is effectively doubled, depending on how the cache is configured.
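
A quick Python sketch of what "effectively doubled" means here. The split case uses the desktop Zen 2 layout of a 4-core CCX with 16 MB of L3; the merged case of 8 cores sharing 32 MB is an assumption for illustration, not a confirmed Zen 3 spec. The helper l3_view is a made-up name.

```python
def l3_view(cores_per_domain: int, l3_mb_per_domain: int) -> dict:
    """Describe the L3 as seen from one core inside a sharing domain."""
    if cores_per_domain <= 0:
        raise ValueError("a domain needs at least one core")
    return {
        # How much L3 a single core can allocate into at all
        "reachable_mb": l3_mb_per_domain,
        # Even split if every core in the domain is busy
        "share_mb_per_core": l3_mb_per_domain / cores_per_domain,
    }


split_ccx = l3_view(cores_per_domain=4, l3_mb_per_domain=16)   # Zen 2 desktop CCX
merged_ccd = l3_view(cores_per_domain=8, l3_mb_per_domain=32)  # assumed merged CCD

print(split_ccx)   # {'reachable_mb': 16, 'share_mb_per_core': 4.0}
print(merged_ccd)  # {'reachable_mb': 32, 'share_mb_per_core': 4.0}
```

The per-core share is unchanged when all cores are loaded; the gain is that a lightly threaded workload can spill into twice as much L3 as before.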

Regarding clocks, there is a definite increase in perf/watt between Renoir and Zen 2 parts. We can clearly see this by comparing Zen 2 based parts and Renoir, though the real juicy comparison won't come until U-series parts land. I could dig a bit deeper into some other, more interesting things that'd popped up, but I have to get back to work. :)

It's all speculation for now, at any rate. Zen 3 is going to be a pleasant surprise for a lot of people.
 

DisEnchantment

Senior member
Mar 3, 2017
531
936
106
A merged L3 can decrease latency for accesses that previously went between CCXs, but if they doubled it, it can bring higher overall L3 access latency for accesses that previously stayed within a CCX. They need to address this somehow.

A bigger issue with IF is power usage, which affects EPYC a lot. I suppose if they improve IF energy usage they could run EPYC at much higher frequencies than what is possible today, which is a shame because EPYC is running within the sweet spot of the N7 HD VF curve. They need to address this as there is a lost opportunity here. Going faster or wider for IF is not always better.

Zen's execution units are already very good; the higher SMT utilization was because the executing thread stalls on a misprediction or on data not present in the cache, leading to a different thread being executed instead.

My hope is that besides the Uncore/L3 and IF improvements they can put some effort in other places too. I think the merged L3 per CCD was probably a requirement for the X3D chiplet NoC architecture, because the crossbar will be connected to the uncore (from the patents at least) in the future.
 
