Speculation: Ryzen 4000 series/Zen 3


amd6502

Senior member
Apr 21, 2017
971
360
136
I saw this on another forum. This guy on twitter apparently is listing new AMD patents that belong to Zen 3/RDNA2.

The guy who posted this on the other forum said that, based on this patent dump, Zen 3 will be able to do:

Zen 3
4x FP Mul+Add

Compared to Zen 2:

Zen2
2xFP Mul+Add + 2x FPadd

Great twitter link, but I don't see how these patents point to 4x FMAC of Zen2 at all.

If they band together cores (BD style) in order to go SMT4 on the FPU side (while remaining SMT2 on integer side) then a maximum rate of 2x FMAC is possible in single thread.
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
There is no realistic setup for a test like this right now. But you will never convince the ARMada that they're comparing bananas to oranges every day, saying that these bananas are much tastier bananas than those oranges. Well, they should be! They are bananas, for God's sake. But they are not very good as oranges, and vice versa.
Bananas and oranges is more like CPU and GPU. I don't see any problem with comparing performance between two CPUs on different ISAs. If you are a company like Amazon and you run your web/SQL servers on, for example, Linux and MySQL (you have binaries for both ARM and x86), then it's very easy to compare performance under REAL load. Very easy. They did it and decided to create their own ARM server CPU called Graviton. Do you guys really think that Amazon invested a huge amount of money into something incomparable? Do you think that people at Amazon don't see the huge +82% IPC advantage delivered by Apple's ARM CPU? Did Apple switch ISA from PowerPC to x86 because it was incomparable?

BTW Mark Papermaster said that AMD will deliver at least a 7% IPC jump each gen. Looks nice, but in other words we will wait a whole decade until AMD reaches the IPC level of today's Apple cellphones. And because Apple delivers approx. +10-15% IPC every year, AMD and Intel would need to bring +20% every year to catch Apple's IPC in one decade. So if a completely new uarch like Zen 3 brings less than +20% IPC, I consider it a fail (in the long-term fight against ARM). If those leaks about 10-12% IPC are true, that's a big fail.
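
The catch-up arithmetic in this post can be sanity-checked with a short compound-growth sketch. It takes the +82% gap and the yearly rates from the post at face value, treats "each gen" as one year (an assumption), and uses a hypothetical helper named `years_to_close`. None of it is measured data.

```python
import math

def years_to_close(gap, chaser_rate, leader_rate=0.0):
    """Years until the chaser erases a relative IPC gap.

    gap:         leader's current advantage (0.82 means +82%)
    chaser_rate: chaser's yearly IPC gain (0.07 means +7%)
    leader_rate: leader's yearly IPC gain (0.0 means the leader stands still)
    Assumes both sides compound once per year, which is a simplification.
    """
    relative_gain = (1 + chaser_rate) / (1 + leader_rate)
    if relative_gain <= 1:
        return math.inf  # the chaser never catches up
    return math.log(1 + gap) / math.log(relative_gain)

# +7% per year against today's Apple IPC (leader frozen in time)
static_target = years_to_close(0.82, 0.07)
# +20% per year against a leader gaining +10% or +15% per year
moving_fast = years_to_close(0.82, 0.20, 0.10)
moving_slow = years_to_close(0.82, 0.20, 0.15)

print(f"{static_target:.1f} {moving_fast:.1f} {moving_slow:.1f}")
```

Under these assumptions the static case comes out at roughly nine years, and the +20% case lands between about seven and fourteen years depending on the leader's rate, so "one decade" is in the right range.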
 

tamz_msc

Diamond Member
Jan 5, 2017
3,795
3,626
136
BTW Mark Papermaster said that AMD will deliver at least a 7% IPC jump each gen. Looks nice, but in other words we will wait a whole decade until AMD reaches the IPC level of today's Apple cellphones. And because Apple delivers approx. +10-15% IPC every year, AMD and Intel would need to bring +20% every year to catch Apple's IPC in one decade. So if a completely new uarch like Zen 3 brings less than +20% IPC, I consider it a fail (in the long-term fight against ARM). If those leaks about 10-12% IPC are true, that's a big fail.
Increasing IPC at a frequency of 4+GHz is much more difficult than doing the same at 2.5 GHz. If AMD can deliver another 15% IPC with Zen 3 then they would have delivered a 15% increase with each generation for two consecutive years. That's equally impressive as ARM delivering 20-25% each year.
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
  1. Cortex A77 is equal to or faster than Skylake and Zen (IPC)
  2. Cortex cores are delivering bigger IPC jumps than x86
  3. ARM is increasing clocks while x86 is decreasing them
  4. All of the above combined means that overall performance increments are much higher for ARM (Apple A13 literally beats Zen2 at 4.6 GHz already today, just look at the picture below)
This trend is scary in the long term. The problem is also that Zen 3 will need +20% IPC to beat RocketLake/TigerLake. So IMHO +15% IPC for Zen 3 is a double fail (in the short term against Intel and in the long term against ARM/Apple/Nuvia).

[Image: SPEC2006_S865.png]
 

tamz_msc

Diamond Member
Jan 5, 2017
3,795
3,626
136
You didn't understand what I said, did you? ARM can deliver 20-25% IPC improvement with each generation precisely because they target such low clocks. If the cores were designed to operate at over 4 GHz, they wouldn't be able to deliver that much IPC improvement with each generation. With that in mind, it puts AMD's 15% figure into perspective.
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
I understood what you said. However, there is not much that holds IPC down in a high-clock uarch:
  • For a high-clock uarch you need a pipeline with more, shorter stages (to maximize clock at high voltage).
  • For a low-power device like the Apple A13 you need the same (more, shorter stages, to minimize voltage at a reasonable clock), so the pipelines are very comparable (A13 has 16? stages, Zen2 has 18-19?). Also we know that Cortex A72 can run at 4 GHz, so those Apples would run around 4 GHz if manufactured with the same HP process as x86.
  • ARM cores operate at 2.6 GHz, Zen2 at 4.6 GHz. x86 has +77% higher clocks, but that is not a huge difference that would explain the +82% IPC advantage of the A13.
  • The biggest difference is in the memory subsystem and caches, dealing with the latency bottleneck caused by high clocks. However, when absolute performance is equal, the A13 has to deal with the same throughput and latency as Zen2 (the low-clock advantage is gone).
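
The clock-versus-IPC figures in the list above can be cross-checked with a few lines. This is only a sketch: it models performance as IPC times clock, takes the 2.6 GHz, 4.6 GHz and +82% numbers from the post as given, and the helper name `relative_performance` is made up for illustration.

```python
def relative_performance(ipc_ratio, clock_a_ghz, clock_b_ghz):
    """Performance of core A relative to core B, modelled as IPC x clock."""
    return ipc_ratio * clock_a_ghz / clock_b_ghz

# Figures quoted in the post: A13 at 2.6 GHz with +82% IPC, Zen2 at 4.6 GHz.
clock_advantage = 4.6 / 2.6 - 1                      # x86 clock lead
a13_vs_zen2 = relative_performance(1.82, 2.6, 4.6)   # > 1 means A13 ahead

print(f"clock lead: {clock_advantage:+.0%}, A13 vs Zen2: {a13_vs_zen2:.2f}x")
```

With those inputs the clock lead is about +77%, and the two effects nearly cancel, leaving the A13 a few percent ahead in this simple model.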

Please explain why you think it is difficult to achieve high IPC at high clocks. If you look at the historical K6 vs. K7, you can see that K6 was a 2xALU low-clock CPU and the more modern K7 was a wider 3xALU design (much higher IPC) with clocks almost 4x higher (partly from pipeline stage design, partly from a better node). IMHO x86's high clocks are due to picking the lowest-hanging fruit (it's much easier to design a simple core at 4 GHz than an advanced high-IPC low-power core at 2.6 GHz).
 

NTMBK

Lifer
Nov 14, 2011
10,237
5,019
136
Bananas and oranges is more like CPU and GPU. I don't see any problem with comparing performance between two CPUs on different ISAs. If you are a company like Amazon and you run your web/SQL servers on, for example, Linux and MySQL (you have binaries for both ARM and x86), then it's very easy to compare performance under REAL load. Very easy. They did it and decided to create their own ARM server CPU called Graviton. Do you guys really think that Amazon invested a huge amount of money into something incomparable? Do you think that people at Amazon don't see the huge +82% IPC advantage delivered by Apple's ARM CPU? Did Apple switch ISA from PowerPC to x86 because it was incomparable?

BTW Mark Papermaster said that AMD will deliver at least a 7% IPC jump each gen. Looks nice, but in other words we will wait a whole decade until AMD reaches the IPC level of today's Apple cellphones. And because Apple delivers approx. +10-15% IPC every year, AMD and Intel would need to bring +20% every year to catch Apple's IPC in one decade. So if a completely new uarch like Zen 3 brings less than +20% IPC, I consider it a fail (in the long-term fight against ARM). If those leaks about 10-12% IPC are true, that's a big fail.

You'd damn well hope that the ARM chip has higher IPC, it's a RISC architecture. It requires more instructions to perform the same amount of work.
 

soresu

Platinum Member
Dec 19, 2014
2,662
1,862
136
Also we know that Cortex A72 can run at 4 GHz, so those Apples would run around 4 GHz if manufactured with the same HP process as x86.
That 4 GHz example was a test chip; I saw nothing to indicate any advantageous power consumption in the news surrounding it.

It's pointless comparing A72 to the Axx cores, they are completely different uArch designs by different engineering teams.
IMHO x86's high clocks are due to picking the lowest-hanging fruit (it's much easier to design a simple core at 4 GHz than an advanced high-IPC low-power core at 2.6 GHz).
Again, it's about power - it's not "low hanging fruit", it's sheer power consumption.

Zen and Core derivatives perform far more efficiently below 3 GHz - even 12nm could get you a 45W 8C Zen+ CPU at 2.8 GHz (2700E).

I'd be interested to see someone do some tests on the 8C Zen2 SKUs underclocked to that 2700E range and see how many watts they pull on average loads.
If you look at the historical K6 vs. K7, you can see that K6 was a 2xALU low-clock CPU and the more modern K7 was a wider 3xALU design (much higher IPC) with clocks almost 4x higher (partly from pipeline stage design, partly from a better node).
Clock frequency was scaled faster than those node improvements allowed for, which is part of the reason we went from passive bulk metal heatsinks with a few thicc fins to active air cooled heat pipe augmented dense packed thin fin heatsinks in just a couple of decades.

What we see as ARM's awesome efficiency triumph is in no small way due to them waiting until much more advanced nodes to really pursue performance once smartphones took off.

Cortex A8 at 1 GHz was at 45 nm.

Athlon at 1 GHz was at 180 nm.
 

DrMrLordX

Lifer
Apr 27, 2000
21,629
10,841
136
I'd be interested to see someone do some tests on the 8C Zen2 SKUs underclocked to that 2700E range and see how many watts they pull on average loads.

Depends on what voltage you give it. I managed to get down to around 1 volt @ 2.8 GHz on a 3900x and it pulled maybe 62W package power. Any voltage lower than that and I started noticing performance degradation. This was running CBR20 MT. Of note: SoC power was around 1/3rd the power budget at this point, probably due in large part to my memory overclock. Prolly could have gotten it to run lower without running DDR4-3733.

And um yeah, of course I'm not running an 8c chip. I estimate that you could knock maybe 15W off the power consumption by dropping four cores. Maybe.
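
The "maybe 15W" estimate can be reproduced from the numbers in this post. The sketch below assumes that everything other than the SoC share is core power and that core power scales linearly with the number of active cores, which real silicon will not follow exactly. The function name `estimated_savings_w` is hypothetical.

```python
def estimated_savings_w(package_w, soc_share, total_cores, cores_removed):
    """Watts saved by disabling cores, assuming core power scales linearly."""
    core_power_w = package_w * (1 - soc_share)
    return core_power_w / total_cores * cores_removed

# 62 W package, about 1/3 of it SoC, 12-core 3900X cut down to 8 cores
savings = estimated_savings_w(62.0, 1 / 3, 12, 4)
print(f"{savings:.1f} W")
```

That works out to a little under 14 W, which is in line with the poster's rough guess.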
 

soresu

Platinum Member
Dec 19, 2014
2,662
1,862
136
Prolly could have gotten it to run lower without running DDR4-3733.
My stock clocked DDR4-2400 sticks whimpered as I read this.

I'm hoping to upgrade straight to DDR5-4800; hopefully it won't impede my performance as much as my current RAM would on the CPUs of the future.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,795
3,626
136
Please explain why you think it is difficult to achieve high IPC at high clocks.
Because when CPU performance is increased, the miss penalty becomes more significant. Higher performance means lower CPI, which means a greater proportion of time is spent on memory stalls. An increased clock rate means that memory stalls account for more CPU cycles. That's why "IPC" increases if you benchmark a program while lowering the peak clock frequency. If you reverse that process, i.e. scale frequency upwards, it follows that "IPC" must decrease. A 2.5 GHz A77 might be 25% faster than a 2.5 GHz A76, but compare them both at 4 GHz and suddenly the "IPC" gain is a lot less.
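
The argument above matches the textbook decomposition of CPI into an ideal core component plus memory stall cycles. Here is a minimal sketch of it. The miss rate (0.002 per instruction), the 80 ns memory latency and the two base CPI values are invented purely for illustration and do not describe any real A76 or A77.

```python
def effective_ipc(base_cpi, misses_per_instr, mem_latency_ns, clock_ghz):
    """IPC once memory stalls are added to the core's ideal CPI.

    Memory latency is fixed in nanoseconds, so its cost in cycles
    grows linearly with clock frequency (ns * GHz = cycles).
    """
    stall_cpi = misses_per_instr * mem_latency_ns * clock_ghz
    return 1.0 / (base_cpi + stall_cpi)

def ipc_gain(old_cpi, new_cpi, clock_ghz,
             misses_per_instr=0.002, mem_latency_ns=80.0):
    """Observed IPC uplift of a new core over an old one at a given clock."""
    old = effective_ipc(old_cpi, misses_per_instr, mem_latency_ns, clock_ghz)
    new = effective_ipc(new_cpi, misses_per_instr, mem_latency_ns, clock_ghz)
    return new / old - 1

# Hypothetical cores: the new one has a 25% better ideal CPI (0.5 -> 0.4).
gain_low_clock = ipc_gain(0.5, 0.4, clock_ghz=2.5)
gain_high_clock = ipc_gain(0.5, 0.4, clock_ghz=4.0)
print(f"{gain_low_clock:.1%} at 2.5 GHz, {gain_high_clock:.1%} at 4.0 GHz")
```

In this toy model the same core improvement shows up as a smaller measured "IPC" gain at 4 GHz than at 2.5 GHz, because the fixed-time memory stalls take up a larger share of the cycles.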
 

mtcn77

Member
Feb 25, 2017
105
22
91
Aren't wider instructions always less efficient due to the wider MUXes needed to run them natively?
 

lobz

Platinum Member
Feb 10, 2017
2,057
2,856
136
I don't see any problem with comparing performance between two CPUs on different ISAs.
I know you don't, that's why I know better than to waste my time and energy pointlessly arguing with you (about this).
 

amd6502

Senior member
Apr 21, 2017
971
360
136
What we see as ARM's awesome efficiency triumph is in no small way due to them waiting until much more advanced nodes to really pursue performance once smartphones took off.

I agree, their pricier high performance cores are on the very newest bleeding edge nodes (giving their products a big efficiency gain and a large transistor budget).

Their budget oriented cores (eg A53, A35, etc https://en.wikipedia.org/wiki/Comparison_of_ARMv8-A_cores) are on trailing edge nodes like 28nm, but they were high efficiency in-order designs. These are sort of reused on new nodes as little cores, giving doubled up efficiency.

If x86 wants to compete in ultra efficient processing (eg tablets/2-in-1s or certain server markets) they will need a way to evaluate threads in a near in-order fashion with very little speculation and running ahead.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Please explain why you think it is difficult to achieve high IPC at high clocks.

Because you need every part of the core to be synchronous. The faster the clock frequency, the harder it is to reach same perf/clock, because the same circuits need to be driven harder. Not only that, sometimes you need whole different circuitry that's more complex just to do the same thing at lower frequencies.

Intel's Atom-based Goldmont Plus has an L1 cache latency of 3 cycles. Skylake and Ice Lake are at 5 cycles (Skylake is 4 in some corner-case scenarios). Goldmont Plus aims for 3 GHz, and SKL/ICL for 4-5 GHz.

You see, 5 cycles @ 5 GHz is the same as 3 cycles @ 3 GHz.
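
The cycles-versus-clock point can be made concrete by converting to wall-clock time. This is a small sketch using the cycle counts and target clocks from this post. The helper names `latency_ns` and `cycles_needed` are hypothetical.

```python
import math

def latency_ns(cycles, clock_ghz):
    """Wall-clock latency of an access that takes `cycles` at `clock_ghz`."""
    return cycles / clock_ghz

def cycles_needed(target_ns, clock_ghz):
    """Whole cycles required to cover a fixed latency at a given clock."""
    return math.ceil(target_ns * clock_ghz)

print(latency_ns(3, 3.0), latency_ns(5, 5.0))          # same time in ns
print(cycles_needed(1.0, 3.0), cycles_needed(1.0, 5.0))
```

Both designs end up with the same L1 access time in nanoseconds. The faster-clocked core simply has to spread it over more cycles.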

Same for pipeline stages. A10 had 12-13 stages, and based on clocks, it's likely A13 is also at the same level. The Cortex A77 core also has a 13-stage pipeline. The lowest branch mispredict penalty is 10 cycles when you consider its uop cache.

In contrast, Skylake is 18-20 stages, and best case is 14-15 stages when uop cache is considered.

Back when Intel introduced the Pentium 4, some estimated each additional pipeline stage was responsible for 2-5% difference in performance. It's probably closer to 2% than 5%, but it adds up.
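
To see how "it adds up", the per-stage estimate can be compounded over the stage-count difference mentioned above. The sketch assumes the penalty compounds multiplicatively per extra stage and takes 19 stages as the midpoint of the 18-20 quoted for Skylake against 13 for the A77. Both choices are assumptions, and the helper `cumulative_penalty` is made up for illustration.

```python
def cumulative_penalty(extra_stages, per_stage_penalty):
    """Total performance loss if each extra stage costs a fixed fraction."""
    return 1 - (1 - per_stage_penalty) ** extra_stages

extra = 19 - 13  # assumed Skylake midpoint minus the A77's 13 stages
low = cumulative_penalty(extra, 0.02)
high = cumulative_penalty(extra, 0.05)
print(f"{low:.1%} to {high:.1%}")
```

At 2% per stage the six extra stages cost a bit over 11% in this model, and at 5% per stage the loss would be roughly a quarter.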
 

Arzachel

Senior member
Apr 7, 2011
903
76
91
Intel's Atom-based Goldmont Plus has an L1 cache latency of 3 cycles. Skylake and Ice Lake are at 5 cycles (Skylake is 4 in some corner-case scenarios). Goldmont Plus aims for 3 GHz, and SKL/ICL for 4-5 GHz.

You see, 5 cycles @ 5 GHz is the same as 3 cycles @ 3 GHz.

Apple's CPUs are a great example of this: the A12 has double the size of both of Skylake-X's L1 caches and 6x the L2, and targeting higher clock speeds would require much worse latencies.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,686
1,221
136
Since everyone is talking about other stuff, might as well...

149 mm2
octo-core CMT
2x32-bit DDR5
4x PCIE 5.0 (GFX)
4x PCIE 5.0? (GPP)
2x SATA or 4x PCIE (STG)
No iGPU
Budget/Essential/Casual Media Socket 1. (AM1 successor)

AMD PIPE PCS with 3rd party PHY for PCIe Gen1/2/3/4/5 <== 5.0
Led feature checkout and enablement for PCIe IP based on PCIe 4.0/5.0 specifications for discrete GPU ASICs with exposure to various PCIe end point and root devices <= 5.0

With that the major thing to pull from this is that Ryzen 4000's X670? might be incoming with PCIe 5.0 w/ RDNA2.
 

lobz

Platinum Member
Feb 10, 2017
2,057
2,856
136
aaaand we're back to fairy tales again...
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,686
1,221
136
PCIe 5 without DDR5? Or are you implying DDR5, too?
DDR4/DDR5 are compatible with each other physically(same pin)/electrically(same pin). So, it is possible that the AM4 X670? will also bring DDR5 DIMMs. With PCIe 5.0 being available w/ the IOD&Mobo refresh.
 

A///

Diamond Member
Feb 24, 2017
4,352
3,154
136
DDR4/DDR5 are compatible with each other physically(same pin)/electrically(same pin). So, it is possible that the AM4 X670? will also bring DDR5 DIMMs. With PCIe 5.0 being available w/ the IOD&Mobo refresh.
Right, both are 288-pin, and the newer one puts the VR on the module as opposed to the mobo. But you run into the issue of the old days when you had mobos supporting DDR2/3. They had limitations compared to boards supporting only DDR3 or only the older modules. You'd still probably need a new pin layout for DDR5, and possibly another one for squeezing in PCIe 5. And it's easier to hard launch multiple new feature sets at once rather than stagger them over multiple generations plus a socket change. Another issue is how AMD does releases: 7/7 for 7nm. DDR5 is a go for 2020, with SK Hynix and Samsung promising good stuff at launch, as opposed to DDR4's launch. Now assuming a 5/5 launch in 2020 for PCIe 5, DDR5 and USB4, you've only got a lead time of six months until just before the launch of Ryzen 4000. So a 5/5 date doesn't work, unless they use the 5/5 date for Milan, which makes sense as AMD plans on releasing Milan months ahead of Ryzen 4000. Otherwise you go to 2021, which allows the following:

AM5 Socket
DDR5
USB4
PCIe 5.0
5/5 release

This would give AMD enough time to work with their vendors and deliver a solid product launch. AMD's problem is that their launches aren't very good, and are often quite buggy. One large launch that gets cleared up in a few months is a lot easier to swallow than 2-3 years of bad ones that will leave a bad taste in customers' mouths. Anyway, you do bring up a reasonable suggestion as to why. It's a lot easier to take in than some of the wild theories people have come up with.

I'm going to stay out of the 4SMT discussion on the basis I haven't had to read the Hennessy books in over a decade. Anything I learned then is severely outdated let alone been in a fabrication plant.
 

Veradun

Senior member
Jul 29, 2016
564
780
136
I want to casually point out that 5nm, PCIe5, DDR5, AM5 would fit nicely on a 5/5/2021 launch date. Note that 2021->2+0+2+1=5 :D
 

soresu

Platinum Member
Dec 19, 2014
2,662
1,862
136
I want to casually point out that 5nm, PCIe5, DDR5, AM5 would fit nicely on a 5/5/2021 launch date. Note that 2021->2+0+2+1=5 :D
Only if Zen7 comes on a seventh day of the seventh month at 7AM!

Edit:

Zen6 could come in 2023 when PCIe 6 might be due!

666 - the Red Devil cometh!
 