Speculation: Ryzen 4000 series/Zen 3


NostaSeronx

Diamond Member
Sep 18, 2011
3,811
1,290
136
imho, if there is an ALU increase it will be 8 ALUs, not 6 or 10. This is for the constrained aspect of the potential SMT4. When all threads are full, they will each only be able to use 2 ALUs/2 FPUs in a given number of cycles, with a restricted number of registers usable for each thread. With that increase the FPU vALUs will also go from 4 FP256 vALUs to 8 FP128 vALUs, which is already physically implemented in the Zen2 design. The only thing left is the promised core refactorization.

The SMT4 allows for:
- a reduction in the number of cores used, which benefits single-core (eighth die), dual-core (quarter die) and quad-core (half die) boosts.
- increased utilization of units at high frequency, better ILP/EPI at 3~5 GHz.

Maybe...
4x28 instead with simpler ALUs having integrated LD/ST AGUs.
4x16 + 1x28 => 92-entry tSQ, 4x28 => 112-entry tSQ
5 SQs(7-units) to 4 SQs(8-units)

FPU side might have two separate FP128 four-issue SQs rather than a single FP256 four-issue SQ. If NSQ dispatches Lo-128(MUL0) or Lo-256(MUL0/MUL1) to SQ0, it must also dispatch Hi-128(MUL2) or Hi-256(MUL2/MUL3) to SQ1. If NSQ dispatches Lo-128(ADD0) or Lo-256(ADD0/ADD1) to SQ0, it must also dispatch Hi-128(ADD2) or Hi-256(ADD2/ADD3) to SQ1. NSQ can continue with 4-issue, with the lo-half first 4 and the hi-half second 4, or it could target eight-issue. The same applies to the 8 ALU design: four AGUs, then four ALUs. If there is no hi-half then it is all lo-half, and if it's all ALU or all AGU, then no AGU or ALU ops, respectively, are dispatched.
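One way to read the lo/hi pairing rule in this post is as a lookup from an FP op and its width to the units targeted in each scheduler queue. The sketch below is only that reading, written out in Python: the unit names (MUL0..MUL3, ADD0..ADD3) and queue names (SQ0, SQ1) come from the post, while the function name `paired_dispatch` and its return shape are hypothetical and say nothing about how real hardware would do it.

```python
# Hypothetical model of the speculated lo/hi pairing rule.
# Unit and queue names are taken from the post; everything else is illustrative.

def paired_dispatch(op, width):
    """Return the units targeted in SQ0 (lo half) and SQ1 (hi half)."""
    if op not in ("MUL", "ADD"):
        raise ValueError(f"unknown op: {op}")
    if width == 128:
        lo, hi = [0], [2]          # Lo-128 -> unit 0, Hi-128 -> unit 2
    elif width == 256:
        lo, hi = [0, 1], [2, 3]    # Lo-256 -> units 0/1, Hi-256 -> units 2/3
    else:
        raise ValueError(f"unsupported width: {width}")
    return {
        "SQ0": [f"{op}{i}" for i in lo],
        "SQ1": [f"{op}{i}" for i in hi],
    }


print(paired_dispatch("MUL", 128))
print(paired_dispatch("ADD", 256))
```

The point of the model is just that every lo-half dispatch to SQ0 comes with a matching hi-half dispatch to SQ1, so the two queues always move in step.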

6 units (4+2) => 192 -> 32x6 (Maximum of 6 micro-op)
7 units (4+3) => 224 -> 32x7 (Maximum of 7 micro-op)
8 units (4 Complex + 4 Simple/AGU) => 256? -> 32x8; if AMD wants to be on par with Sunny Cove this will have to be 44x8 => 352 retire. (Maximum of 8 micro-op) <== Refactored with FPU (8 INT or 8 FP)
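The queue totals and per-unit retire figures in this post are plain multiplications, so they are easy to recheck. A minimal sketch in Python, where the helper name `total_entries` and the variable names are made up for illustration and the numbers are the ones quoted in the post:

```python
# Recompute the scheduler-queue and retire-width figures quoted in the post.
# Names here are illustrative labels, not AMD terminology.

def total_entries(queues):
    """Sum entries over (queue_count, entries_per_queue) pairs."""
    return sum(count * entries for count, entries in queues)


current_tsq = total_entries([(4, 16), (1, 28)])   # 4x16 + 1x28
proposed_tsq = total_entries([(4, 28)])           # 4x28

# 32 entries per unit for 6, 7 and 8 units, plus the 44x8 variant.
retire_32 = {units: 32 * units for units in (6, 7, 8)}
retire_44x8 = 44 * 8

print(current_tsq, proposed_tsq)
print(retire_32, retire_44x8)
```

All of the post's totals (92, 112, 192, 224, 256 and 352) come out the same way.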
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,811
1,290
136
AMD's earliest design response to a 'Cove' core would be Zen5 at best?
Should be the architecture after Zen3.

AMD has two core refactor tokens.
Qualcomm ARM HPC core w/ HTM (Qualcomm HPC core 2011~2016 => AMD)
Intel Folsom x86 HPC core w/ Folsom design (Intel Folsom Big core team 2016~2017 => AMD)
The above cores were pretty fantastical; however, due to problems in marketing/management they were killed before they were deployed.

2012 -> Zen ramps, 2015 -> Zen powerout?, 2016 -> Zen tapeout?
2016 -> New core ramps, 2019 -> New core powerout?, 2020 -> New core tapeout?

The intel core portion is derived from => • The team Taped out 10nm Intel process High speed Core design, ~5Ghz Freq. that was used both by Client & Server products.
The qualcomm portion core is derived from => SMT4, HTM, etc team.

There are other transfers but I am set on those being the Next-gen Chip Multithreading/Cluster-based Multithreading architecture. We haven't seen RDNA Mobility yet as a GCN replacement in the sub-10 watt market.

If anyone sees a Sempron 4000GT (or GE) on anything, that is 100% going to be 12FDX, fyi.
 

A///

Diamond Member
Feb 24, 2017
4,351
3,160
136
I get the feeling he's referring to a tick-tock structure, based on his comment about the ramping of cores.
 

lobz

Platinum Member
Feb 10, 2017
2,057
2,856
136
The intel core portion is derived from => • The team Taped out 10nm Intel process High speed Core design, ~5Ghz Freq. that was used both by Client & Server products.
Sorry man, you completely lost me here. Does anyone care to elaborate further? I'm asking this out of genuine curiosity and not mockery.
 

soresu

Diamond Member
Dec 19, 2014
4,114
3,570
136
Qualcomm ARM HPC core w/ HTM (Qualcomm HPC core 2011~2016 => AMD)
At best you are referring to Falkor or its successor Saphira - it's pretty clear they were cancelled due to Qualcomm's analysis of dubious demand for ARM servers at the time.

Though considering W10 on ARM happened not long after, and was seemingly aimed at Qualcomm in particular, I would not be surprised to find it was mostly dead rather than completely dead (to use a Princess Bride-ism).
 

amd6502

Senior member
Apr 21, 2017
971
360
136
Papermaster hinted that SMT4 was much dependent on loads. Consumer desktop would see little benefit from it. Server and (quite likely) mobile would see benefits. SMT4 or some other 4-way MT would allow for reduced core counts (and thus a more power sipping CPU) in mobile. Whether it's worth it for consumer mobile would depend on how well Chrome and Firefox threads could use such threads.

It's also been stated that 7nm is more challenging with its heat sensitivity, power density and hot spot considerations. If you're tuning maximum frequency parts (damned be the wattage sort of parts), then the fact that you have a whole bunch of idle or sleeping ALUs may not be a bad thing. You will probably get higher boost and be able to remain longer in boost when you have a high percentage of idle pipelines versus, say, the full duty pipelines that you might get running under SMT4.

Suppose Zen3 is marginally wider (5 or 6 ALU) and 3 AGU. The % utilization for full load with SMT2 is still going to be fairly good.

So from a DT perspective it's too early for SMT4 in Zen3.

I still am hopeful though that they would add 4-way MT in Zen3 to squeeze out more perf/watt for the mobile and server markets (and if so, hopefully allow these as non-default options in desktop). Low odds, I know. But in my opinion not so low for Zen4.
 

Adonisds

Member
Oct 27, 2019
98
33
51
AMD CPUs are getting better than Intel's. AMD has been gaining market share on the desktop quickly, but projections show that on the server market the market share shift is going to be very slow. Why is that?

If I didn't know anything I would have guessed that market share would shift quicker on the servers, because people who buy servers should know more about CPUs than desktop consumers, and should be less susceptible to marketing
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,248
16,108
136
AMD CPUs are getting better than Intel's. AMD has been gaining market share on the desktop quickly, but projections show that on the server market the market share shift is going to be very slow. Why is that?

If I didn't know anything I would have guessed that market share would shift quicker on the servers, because people who buy servers should know more about CPUs than desktop consumers, and should be less susceptible to marketing
What you forget (or maybe don't know) is that server purchases (at least in the companies I came from) are picked and approved by managers, who quite often don't know anything technical and don't research it. So for them, the Intel marketing dept is king.

Edit: Example from 2002-2004 ish (not sure exactly). This is when AMD Opterons took half the power and produced way less heat than Intel server chips, but were faster. I asked management about it when we were going to get a new server, and his comment was "we only buy REAL CPUs, Intel".
 

Hitman928

Diamond Member
Apr 15, 2012
6,695
12,370
136
AMD CPUs are getting better than Intel's. AMD has been gaining market share on the desktop quickly, but projections show that on the server market the market share shift is going to be very slow. Why is that?

If I didn't know anything I would have guessed that market share would shift quicker on the servers, because people who buy servers should know more about CPUs than desktop consumers, and should be less susceptible to marketing

What Markfw said is true, though I think less so today than 10 - 20 years ago. You also have to keep in mind that there is a lot of upfront time and cost that goes into switching vendors, even when it's with compatible x86 CPUs. For home desktops or DIY workstations you can source and build everything yourself, and consumer software and Windows tend to be pretty nimble in working on new hardware. Many times this isn't the case for servers.

There's a whole software stack that many times has been fine-tuned to a particular vendor or even architecture, and it has to be re-validated and possibly even re-worked to support the new CPUs. OS support might be lagging or insufficient in the early days of release, your particular software may run a lot slower despite the hardware being faster due to optimizations, or your software stack might not work at all. So if you jump in with purchasing this brand new hardware that all of a sudden has compatibility issues and runs slower (or at least not appreciably faster) than your old hardware, and you spent tens to hundreds of thousands (or even millions) of dollars to switch, you probably won't have a job for too long. In time you could get everything running as expected, but that's not going to fly when the company is losing money every day dealing with all these issues on top of the price just to get the new equipment.

So servers take a while. There are long validation processes that have to happen and there has to be a big enough justification to switch over something that has been proven to work well already. Additionally, there are a lot of companies that flash great enterprise products just to have 1 or maybe 2 generations of them and then disappear. So companies will be hesitant to jump from a proven provider to a new face or, in AMD's case, someone with a rocky history before that provider can prove roadmap and support stability. That's why AMD execs have been emphasizing their roadmap stability so much at tech conferences and in financial calls. With AMD now being able to partner with TSMC, who has a clear process roadmap with a good history of success, as well as proven architecture advancements year in and year out, you'll start to see their server share ramp up and continue to increase at a more rapid pace if they can continue to deliver promised results within the promised timeline.

That's also why 7 nm is so crucial for Intel in a couple of years. Every provider has some missteps that companies will overlook from time to time, but now that AMD is a real competitor again, if Intel botches the 7 nm launch, it's not going to be good for them in the enterprise world, as all of a sudden they'll be seen as less of a stable provider, now with a pattern of roadmap failures. Not saying it will be their end, but it will definitely put them on a slippery slope downhill that they'll have to really fight hard to get off at that point. Consumer side they'll be fine for a bit longer with those OEMs as AMD won't be able to have Intel levels of supply there anyway, but in servers they could take a real beating.
 

soresu

Diamond Member
Dec 19, 2014
4,114
3,570
136
That's also why 7 nm is so crucial for Intel in a couple of years. Every provider has some missteps that companies will overlook from time to time, but now that AMD is a real competitor again, if Intel botches the 7 nm launch, it's not going to be good for them in the enterprise world, as all of a sudden they'll be seen as less of a stable provider, now with a pattern of roadmap failures. Not saying it will be their end, but it will definitely put them on a slippery slope downhill that they'll have to really fight hard to get off at that point. Consumer side they'll be fine for a bit longer with those OEMs as AMD won't be able to have Intel levels of supply there anyway, but in servers they could take a real beating.
At best I see Intel catching TSMC.

TSMC will have N5P in 2021, which should be competitive with Intel's 7nm, and they are already building for 3nm despite being strangely silent on its device architecture (finFET or GAA Nanosheet MBCFET).
 

Thunder 57

Diamond Member
Aug 19, 2007
4,035
6,748
136
His posts are interesting to read, let's say that. But not much more.

LoL, I wish you would say more.

Papermaster hinted that SMT4 was much dependent on loads. Consumer desktop would see little benefit from it. Server and (quite likely) mobile would see benefits. SMT4 or some other 4-way MT would allow for reduced core counts (and thus a more power sipping CPU) in mobile. Whether it's worth it for consumer mobile would depend on how well Chrome and Firefox threads could use such threads.

It's also been stated that 7nm is more challenging with its heat sensitivity, power density and hot spot considerations. If you're tuning maximum frequency parts (damned be the wattage sort of parts), then the fact that you have a whole bunch of idle or sleeping ALUs may not be a bad thing. You will probably get higher boost and be able to remain longer in boost when you have a high percentage of idle pipelines versus, say, the full duty pipelines that you might get running under SMT4.

Suppose Zen3 is marginally wider (5 or 6 ALU) and 3 AGU. The % utilization for full load with SMT2 is still going to be fairly good.

So from a DT perspective it's too early for SMT4 in Zen3.

I still am hopeful though that they would add 4-way MT in Zen3 to squeeze out more perf/watt for the mobile and server markets (and if so, hopefully allow these as non-default options in desktop). Low odds, I know. But in my opinion not so low for Zen4.

You, Nosta, and Richie Rich should form a fan fiction club. I've never seen people so blindly ignore evidence.
 

Richie Rich

Senior member
Jul 28, 2019
470
230
76
Who cares about gaming revenue? I thought we were talking about servers. All that gaming revenue is little addictive games that people play on their phones anyway.
You didn't get the point again. Let me explain it for you please. It's about money. AMD is designing server CPUs because of the high margins in the server market (and there is big money there). When I mentioned gaming revenue it's about money again: the ARM platform earns huge money which can be invested into server expansion (this is already happening with ARM Neoverse N1, ThunderX2, Graviton2, Ampere eMag2 80c, Nuvia). It's even worse for x86 because it's a low margin, highly competitive environment... x86 grew up from these tough roots 30 years ago, competing with giants like Motorola 68000, DEC, IBM Power (those dinosaurs are Intel and AMD now). Just look how bad the IPC of all x86 CPUs is compared to ARM. x86 is 5 years behind Apple in development and the gap is still getting bigger. Even the slow Cortex A77 has better IPC now. The upcoming A14 will most likely take the absolute performance crown. This doesn't look good for x86.

If I made that slide? I would say 4 threads/core or leave it blank. Those are the only options that make sense. Have you ever seen a company lie on a slide to keep an "awesome feature" secret? Especially when you would want software to be ready to take advantage of that awesome feature? Intel will likely have a good idea of what Zen 3 will look like before we do anyway.
So you would disclose major next gen feature at presentation of current product. Great idea. This never happened in the history of Intel or AMD. That's pure naivety. Leaving it blank is definitely acceptable, however it would feed speculation. AMD doesn't lie if they say Zen 3 will support SMT2, simply because an SMT4 CPU supports SMT2 mode too. At the Zen3 uarch reveal AMD could just say: "Wait, and one more thing, Zen3 also supports SMT4". That's why the presentation says nothing about SMT4 for Zen 3.


BTW: Look how terribly low x86 IPC is compared to ARM. The fastest Ryzen 3950X @ 4.6 GHz is beaten by the iPhone A13 CPU at just 2.6 GHz. AMD needs to push hard in servers and SMT4 is the lowest hanging fruit there.

IPC calculations of SPECint2006:
  • 9900K .... 54.28/5 GHz = 10.86 pts/GHz
  • 3950X .... 50.02/4.6 GHz = 10.87 pts/GHz
  • A76 ........ 26.65/2.84 GHz = 9.38 pts/GHz
  • A77 ........ 33.32/2.84 GHz = 11.73 pts/GHz ...... +8% IPC over 9900K
  • A11 ........ 36.80/2.39 GHz = 15.40 pts/GHz .... +42% IPC over 9900K
  • A12 ........ 45.32/2.53 GHz = 17.91 pts/GHz .... +65% IPC over 9900K
  • A13 ........ 52.82/2.65 GHz = 19.93 pts/GHz .... +83% IPC over 9900K
  • A14 ........ 66.00/3.00 GHz = 22.00 pts/GHz (estimated +10% IPC)
  • Zen3 ...... 60.00/4.60 GHz = 13.04 pts/GHz (estimated +20% IPC)
Attachment: SPEC2006_S865.png
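The points-per-GHz figures in this post can be recomputed directly from the listed scores and clocks. A short Python sketch, assuming the numbers exactly as posted; the A14 and Zen3 rows are the poster's estimates rather than measurements, and score per GHz is only a rough stand-in for IPC. The names `scores` and `points_per_ghz` are made up for illustration.

```python
# SPECint2006 score and clock (GHz) exactly as listed in the post.
# A14 and Zen3 are the poster's estimates, not measured results.
scores = {
    "9900K": (54.28, 5.00),
    "3950X": (50.02, 4.60),
    "A76": (26.65, 2.84),
    "A77": (33.32, 2.84),
    "A11": (36.80, 2.39),
    "A12": (45.32, 2.53),
    "A13": (52.82, 2.65),
    "A14": (66.00, 3.00),
    "Zen3": (60.00, 4.60),
}


def points_per_ghz(name):
    """Score divided by clock, the post's proxy for IPC."""
    score, ghz = scores[name]
    return score / ghz


baseline = points_per_ghz("9900K")
for name in scores:
    ppg = points_per_ghz(name)
    gain = (ppg / baseline - 1) * 100   # percent difference vs the 9900K
    print(f"{name:6s} {ppg:6.2f} pts/GHz  {gain:+6.1f}% vs 9900K")
```

Recomputed this way, the per-GHz values match the post to two decimals. The A13 gain over the 9900K comes out at about +83.6%, which the post lists as +83%.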
 

Thunder 57

Diamond Member
Aug 19, 2007
4,035
6,748
136
So you would disclose major next gen feature at presentation of current product. Great idea


I'll have to reply later when I can read it all. This one line though, you are a fool. I work for AMD. I'm not going to disclose chiplets, or SMT2.
 

Hitman928

Diamond Member
Apr 15, 2012
6,695
12,370
136
If I made that slide? I would say 4 threads/core or leave it blank. Those are the only options that make sense. Have you ever seen a company lie on a slide to keep an "awesome feature" secret? Especially when you would want software to be ready to take advantage of that awesome feature? Intel will likely have a good idea of what Zen 3 will look like before we do anyway.
So you would disclose major next gen feature at presentation of current product. Great idea. This never happened in the history of Intel or AMD. That's pure naivety. Leaving it blank is definitely acceptable, however it would feed speculation. AMD doesn't lie if they say Zen 3 will support SMT2, simply because an SMT4 CPU supports SMT2 mode too. At the Zen3 uarch reveal AMD could just say: "Wait, and one more thing, Zen3 also supports SMT4". That's why the presentation says nothing about SMT4 for Zen 3.

You maybe would have a point here except if you go back and look at the presentation, did the slide say that Zen3 will support SMT2 or did it say that Zen3 will have a max core/thread count of 64 cores / 128 threads?
 

soresu

Diamond Member
Dec 19, 2014
4,114
3,570
136
AMD needs to push hard in servers and SMT4 is the lowest hanging fruit there.
Moar cores is always the lowest hanging fruit from an MT perspective, which is all SMT is good for.

Assuming perfect scaling, a doubling of cores gets you 2x MT performance.

I realise it is next to impossible realistically from a SW engineering POV to achieve that perfection, but many applications like raytracing and video encoding can still benefit from it even without better per-instance scaling, by increasing the number of instances or VMs (or segments/chunks in video encoding).
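To put rough numbers on "perfect scaling" versus the realistic case, Amdahl's law is one common model. It is not something the post itself invokes, and the 95% parallel fraction below is an arbitrary example value, so treat this Python sketch as an illustration only.

```python
# Amdahl's law: speedup on n cores when a fraction p of the work is parallel.
# This model and the example fraction are illustrative assumptions.

def amdahl_speedup(parallel_fraction, cores):
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


def doubling_gain(parallel_fraction, cores):
    """How much faster 2*cores is than cores for the same workload."""
    return amdahl_speedup(parallel_fraction, 2 * cores) / amdahl_speedup(parallel_fraction, cores)


print(doubling_gain(1.0, 64))    # perfectly parallel work
print(doubling_gain(0.95, 64))   # 5% serial work
```

With a fully parallel workload, doubling the cores doubles throughput. With any serial portion the gain from doubling drops below 2x, which is why running more independent instances, VMs or encode chunks is the practical way to keep extra cores busy.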

At the moment they do not need to push anything, they have the most performant, highest thread/core count, and most power efficient server CPU package out there.

As long as a server/render farm/datacenter's upgrade cycle falls between now and Intel's real response to AMD, then AMD will get that contract.

The Frontier supercomputer is a ringing endorsement of their new uArch and competitive position already.

The only thing they currently are tripped up by is per mainboard socket density - Intel can do 4 while AMD's platform only does 2.
 

amd6502

Senior member
Apr 21, 2017
971
360
136
You, Nosta, and Richie Rich should form a fan fiction club. I've never seen people so blindly ignore evidence.

Well, I enjoy reading Nosta and Rich's comments. Zen4 is still in design afaik. I don't think it's too much of a stretch that Zen4 has some form of 4-way MT.

For Zen3 to have it was much more of a hope than an expectation.

And I did mostly agree with you folks. Namely, that underutilization of pipes @ SMT2 isn't going to be a problem, at least for desktop. So most likely the same power efficiency recipe is going to be applied as was quite successfully done during the Steamroller → XV upgrade, and also in Zen1, namely heavy gating and powering down of unused circuits, and lots of sensors to allow minimized voltage. Maybe also work on IF and SoC uncore savings. Those things combined with the substantial savings from 7nm should get them pretty far.

I think they will have a killer mobile APU and I hope they skip straight to Zen3 for their monolithic mobile focused APU.

Between Zen2 and Zen3 (or just Zen3 if Zen2 skips monolithic) I think they have enough budget to have two APU dies; small and large. It would make sense for small die to arrive first and wait a bit on 7nm to mature. Small die being quadcore with ~8CU (possibly as low as 6CU) and with the big die being octacore with ~16CU.

A big core 8c APU is going to draw some considerable wattage, even on 7nm. But they can bin these so that most mobile end up being 6c while purposing the bulk of 8c's to desktop.

Also, the software side for minimizing wattage while unplugged should be simple, as most real OSes support CPU hotplug. It's a one-liner in Linux to power down a core.

# Take logical CPU 3 offline (needs root); writing 1 brings it back online.
echo 0 > /sys/devices/system/cpu/cpu3/online
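The same sysfs write can be scripted, for example to park several cores when running on battery. A minimal Python sketch: the helper name `set_cpu_online` and its `sysfs_root` parameter are hypothetical (the parameter exists only so the function can be pointed at a scratch directory), while the default path is the one from the one-liner. Writing to the real path needs root.

```python
from pathlib import Path


def set_cpu_online(cpu, online, sysfs_root="/sys/devices/system/cpu"):
    """Write 1 or 0 to cpu<N>/online, mirroring the echo one-liner.

    sysfs_root is a hypothetical knob for dry runs against a scratch
    directory; the default is the real sysfs location.
    """
    control = Path(sysfs_root) / f"cpu{cpu}" / "online"
    control.write_text("1" if online else "0")
    return control


# Example (as root): take logical CPUs 4-7 offline.
# for n in range(4, 8):
#     set_cpu_online(n, False)
```

The function does nothing beyond the file write, so whether a given CPU can actually be taken offline is still up to the kernel.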


Moar cores is always the lowest hanging fruit from an MT perspective, which is all SMT is good for.

Assuming perfect scaling, a doubling of cores gets you 2x MT performance.

True, but wattage also scales linearly (or affinely, counting uncore wattage) with cores.

So if the uncore and minimally active iGPU account for 1/3 of the wattage, ~5W (on a 15W mobile quadcore APU), then each core gets ~2.5W and the octacore equivalent (5W + 8 x 2.5W) is going to want to be a 25W TDP part.
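The affine scaling argument here is a fixed uncore cost plus a per-core cost. A small Python sketch of that arithmetic, using the post's own assumptions (a 15W quadcore with one third of the budget going to uncore and iGPU); the function and variable names are made up for illustration.

```python
# Affine power model: package watts = uncore watts + cores * watts per core.
# The 15 W quadcore and the 1/3 uncore share are the post's assumptions.

def package_watts(cores, uncore_watts, watts_per_core):
    return uncore_watts + cores * watts_per_core


quad_tdp = 15.0
uncore = quad_tdp / 3                 # uncore + minimally active iGPU
per_core = (quad_tdp - uncore) / 4    # what is left, split over 4 cores

octa_tdp = package_watts(8, uncore, per_core)
print(uncore, per_core, octa_tdp)
```

Because the uncore term does not grow with core count, doubling the cores raises the total from 15W to 25W rather than to 30W.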

I do think/hope that they will have a top bin 8c with 15W tdp and that they can also get good near idle wattage with some software tweaks.
 

uzzi38

Platinum Member
Oct 16, 2019
2,746
6,653
146
I think they will have a killer mobile APU and I hope they skip straight to Zen3 for their monolithic mobile focused APU.

They're not. In fact, there's a high possibility of there being 2 Zen 2 based APUs instead, one Vega based, the other RDNA2 based. The latter won't be seen for a while though.
 

moinmoin

Diamond Member
Jun 1, 2017
5,242
8,456
136
They're not. In fact, there's a high possibility of there being 2 Zen 2 based APUs instead, one Vega based, the other RDNA2 based. The latter won't be seen for a while though.
With the annual cadence already in place and APUs always coming last, why would AMD do two distinct APU designs in one cycle instead of leaving that for the next round of updates? In one more year we'll see an APU based on Zen 3 + RDNA anyway.
 

uzzi38

Platinum Member
Oct 16, 2019
2,746
6,653
146
With the annual cadence already in place and APUs always coming last, why would AMD do two distinct APU designs in one cycle instead of leaving that for the next round of updates? In one more year we'll see an APU based on Zen 3 + RDNA anyway.

I don't know, but the driver code doesn't lie. Van Gogh is heavily based on Renoir, but is gfx1030.

Apparently Komachi did a write up on Reddit: https://www.reddit.com/user/Komachi_ENSAKA/comments/ea7rm3/van_gogh_is_first_rdna_gfx10_apu/
 

soresu

Diamond Member
Dec 19, 2014
4,114
3,570
136
With the annual cadence already in place and APUs always coming last, why would AMD do two distinct APU designs in one cycle instead of leaving that for the next round of updates? In one more year we'll see an APU based on Zen 3 + RDNA anyway.
Renoir and Dali were supposed to come in the same year on that old roadmap with the monster truck on it.