Speculation: Ryzen 4000 series/Zen 3


scannall

Golden Member
Jan 1, 2012
1,946
1,638
136
I've said it before and I'll say it again - SMT4 is nothing special.

POWER7 had it in 2010 and SPARC T3 had SMT8 in the same year.

If it was such an obvious 'low hanging fruit' gain, then everybody would be doing it rather than the now more obscure POWER and SPARC options.

If it was obvious then you can bet Intel who introduced x86 SMT would have done it by now - they've had more than enough opportunity while they were maintaining a more competitive cadence before 10nm woes set in.

You fail to acknowledge that engineering a core for maximum thread count could compromise its single thread performance - a direction AMD has tested before and barely lived to regret with Bulldozer before Zen.

At this point their concentration is on ST performance and core counts - those 2 things alone provide a steady improvement to MT performance per generation.
Making SMT4 and SMT8 actually work took a *LOT* of transistors. As I recall (and someone will likely correct me, hopefully kindly, if I'm wrong) it was another 35% per core, far above the budget for Intel and AMD. When SMT4 went live, as it were, it made sense, and it probably still does for IBM's market. Having a gazillion threads makes no sense if you can't feed them.

SMT2? Possible, unlikely but possible. SMT4? Possible, but then again I might win the lottery 2 times. This week... As process nodes have shrunk, it may be better on the server side of things to just add more cores.
 

Thunder 57

Platinum Member
Aug 19, 2007
2,673
3,791
136
Did you notice the Zen 1 CPU chip had a ton of server features (ECC, inter-chip links, etc.) on board which were disabled on desktop? The same goes for K10 Agena, Barcelona, Thuban, and K8 too. Wake up: AMD has been designing server CPUs as a priority for a long time. Desktop was, is, and will be just a derivative.


Yes, that's speculation; however, people like you underestimate ARM far too much. AMD engineers don't. What will be the first 5nm CPUs at TSMC? Apple A14 and Kirin 1000 (based on Cortex), both ARM, both with a larger die size than AMD's chiplets. 55% of gaming revenue comes from the ARM platform and it's still increasing. ARM has 6x ALU in Apple's A12 SoC with the highest INT IPC, and ARM will have the SVE2 instruction set with 128-2048 bit variable vector width. That's scary stuff already. AMD needs to push in servers at maximum. SMT4 is low-hanging fruit in servers, with higher performance gains than transistor increase. That's why I expect SMT4 for Zen 3 already.


The TLB bug was a reality for K10, and it can happen again to any manufacturer. There are a ton of bugs in every CPU; some limit features, some limit clocks. Just look at the errata, that's nothing new.

You're trying to avoid my question about "naivety". The question was: what would you put into that presentation if you knew that Zen 3 is SMT4? Please just answer the question. My thinking is: an empty space would lead to speculation, SMT4 can't be shown if they want to keep it hidden, so IMHO the only option is to show SMT2, because it is supported as well. I'm looking forward to seeing what your choice would be.

Who cares about gaming revenue? I thought we were talking about servers. All that gaming revenue is little addictive games that people play on their phones anyway.

The number of ALU's means jack all by itself. You have to be able to make use of them. If it were that simple why not go straight for 10 ALU's?

SVE2 looks interesting, but let's see how it turns out before we say it's scary stuff.

SMT4 doesn't make sense. It would be far better to use those transistors to beef up AVX. If I made that slide? I would say 4 threads/core or leave it blank. Those are the only options that make sense. Have you ever seen a company lie on a slide to keep an "awesome feature" secret? Especially when you would want software to be ready to take advantage of that awesome feature? Intel will likely have a good idea of what Zen 3 will look like before we do anyway.

So, either:

A) AMD is full of dummies because they won't increase ALU's or add SMT4

B) AMD is preparing those features but hasn't implemented them because they want that "reserve performance" for when Intel starts to catch up

or C) AMD built a well balanced, powerful CPU that has radically changed the market and saved the company.

I'm going with C. One more point on SMT4. How great was having a bunch of weak threads for BD in the server world? Or anywhere? Not great. Why would AMD add more threads that have to contend with the same resources? Before you say "But 6 ALU's!", there just isn't the die space. Assuming AMD keeps AM4, a solid guess, and they need to fit two chiplets on there so they can offer 16 cores, they don't have much room to play with. The extra density from 7nm+ will be spent elsewhere. Branch prediction, larger L2, AVX, making it wider, etc.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,686
1,221
136
imho, if there is an ALU increase it will be to 8 ALUs, not 6 or 10. This is for the constrained aspect of the potential SMT4. When all threads are full they will only be able to use 2 ALUs/2 FPUs at a given # of cycles, with a restricted number of registers usable for each thread. With that increase the FPU vALUs will also increase from 4 FP256 vALUs to 8 FP128 vALUs, which is already physically implemented in the Zen2 design. The only thing left is the promised core refactorization.

The SMT4 allows for:
- a reduction of amount of cores used, which benefits single(eighth die)-two(quarter die)-quad core(half die) boosts.
- increased utilization of units at high frequency, better ILP/EPI at 3~5 GHz.

Maybe...
4x28 instead with simpler ALUs having integrated LD/ST AGUs.
4x16 + 1x28 => 92-entry tSQ, 4x28 => 112-entry tSQ
5 SQs(7-units) to 4 SQs(8-units)

FPU side might have two separate FP128 four-issue SQs rather than a single FP256 four-issue SQ. If NSQ dispatches Lo-128(MUL0) or Lo-256(MUL0/MUL1) to SQ0, it also must dispatch Hi-128(MUL2) or Hi-256(MUL2/MUL3) to SQ1. If NSQ dispatches Lo-128(ADD0) or Lo-256(ADD0/ADD1) to SQ0, it also must dispatch Hi-128(ADD2) or Hi-256(ADD2/ADD3) to SQ1. NSQ can continue with 4-issue with lo-half first 4 and hi-half second 4, or they could target eight issue. The same applies to the 8 ALU design, four AGUs then four ALUs. If there is no hi-half then it is all lo-half, or if it's all ALU or AGU, then there are no AGU or ALU ops dispatched.

6 units (4+2) => 192 -> 32x6 (Maximum of 6 micro-op)
7 units (4+3) => 224 -> 32x7 (Maximum of 7 micro-op)
8 units(4 Complex + 4 Simple/AGU) => 256? -> 32x8, if AMD wants to be on-par with Sunnycove this will have to be 44x8 => 352 retire. (Maximum of 8 micro-op) <== Refactored with FPU (8 INT or 8 FP)
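The queue sizes in this post are all simple products of queue count and entries per queue. A minimal Python sketch of that arithmetic, for anyone who wants to plug in other guesses; the helper name `total_entries` is made up for illustration, and the inputs are only the speculative figures quoted above, not published AMD specifications:

```python
def total_entries(queues):
    """Sum entries over (count, entries_per_queue) pairs."""
    return sum(count * size for count, size in queues)

# Scheduler-queue totals quoted earlier in the post.
print(total_entries([(4, 16), (1, 28)]))  # 92
print(total_entries([(4, 28)]))           # 112

# Retire-queue sizes at 32 entries per unit for 6, 7, and 8 units.
for units in (6, 7, 8):
    print(units, "units:", total_entries([(units, 32)]))  # totals 192, 224, 256

# The Sunny Cove parity guess: 44 entries per unit across 8 units.
print(total_entries([(8, 44)]))  # 352
```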
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,686
1,221
136
AMD's earliest design response to a 'Cove' core would be Zen5 at best?
Should be the architecture after Zen3.

AMD has two core refactor tokens.
Qualcomm ARM HPC core w/ HTM (Qualcomm HPC core 2011~2016 => AMD)
Intel Folsom x86 HPC core w/ Folsom design (Intel Folsom Big core team 2016~2017 => AMD)
The above cores were pretty fantastical; however, due to problems in marketing/management they were killed before they were deployed.

2012 -> Zen ramps, 2015 -> Zen powerout?, 2016 -> Zen tapeout?
2016 -> New core ramps, 2019 -> New core powerout?, 2020 -> New core tapeout?

The intel core portion is derived from => • The team Taped out 10nm Intel process High speed Core design, ~5Ghz Freq. that was used both by Client & Server products.
The qualcomm portion core is derived from => SMT4, HTM, etc team.

There are other transfers but I am set on those being the Next-gen Chip Multithreading/Cluster-based Multithreading architecture. We haven't seen RDNA Mobility yet as a GCN replacement in the sub-10 watt market.

If anyone sees a Sempron 4000GT(or GE) on anything. That is 100% going to be 12FDX, fyi.
 

A///

Diamond Member
Feb 24, 2017
4,352
3,154
136
I get the feeling he's referring to tick-tock structure based on his comment of ramping of cores.
 

lobz

Platinum Member
Feb 10, 2017
2,057
2,856
136
The intel core portion is derived from => • The team Taped out 10nm Intel process High speed Core design, ~5Ghz Freq. that was used both by Client & Server products.
Sorry man, you completely lost me here. Does anyone care to elaborate further? I'm asking this out of genuine curiosity and not mockery.
 

soresu

Platinum Member
Dec 19, 2014
2,650
1,854
136
Qualcomm ARM HPC core w/ HTM (Qualcomm HPC core 2011~2016 => AMD)
At best you are referring to Falkor or its successor Saphira - it's pretty clear they were cancelled due to Qualcomm's analysis of dubious demand for ARM servers at the time.

Though considering W10 on ARM happened not long after, and was seemingly aimed at Qualcomm in particular, I would not be surprised to find it was mostly dead rather than completely dead (to use a Princess Bride-ism).
 

amd6502

Senior member
Apr 21, 2017
971
360
136
Papermaster hinted that SMT4 was much dependent on loads. Consumer desktop would see little benefit from it. Server and (quite likely) mobile would see benefits. SMT4 or some other 4-way MT would allow for reduced core counts (and thus a more power sipping CPU) in mobile. Whether it's worth it for consumer mobile would depend on how well Chrome and Firefox threads could use such threads.

It's also been stated that 7nm is more challenging with its heat sensitivity, power density and hot spot considerations. If you're tuning maximum frequency parts (damned be the wattage sort of parts), then the fact that you have a whole bunch of idle or sleeping ALUs may not be a bad thing. You will probably get higher boost and be able to remain longer in boost when you have a high percentage of idle pipelines versus, say, the full duty pipelines that you might get running under SMT4.

Suppose Zen3 is marginally wider (5 or 6 ALU) and 3 AGU. The % utilization for full load with SMT2 is still going to be fairly good.

So from a DT perspective it's too early for SMT4 in Zen3.

I still am hopeful though that they would add 4-way MT in Zen3 to squeeze out more perf/watt for the mobile and server markets (and if so, hopefully allow these as non-default options in desktop). Low odds, I know. But in my opinion not so low for Zen4.
 

Adonisds

Member
Oct 27, 2019
98
33
51
AMD CPUs are getting better than Intel's. AMD has been gaining market share on the desktop quickly, but projections show that on the server market the market share shift is going to be very slow. Why is that?

If I didn't know anything I would have guessed that market share would shift quicker on the servers, because people who buy servers should know more about CPUs than desktop consumers, and should be less susceptible to marketing
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,540
14,495
136
AMD CPUs are getting better than Intel's. AMD has been gaining market share on the desktop quickly, but projections show that on the server market the market share shift is going to be very slow. Why is that?

If I didn't know anything I would have guessed that market share would shift quicker on the servers, because people who buy servers should know more about CPUs than desktop consumers, and should be less susceptible to marketing
What you forget (or maybe don't know) is that server purchases (at least in the companies I came from) are picked and approved by managers, who quite often don't know anything technical and don't research it. So for them, the Intel marketing dept is king.

Edit: Example from 2002-2004 ish (not sure exactly). This is when AMD Opterons took half the power and produced way less heat than Intel server chips, but were faster. I asked management about it when we were going to get a new server, and his comment was "we only buy REAL CPUs, Intel".
 

Hitman928

Diamond Member
Apr 15, 2012
5,236
7,785
136
AMD CPUs are getting better than Intel's. AMD has been gaining market share on the desktop quickly, but projections show that on the server market the market share shift is going to be very slow. Why is that?

If I didn't know anything I would have guessed that market share would shift quicker on the servers, because people who buy servers should know more about CPUs than desktop consumers, and should be less susceptible to marketing

What Markfw said is true, though I think less so today than 10 - 20 years ago. You also have to keep in mind that there is a lot of upfront time and cost that goes into switching vendors, even when it's with compatible x86 CPUs. For home desktops or DIY workstations you can source and build everything yourself, and consumer software and Windows tend to be pretty nimble in working on new hardware. Many times this isn't the case for servers.

There's a whole software stack that many times has been fine-tuned to a particular vendor or even architecture, and it has to be re-validated and possibly even re-worked to support the new CPUs. OS support might be lagging or insufficient in the early days of release, your particular software may run a lot slower despite the hardware being faster because of vendor-specific optimizations, or your software stack might not work at all. So if you jump in with purchasing this brand new hardware that all of a sudden has compatibility issues and runs slower (or at least not appreciably faster) than your old hardware, and you spent tens to hundreds of thousands (or even millions) of dollars to switch, you probably won't have a job for too long. In time you could get everything running as expected, but that's not going to fly when the company is losing money every day dealing with all these issues on top of the price just to get the new equipment.

So servers take a while. There are long validation processes that have to happen, and there has to be a big enough justification to switch away from something that has been proven to work well already. Additionally, there are a lot of companies that flash great enterprise products just to have 1 or maybe 2 generations of them and then disappear. So companies will be hesitant to jump from a proven provider to a new face or, in AMD's case, someone with a rocky history, before that provider can prove roadmap and support stability. That's why AMD execs have been emphasizing their roadmap stability so much at tech conferences and in financial calls. With AMD now being able to partner with TSMC, who has a clear process roadmap with a good history of success, as well as proven architecture advancements year in and year out, you'll start to see their server share ramp up and continue to increase at a more rapid pace if they can continue to deliver promised results within the promised timeline.

That's also why 7 nm is so crucial for Intel in a couple of years. Every provider has some missteps that companies will overlook from time to time, but now that AMD is a real competitor again, if Intel botches the 7 nm launch, it's not going to be good for them in the enterprise world, as all of a sudden they'll be seen as less of a stable provider, now with a pattern of roadmap failures. Not saying it will be their end, but it will definitely put them on a slippery slope downhill that they'll have to really fight hard to get off at that point. Consumer side they'll be fine for a bit longer with those OEMs, as AMD won't be able to have Intel levels of supply there anyway, but in servers they could take a real beating.
 

soresu

Platinum Member
Dec 19, 2014
2,650
1,854
136
That's also why 7 nm is so crucial for Intel in a couple of years. Every provider has some missteps that companies will overlook from time to time, but now that AMD is a real competitor again, if Intel botches the 7 nm launch, it's not going to be good for them in the enterprise world, as all of a sudden they'll be seen as less of a stable provider, now with a pattern of roadmap failures. Not saying it will be their end, but it will definitely put them on a slippery slope downhill that they'll have to really fight hard to get off at that point. Consumer side they'll be fine for a bit longer with those OEMs, as AMD won't be able to have Intel levels of supply there anyway, but in servers they could take a real beating.
At best I see Intel catching TSMC.

TSMC will have N5P in 2021, which should be competitive with Intel's 7nm, and they are already building for 3nm despite being strangely silent on its device architecture (finFET or GAA Nanosheet MBCFET).
 

Thunder 57

Platinum Member
Aug 19, 2007
2,673
3,791
136
His posts are interesting to read, let's say that. But not much more.

LoL, I wish you would say more.

Papermaster hinted that SMT4 was much dependent on loads. Consumer desktop would see little benefit from it. Server and (quite likely) mobile would see benefits. SMT4 or some other 4-way MT would allow for reduced core counts (and thus a more power sipping CPU) in mobile. Whether it's worth it for consumer mobile would depend on how well Chrome and Firefox threads could use such threads.

It's also been stated that 7nm is more challenging with its heat sensitivity, power density and hot spot considerations. If you're tuning maximum frequency parts (damned be the wattage sort of parts), then the fact that you have a whole bunch of idle or sleeping ALUs may not be a bad thing. You will probably get higher boost and be able to remain longer in boost when you have a high percentage of idle pipelines versus, say, the full duty pipelines that you might get running under SMT4.

Suppose Zen3 is marginally wider (5 or 6 ALU) and 3 AGU. The % utilization for full load with SMT2 is still going to be fairly good.

So from a DT perspective it's too early for SMT4 in Zen3.

I still am hopeful though that they would add 4-way MT in Zen3 to squeeze out more perf/watt for the mobile and server markets (and if so, hopefully allow these as non-default options in desktop). Low odds, I know. But in my opinion not so low for Zen4.

You, Nosta, and Richie Rich should form a fan fiction club. I've never seen people so blindly ignore evidence.
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
Who cares about gaming revenue? I thought we were talking about servers. All that gaming revenue is little addictive games that people play on their phones anyway.
You didn't get the point again. Let me explain it for you, please. It's about money. AMD is designing server CPUs because of the high margins in the server market (and there is big money there). When I mentioned gaming revenue, it was about money again: the ARM platform earns huge money which can be invested into server expansion (this is already happening with ARM Neoverse N1, ThunderX2, Graviton2, Ampere eMag2 80c, Nuvia). It's even worse for x86 because it's a low-margin, highly competitive environment... x86 grew up from these tough roots 30 years ago, competing with giants like Motorola 68000, DEC, IBM Power (those dinosaurs are Intel and AMD now). Just look how bad the IPC of all x86 CPUs is compared to ARM. x86 is 5 years behind Apple in development and the gap is still getting bigger. Even the slow Cortex A77 has better IPC now. The upcoming A14 will most likely take the absolute performance crown. This doesn't look good for x86.

If I made that slide? I would say 4 threads/core or leave it blank. Those are the only options that make sense. Have you ever seen a company lie on a slide to keep an "awesome feature" secret? Especially when you would want software to be ready to take advantage of that awesome feature? Intel will likely have a good idea of what Zen 3 will look like before we do anyway.
So you would disclose a major next-gen feature at the presentation of the current product. Great idea. This has never happened in the history of Intel or AMD. That's pure naivety. Leaving it blank is definitely acceptable; however, it would feed speculation. AMD isn't lying if they say Zen 3 will support SMT2, simply because an SMT4 CPU supports SMT2 mode too. At the Zen3 uarch reveal AMD could just say: "Wait, and one more thing, Zen3 also supports SMT4". That's why the presentation says nothing about SMT4 for Zen 3.


BTW: Look how terribly low x86 IPC is compared to ARM. The fastest Ryzen 3950X @ 4.6 GHz is beaten by the iPhone's A13 CPU at just 2.6 GHz. AMD needs to push hard in servers and SMT4 is the lowest-hanging fruit there.

IPC calculations from SPECint2006 (score / clock):
  • 9900K .... 54.28 / 5.00 GHz = 10.86 pts/GHz
  • 3950X .... 50.02 / 4.60 GHz = 10.87 pts/GHz
  • A76 ........ 26.65 / 2.84 GHz = 9.38 pts/GHz
  • A77 ........ 33.32 / 2.84 GHz = 11.73 pts/GHz ...... +8% IPC over 9900K
  • A11 ........ 36.80 / 2.39 GHz = 15.40 pts/GHz .... +42% IPC over 9900K
  • A12 ........ 45.32 / 2.53 GHz = 17.91 pts/GHz .... +65% IPC over 9900K
  • A13 ........ 52.82 / 2.65 GHz = 19.93 pts/GHz .... +83% IPC over 9900K
  • A14 ........ 66.00 / 3.00 GHz = 22.00 pts/GHz (estimated +10% IPC)
  • Zen3 ...... 60.00 / 4.60 GHz = 13.04 pts/GHz (estimated +20% IPC)
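The points-per-GHz figures above are just score divided by clock, so they are easy to recompute. A minimal Python sketch, using only the measured (non-estimated) entries from the list; the function name `points_per_ghz` is made up for illustration, and score per GHz is only the rough IPC proxy used in this post, not a direct IPC measurement:

```python
# SPECint2006 scores and clocks (GHz) copied from the list above.
scores = {
    "9900K": (54.28, 5.00),
    "3950X": (50.02, 4.60),
    "A76": (26.65, 2.84),
    "A77": (33.32, 2.84),
    "A11": (36.80, 2.39),
    "A12": (45.32, 2.53),
    "A13": (52.82, 2.65),
}


def points_per_ghz(score, clock_ghz):
    """Score normalized by clock speed, the post's stand-in for IPC."""
    return score / clock_ghz


baseline = points_per_ghz(*scores["9900K"])
for name, (score, clock) in scores.items():
    ppg = points_per_ghz(score, clock)
    delta = (ppg / baseline - 1) * 100  # percent difference vs. the 9900K
    print(f"{name:6s} {ppg:6.2f} pts/GHz  {delta:+6.1f}% vs 9900K")
```

Swapping a different chip in as `baseline` shows how much the percentages depend on which part is used as the reference.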
[Attached image: SPEC2006_S865.png]
 

Thunder 57

Platinum Member
Aug 19, 2007
2,673
3,791
136
So you would disclose a major next-gen feature at the presentation of the current product. Great idea.


I'll have to reply later when I can read it all. This one line though, you are a fool. I work for AMD. I'm not going to disclose chiplets, or SMT2.
 

Hitman928

Diamond Member
Apr 15, 2012
5,236
7,785
136
If I made that slide? I would say 4 threads/core or leave it blank. Those are the only options that make sense. Have you ever seen a company lie on a slide to keep an "awesome feature" secret? Especially when you would want software to be ready to take advantage of that awesome feature? Intel will likely have a good idea of what Zen 3 will look like before we do anyway.
So you would disclose a major next-gen feature at the presentation of the current product. Great idea. This has never happened in the history of Intel or AMD. That's pure naivety. Leaving it blank is definitely acceptable; however, it would feed speculation. AMD isn't lying if they say Zen 3 will support SMT2, simply because an SMT4 CPU supports SMT2 mode too. At the Zen3 uarch reveal AMD could just say: "Wait, and one more thing, Zen3 also supports SMT4". That's why the presentation says nothing about SMT4 for Zen 3.

You maybe would have a point here except if you go back and look at the presentation, did the slide say that Zen3 will support SMT2 or did it say that Zen3 will have a max core/thread count of 64 cores / 128 threads?
 

soresu

Platinum Member
Dec 19, 2014
2,650
1,854
136
AMD needs to push hard in servers and SMT4 is the lowest-hanging fruit there.
Moar cores is always the lowest hanging fruit from an MT perspective, which is all SMT is good for.

Assuming perfect scaling, a doubling of cores gets you 2x MT performance.

I realise it is next to impossible realistically from a SW engineering POV to achieve that perfection, but many applications like raytracing and video encoding can still benefit from it even without better per instance scaling, by increasing number of instances or VM's (or segments/chunks in video encoding).
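One common way to put numbers on the gap between perfect scaling and reality is Amdahl's law. It is not mentioned in the post and is brought in here purely as an illustration; the parallel fractions and core counts below are made-up values, not measurements of any real workload:

```python
def amdahl_speedup(parallel_fraction, workers):
    """Speedup over a single worker when only part of the work parallelizes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# How much does doubling the core count buy at different parallel fractions?
for p in (1.0, 0.95, 0.80):
    gain = amdahl_speedup(p, 128) / amdahl_speedup(p, 64)
    print(f"parallel fraction {p:.2f}: 64 -> 128 cores gives {gain:.2f}x")
```

Only the fully parallel case gets the full 2x from doubling cores, which is why running more independent instances, VMs, or video chunks is the usual way to keep a large core count busy.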

At the moment they do not need to push anything, they have the most performant, highest thread/core count, and most power efficient server CPU package out there.

As long as a server/render farm/datacenter's upgrade cycle falls between now and Intel's real response to AMD, then AMD will get that contract.

The Frontier supercomputer is a ringing endorsement of their new uArch and competitive position already.

The only thing they currently are tripped up by is per mainboard socket density - Intel can do 4 while AMD's platform only does 2.
 

amd6502

Senior member
Apr 21, 2017
971
360
136
You, Nosta, and Richie Rich should form a fan fiction club. I've never seen people so blindly ignore evidence.

Well, I enjoy reading Nosta and Rich's comments. Zen4 is still in design afaik. I don't think it's too much of a stretch that Zen4 has some form of 4-way MT.

For Zen3 to have it was much more of a hope than an expectation.

And I did mostly agree with you folks. Namely, that underutilization of pipes @ SMT2 isn't going to be a problem, at least for desktop. So most likely the same power efficiency recipe is going to be applied as was quite successfully done during the Steamroller → XV upgrade, and also in Zen1, namely heavy gating and powering down of unused circuits, and lots of sensors to allow minimized voltage. Maybe also work on IF and SoC uncore savings. Those things combined with the substantial savings from 7nm should get them pretty far.

I think they will have a killer mobile APU and I hope they skip straight to Zen3 for their monolithic mobile focused APU.

Between Zen2 and Zen3 (or just Zen3 if Zen2 skips monolithic) I think they have enough budget to have two APU dies; small and large. It would make sense for small die to arrive first and wait a bit on 7nm to mature. Small die being quadcore with ~8CU (possibly as low as 6CU) and with the big die being octacore with ~16CU.

A big core 8c APU is going to draw some considerable wattage, even on 7nm. But they can bin these so that most mobile end up being 6c while purposing the bulk of 8c's to desktop.

Also, the software side of minimizing wattage while unplugged should be simple, as most real OSes support CPU hotplug. It's a one-liner in Linux to power down a core.

echo 0 > /sys/devices/system/cpu/cpu3/online
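The same sysfs toggle can be scripted. A minimal Python sketch, assuming a Linux system, root privileges, and a core that exposes an `online` file (cpu0 often does not); the helper names and the `sysfs_root` parameter are made up for illustration, the latter so the functions can be tried against a scratch directory instead of the real sysfs tree:

```python
from pathlib import Path

DEFAULT_ROOT = "/sys/devices/system/cpu"


def set_cpu_online(cpu, online, sysfs_root=DEFAULT_ROOT):
    """Write 1 or 0 to cpuN/online, mirroring the echo one-liner above."""
    control = Path(sysfs_root) / f"cpu{cpu}" / "online"
    control.write_text("1" if online else "0")


def is_cpu_online(cpu, sysfs_root=DEFAULT_ROOT):
    """Read cpuN/online back and report whether the core is enabled."""
    control = Path(sysfs_root) / f"cpu{cpu}" / "online"
    return control.read_text().strip() == "1"


# Example (needs root on a real machine):
# set_cpu_online(3, False)   # same effect as the echo line
# set_cpu_online(3, True)    # bring the core back
```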


Moar cores is always the lowest hanging fruit from an MT perspective, which is all SMT is good for.

Assuming perfect scaling, a doubling of cores gets you 2x MT performance.

True, but wattage also scales linearly (or affinely, counting uncore wattage) with cores.

So if the uncore and minimally active iGPU account for 1/3 of the wattage, ~5W (on a 15W mobile quadcore APU), then the octacore equivalent is going to want to be a 25W TDP part.
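The 25W figure follows from treating package power as a fixed uncore term plus a per-core term. A minimal Python sketch of that estimate, using only the numbers in this post; the function name `package_watts` and the assumption that every core draws the same wattage are illustrative simplifications:

```python
def package_watts(cores, watts_per_core, uncore_watts):
    """Affine power model: fixed uncore/iGPU cost plus a linear per-core cost."""
    return uncore_watts + cores * watts_per_core


quad_tdp = 15.0                       # 15W mobile quadcore APU
uncore = quad_tdp / 3                 # ~5W for uncore + minimally active iGPU
per_core = (quad_tdp - uncore) / 4    # 2.5W per core from the remaining 10W

print(package_watts(4, per_core, uncore))  # 15.0
print(package_watts(8, per_core, uncore))  # 25.0
```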

I do think/hope that they will have a top bin 8c with 15W TDP and that they can also get good near-idle wattage with some software tweaks.
 

uzzi38

Platinum Member
Oct 16, 2019
2,615
5,861
146
I think they will have a killer mobile APU and I hope they skip straight to Zen3 for their monolithic mobile focused APU.

They're not. In fact, there's a high possibility of there being 2 Zen 2 based APUs instead, one Vega based, the other RDNA2 based. The latter won't be seen for a while though.
 

moinmoin

Diamond Member
Jun 1, 2017
4,944
7,656
136
They're not. In fact, there's a high possibility of there being 2 Zen 2 based APUs instead, one Vega based, the other RDNA2 based. The latter won't be seen for a while though.
With the annual cadence already in place and APUs always coming last, why would AMD do two distinct APU designs in one cycle instead of leaving that for the next round of updates? In one more year we'll see an APU based on Zen 3 + RDNA anyway.