Speculation: Ryzen 4000 series/Zen 3


moinmoin

Diamond Member
Jun 1, 2017
5,242
8,456
136
All this talk about Jim Keller... was he really THAT important on Zen and future iterations? Wasn't his principal role just to manage the project while others did the real work?
Jim Keller headed the team that worked on both Zen and the ARM K12 at once. He's credited with building the framework, e.g. the SCF part of the IF that includes thousands of sensors on every die, allowing the team to speed up development by debugging and tweaking the chip design at any point, and also making the scalability and the decoupling into different dies possible. (That's what Intel hired Keller for: his SoC expertise.) The Zen core design itself is a direct follow-up to the Construction cores (with essentially the same staff), though Keller may have been involved in planning the overall feature roadmap up to Zen 3.
 

Panino Manino

Golden Member
Jan 28, 2017
1,143
1,383
136
Yes, that's why I believe there's no "Keller Sauce" coming, and we need to talk more about and give more credit for the people most responsible for the actual Zen design.
 

moinmoin

Diamond Member
Jun 1, 2017
5,242
8,456
136
I agree Keller definitely gets too much of the credit. I think it's pretty clear that the overall plan for the Zen roadmap was to steadily build up in all areas. And the Zen cores are only one side of the coin; the other side is all the uncore and packaging. Zen 2 had big changes both in the core and in the uncore and packaging. And since Zen 1 through 3 were planned from the start, the changes in Zen 3 may well be bigger than some are expecting. Regarding credit, Mike Clark is officially credited as Zen Chief Architect and has been working on the cores from the start, but should he be credited for the uncore and packaging as well? He appears to fill the overall team leader position that Keller held until he left in 2015. What other people do we know of who are credited for Zen but not necessarily for the cores?
 

Atari2600

Golden Member
Nov 22, 2016
1,409
1,655
136
All this talk about Jim Keller... was he really THAT important on Zen and future iterations? Wasn't his principal role just to manage the project while others did the real work?

CPU design is like any other mature technology engineering project. It's too complex for any one person to make. [By immature technology, I mean the likes of Reaction Engines' SABRE engine.]

Certainly, one bad egg can break it - but no one individual can carry it.
 

Richie Rich

Senior member
Jul 28, 2019
470
230
76
All this talk about Jim Keller... was he really THAT important on Zen and future iterations? Wasn't his principal role just to manage the project while others did the real work?
1. What work do you mean? Can you imagine Jobs coding MacOS, or Elon Musk designing every small part of a Tesla or a Falcon rocket in CAD? Come on, guys. Great leaders set long-term goals based on their vision. Easy to say but very hard to do, and the consequences are huge if they are wrong (remember Apple after it fired Jobs? Or Bulldozer?). That's what Keller did with the Zen master plan.
2. From the integrity point of view: Keller left AMD when management (terrible Ruiz) started to be crazy and canceling ambitious projects. He knew this was going to be the road to hell. I don't admire AMD engineers for designing such crap as Bulldozer and watching the whole company go almost bankrupt. The same engineers did a great job with Zen because some leader told them what to do.
3. I can imagine the meeting at AMD when engineers were presenting the great new 2xALU arch Bulldozer. Ruiz and Meyer: "That's great, good job, this is our future!" If Keller had been at this meeting, his hypothetical answer would have been: "What the hell is that, are you serious, guys? We need a completely different design, you need to do this and this..." And exactly this was his main job. Only haters can say he did nothing. IMHO AMD will regret canceling Keller's ARM ZEN in the near future.
 

Thunder 57

Diamond Member
Aug 19, 2007
4,024
6,740
136
...IMHO AMD will regret canceling Keller's ARM ZEN in near future.

Not at all. They didn't have the resources for both. x86 will rule the server market for the foreseeable future, and there is only one other player in that market. Same with laptops/desktops.

What market would K12 serve? How much money would it bring in compared to Zen? A tiny fraction. Zen saved AMD. If they wanted to resume work on K12 now, I agree that might make sense, but shelving it at the time was 100% the right call, even if that is speculated to be a big reason Jim Keller left.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,811
1,290
136
I can imagine meeting in AMD when engineers were presenting great new 2xALU arch Bulldozer.
三三ᕕ( ᐛ )ᕗ

"Alternately, integer instruction operations can be dispatched to the integer execution units 212 and 214 opportunistically. To illustrate, assume again that two threads T0 and T1 are being processed by the processing pipeline 200. In this example, the instruction dispatch module 210 can dispatch integer instruction operations from the threads T0 and T1 to either of the integer execution units 212 and 214 depending on thread priority, loading, forward progress requirements, and the like."

"In certain instances, the processing pipeline 200 may be processing only a single thread. In this case, the instruction dispatch module 210 can be configured to dispatch integer instruction operations associated with the thread to both integer execution units 212 and 214 based on a predefined or opportunistic dispatch scheme. Alternately, the instruction dispatch module 210 can be configured to dispatch integer instruction operations of the single thread to only one of the integer execution units 212 or 214 and the unused integer execution unit can be shut down or otherwise disabled so as to reduce power consumption. The unused integer execution unit can be disabled by, for example, reducing the power supplied to the circuitry of the integer execution unit, clock-gating the circuitry of the integer execution unit, and the like."

"To illustrate, the front-end unit 202 can fetch and decode instructions associated with a thread such that load instructions later in the program sequence of the thread are prefetched and dispatched to one of the integer execution units for execution while the other integer execution unit is still executing non-memory-access instructions at an earlier point in the program sequence. In this way, memory data will already be prefetched and available in a cache (or already in the process of being prefetched) by the time one of the integer execution units prepares to execute an instruction dependent on the load operation."

"By utilizing multiple integer execution units that share an FPU (or share multiple FPUs) and that share a single pre-processing front-end unit, increase processing bandwidth afforded by multiple execution units can be achieved while reducing or eliminating the design complexity and power consumption attendant with conventional designs that utilize a separate pre-processing front-end for each integer execution unit. Further, because in many instances it is the execution units that result in bottlenecks in processing pipelines, the use of a single shared front-end may introduce little, if any, delay in the processing bandwidth as the fetch, decode, and dispatch operations of the front-end unit often can be performed at a higher instruction-throughput than the instruction-throughput of two or more execution units combined."

AMD was never going to stop at single-threaded CMT, but instead diverge into clustered simultaneous multithreading.

Ex: Zen2 -> Adds CSMT -> Zen3
8x ALU(4 ALU per cluster), 6 AGUs(3 AGU per cluster),
Zen2-class split-FPU: 8x FPUs(4 per FPU cluster, potential fused 512-bit mode)
Zen2-class fused-FPU: 4x FPUs(No clusters pure 512-bit units)
Cluster of Zen-class FPUs; 16x FPUs(4*128-bit 2FMUL+2FADD) ((probably, would be complex enough to allow cluster-fusion for single-cycle 256-bit and 512-bit))
 

Richie Rich

Senior member
Jul 28, 2019
470
230
76
The whole CMT idea is just bad/mediocre IMHO. However, I can see your point. You mean doubling the front-end to handle 4 threads and feeding dual Zen 4xALU SMT2 cores? A kind of front-end fusion (like Bulldozer did). But I'm afraid it has the same disadvantage as Bulldozer: no integer IPC gain, only FPU gain. It's a half-optimal solution.

I think it's smarter to fuse two cores entirely and create one big, wide 8xALU SMT4 core. There are only pros to this approach: high single-core IPC for HPC, and SMT4 brings better efficiency than SMT2 for heavily threaded number crunching. Well, it's probably not easy to develop :) However, Apple showed that a 6xALU core is not a problem, so I'm an optimist regarding more ALUs.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,811
1,290
136
In this case, Zen wouldn't be moving to cluster-based multithreading, but rather to clustered simultaneous multithreading. They utilize the same design methodology for different purposes.

In the case of adding CSMT to Zen2 to get Zen3;
ALU0-ALU1 have a 16-byte multiplex load/store port
ALU2-ALU3 have a 16-byte multiplex load/store port
One of the AGUs can transfer to/from the other cluster using the 16-byte multiplex.
Each cluster will have a level-0 data cache, a store/load allocation queue (essentially an L0 LSQ), and the bypass for cluster-to-cluster PRF load/store transfers (via SLAQ).
If no transfers are needed, it is better to move thread-0 from cluster-0 to cluster-1, etc.
All threads have access to each of the clusters, but it is done in 2T round-robin at worst and 2T algorithmic at best. Threads 0, 1, 2, and 3 all have access to all 8 ALUs + 6 AGUs.
All clusters will share an L1 cache, which is where the load/store queue will be.
L0 SLAQ = 16B(To L0ab/abPRF) to 32B(From L1), L1 LSQ = 32B to 64B

On the FPU side, the cluster of Zen-class FPUs would have better cross-lane potential. It would probably need the L1 to be able to load/store 64B to the FPU complex.
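
For what it's worth, here is one way to read the worst-case "2T round-robin" sharing above, as a toy Python sketch. Everything in it is an assumption for illustration only: the function name `round_robin_schedule`, the idea that exactly one thread owns a cluster in a given cycle, and the rule that the thread-to-cluster mapping shifts after each full pass so that every thread ends up visiting every cluster. It says nothing about how real hardware would arbitrate.

```python
def round_robin_schedule(num_cycles, threads=4, clusters=2):
    """Toy model: each cycle, `clusters` threads are picked in rotation.

    Returns a list with one dict per cycle, mapping cluster id -> thread id.
    After every full pass over the threads, the cluster mapping is shifted
    by one so that each thread eventually runs on each cluster.
    """
    schedule = []
    for cycle in range(num_cycles):
        issued = cycle * clusters        # thread slots handed out so far
        base = issued % threads          # first thread picked this cycle
        rotation = issued // threads     # completed passes over all threads
        slot = {}
        for i in range(clusters):
            thread = (base + i) % threads
            cluster = (i + rotation) % clusters
            slot[cluster] = thread
        schedule.append(slot)
    return schedule


if __name__ == "__main__":
    for cycle, slot in enumerate(round_robin_schedule(4)):
        print(cycle, dict(sorted(slot.items())))
```

With four threads and two clusters, any four consecutive cycles give each thread one turn on each cluster. The "2T algorithmic" best case would replace the fixed rotation with some priority-based or load-based pick.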
 

Richie Rich

Senior member
Jul 28, 2019
470
230
76
Jim Keller was at AMD in the late 90s and was involved in K7 and K8.
He's just another hater. The same kind of people who hate Elon Musk, Steve Jobs... just anybody who achieved something significant.

Jim Keller's official job title at AMD in 2012-2015: Chief Architect of CPU cores.
And those people ask such a silly question: was he really that important for the Zen arch? There is nobody more important for Zen 1, 2, 3 than the chief architect, and that was Jim Keller at the time.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,777
6,791
136
Zen3 is design complete.
7nm+.

How long from design complete to tape out? --> 7nm+ is significantly different from 7nm.

Probably late 2020 we might see Zen3

[Attached image: 6D8CD6E4-1A45-403F-8F85-62AD16023366.jpeg]
 

itsmydamnation

Diamond Member
Feb 6, 2011
3,072
3,897
136
All things going to plan, we will see Zen3 in 2020 because a supercomputer is using it.

I'm hoping for 12-core chiplets using a bidirectional ring so the L3 cache is shared per chiplet, while still keeping it as an eviction cache.
 

DrMrLordX

Lifer
Apr 27, 2000
22,899
12,963
136
Zen3 is design complete.
7nm+.

How long from design complete to tape out? --> 7nm+ is significantly different from 7nm.

AMD is promising one new uarch per year. In order for that to happen, Zen3 has to come in 2020 to every sector (except APUs; that's Renoir's territory). Milan first to high-priority ODMs, then uh Zen3 desktop, then HEDT and general launch of Milan afterwards. Unless they botch something, expect similar cadence to what you are seeing now with Zen2/Rome.
 

Richie Rich

Senior member
Jul 28, 2019
470
230
76
All things going to plan, we will see Zen3 in 2020 because a supercomputer is using it.

I'm hoping for 12-core chiplets using a bidirectional ring so the L3 cache is shared per chiplet, while still keeping it as an eviction cache.
Assuming Zen3 is a wider (6xALU, SMT4) core... then an 8-core Zen3 chiplet will be similar in area to a 12c Zen2 chiplet. I think they can still keep the well-proven quad-core CCX.

A Zen3 8c/32t chiplet might be +50% bigger in area/transistors/power consumption and +70% in overall performance. It could be tight to fit 64c under the heat spreader.
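
A quick sanity check of those numbers in relative units only, as a small Python sketch. The +50% area and +70% performance figures are the guesses from this post, not measured data. The helper names are made up, the baseline is simply "one Zen2 core = 1 unit of area", and chiplet area is assumed to scale with core count (L3 and everything else are folded into that).

```python
AREA_SCALE = 1.5   # assumed: a Zen3 chiplet is +50% area vs. an equal-core Zen2 chiplet
PERF_SCALE = 1.7   # assumed: +70% overall performance


def zen2_core_equivalents(zen3_cores, area_scale=AREA_SCALE):
    """Area of `zen3_cores` hypothetical Zen3 cores, in Zen2-core units."""
    return zen3_cores * area_scale


def perf_per_area_gain(perf_scale=PERF_SCALE, area_scale=AREA_SCALE):
    """Relative performance per unit area versus the Zen2 baseline."""
    return perf_scale / area_scale


if __name__ == "__main__":
    print("8c chiplet area (Zen2-core units):", zen2_core_equivalents(8))
    print("64c package area (Zen2-core units):", zen2_core_equivalents(64))
    print("perf per area vs. Zen2:", round(perf_per_area_gain(), 3))
```

Under these guesses an 8-core chiplet costs about as much area as 12 Zen2 cores, a 64-core package about as much as 96, and performance per area improves by roughly 13%.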

BTW: The server launch of EPYC2 is nice. Intel is not happy. I think Intel is gonna cancel Keller's bonus this month because of Zen2 :D
 

moinmoin

Diamond Member
Jun 1, 2017
5,242
8,456
136
Jim Keller was at AMD in the late 90s and was involved in K7 and K8.
What does that have to do with the crazy timeline Richie Rich talked about? Keller left AMD in 1999 and K8 launched in 2003; Keller left AMD in 2015 and Zen launched in 2017. In both cases Keller likely left once he had done all he wanted to do. Nothing suggests he left "when management (terrible Ruiz) started to be crazy and canceling ambitious projects". Keller likely couldn't care less about all that; he barely waited for K7 to launch before leaving.
 

Atari2600

Golden Member
Nov 22, 2016
1,409
1,655
136
... and if he wasn't happy with whatever was following after K8?

Although saying that, Hector Ruiz didn't arrive till 2000. Jerry Sanders was calling the shots back then.
 

moinmoin

Diamond Member
Jun 1, 2017
5,242
8,456
136
... and if he wasn't happy with whatever was following after K8?
Then he appears never to be happy with what follows whatever he was working on, since he usually leaves even before his work actually launches. :p It's just his MO, nothing more, nothing less.
 

Panino Manino

Golden Member
Jan 28, 2017
1,143
1,383
136
The K8 we know is not the same "K8" that Jim Keller worked on.
Wasn't there an architecture from his time there that was cancelled? The same thing happened with Zen and K12.
 

amrnuke

Golden Member
Apr 24, 2019
1,181
1,772
136
Assuming Zen3 is a wider (6xALU, SMT4) core... then an 8-core Zen3 chiplet will be similar in area to a 12c Zen2 chiplet. I think they can still keep the well-proven quad-core CCX.

A Zen3 8c/32t chiplet might be +50% bigger in area/transistors/power consumption and +70% in overall performance. It could be tight to fit 64c under the heat spreader.

BTW: The server launch of EPYC2 is nice. Intel is not happy. I think Intel is gonna cancel Keller's bonus this month because of Zen2 :D

A really interesting byproduct of a possible SMT4 implementation is the later APU; imagine a Ryzen 5400G 4C/16T with Navi graphics. That's a killer entry level workstation. But even more compelling to me is a potential wide Ryzen 4600 with 6C/24T. That's a huge value proposition. And then there's the child in me that wants to see a 4950X with 16C/64T.

Enough daydreaming, back to work.
 

Richie Rich

Senior member
Jul 28, 2019
470
230
76
I read that Keller was chief architect of a "K8 prototype" which was much more powerful than the public K8 (assuming that the ex-DEC engineers were inspired by the very powerful DEC Alpha EV8 with its 4-way SMT). However, it was canceled in favor of a much simpler K8, based on K7 with a 64-bit instruction set. Keller left AMD in 1999 and the K8 design was led by chief architect Fred Weber, who had developed K7 earlier. This is what I read.

I would like to know what the spec of Keller's K8 prototype was, but I cannot find it anywhere.
[Speculation] We can estimate that Keller's K8 prototype was something between the old K7 (3xALU) and EV8 (wide core with SMT4). Maybe it was a 4xALU SMT2 core as a good compromise? However, this would have needed a redesign of everything from scratch. Time, complexity and corporate politics... who knows what the reason was. The funny thing is that Zen1 is very similar to my estimate of the original K8 prototype. Darwinian evolution will find its way sooner or later.[/Speculation]

And regarding CEO Hector Ruiz: K10 is just an evolution of K8 (doubling the width of the FPU, adding L3 with a native quad-core design) and still based on the prehistoric K7. No wonder AMD started to slowly die with K10: the horrible TLB bug, low frequencies, no innovation, good engineers leaving the company, no ambition. That's why I talk about CEO Hector Ruiz being terrible. Everybody knew AMD was slowly going to hell. When Dirk Meyer announced that Bulldozer was going to have a 2xALU uarch, everybody knew AMD was done.

But you are right, Keller left AMD before Hector Ruiz arrived. My bad, I'm gonna correct my previous posts.

Funny thing is how history repeats:
- 1999: AMD canceled Keller's prototype for being too advanced and Keller left the company
- 2012: begging for Keller's return to create the advanced Zen
- 2015: AMD canceled Keller's prototype again (server ARM Zen) and Keller left the company
- 2028: Are they gonna beg him again to develop an ARM core? :D
 

Panino Manino

Golden Member
Jan 28, 2017
1,143
1,383
136
Sounds impressive, but AMD did the right thing at the time.
With Intel trying to push Itanium, AMD had a huge opportunity, plus K7 was still more than good enough to compete.