Speculation: Ryzen 4000 series/Zen 3


french toast

Senior member
Feb 22, 2017
988
825
136
While I totally agree this is what is going to happen, I personally wish the control over SMT would move into the processor, letting it decide for itself how many logical threads are most efficient for a given workload.

(And to be honest I'm fed up of discussing Windows as the obstacle to progress in CPU features.)
That sounds interesting. A bit like selectable 4WD in cars these days?
There will probably be some use cases that trip it up, just as there are efficiency downsides to the extra gubbins those fancy 4WDs carry even in 2WD mode compared with a pure 2WD vehicle, whilst never being quite as capable as a proper full-time 4WD.

Seriously unlikely for Ryzen anytime soon... Epyc? That makes a lot more sense, and as this is a new uarch I wouldn't be surprised if they increase the core's resources now to leave room for SMT4 on 5nm for Epyc. With a wider core and the lower-clocked, throughput-oriented nature of datacentres, server CPUs are ripe for something like this at the right time.
 

Veradun

Senior member
Jul 29, 2016
564
780
136
Does anyone have any thoughts on fabrication options for the IO die moving forward? Currently it is manufactured on 14nm at GloFo. Will the Zen 3 IO die be manufactured on GloFo's enhanced 12nm process? Or TSMC's 7nm?
Another question I had - would AMD ever consider GloFo's FD-SOI process for any future IO die?
I was thinking about this the other day. Is it possible to go 12FDX? Does it meet the requirements to be used for the motherchip?
 

DisEnchantment

Golden Member
Mar 3, 2017
1,659
6,100
136
The only clients that have workflows that actually benefit from SMT-4 would know how to enable it in BIOS and would mostly be running Linux anyway. Enabling SMT-4 out of the box on consumer chips IMO just seems dumb.

For single-socket systems I doubt SMT4 is going to bring much gain. If AMD can improve their prediction, fetch, cache and load/store systems and minimize thread stalls overall, you can be sure the gains from SMT4 will diminish greatly.
Patents indicate they are working hard on minimizing mispredictions, thread stalls, etc.:

Speculative DRAM memory read into L3
Speculative DRAM page activation
Cache control aware IMC
AGEN Bypass
Cache Bypass
BTB compression
Load/store combine
Unified AGU queue
Early return address prediction
...

From the X3D architecture it looks to me like AMD is bringing data/memory closer to the CPU than ever, making SMT4 more niche, considering that their entire goal is to minimize thread stalling by increasing data locality.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,688
1,222
136
Is it possible to go 12FDX?
Not yet; there haven't been any tapeout/signoff runs or MPWs so far, so 12FDX is non-existent. Also, 12LP+ delayed the initial 12FDX from running at Malta.

However, 12FDX appears to be set to use newer things:
FD-2D Smartcut 2.0 wafers from SOITEC (starting at 5nm SOI thickness vs 12nm SOI thickness)
A bunch of new materials/process steps will be added as well, which is meant to push it towards networking/computing/server market solutions.

Ocean12:
"In building its FDX technology offer, GLOBALFOUNDRIES has substrate requirements that Soitec will develop and manufacture through the installation of a new pilot line. It is a first objective of this task to develop high quality SOI substrates to enable satisfactory yield and performance for 22FD, Next Gen 22FD, 12FD node and beyond circuits."

"We have to highlight the important role of GLOBALFOUNDRIES in the qualification of SOITEC SOI substrates pilot line. The sampling of the substrates will be provided to GLOBALFOUNDRIES to be implemented in their next Gen 22FDX and 12FDX pilot line. The incoming material check, inline parameters, defect checks as well as circuit yield data obtained at GLOBALFOUNDRIES will provide an important feedback to define substrate characteristics."

"OCEAN12, considering Next Gen 22FD & 12FD technologies, updated BOX electrical properties specifications, anticipated to support RBB / FBB extended use, will induce needs for additional development on BOX quality metrology & performance. The objective is to reach for this substrate generation a quality comparable to state of the art Gate Oxide."

^== SiO2 to HfO2 level of development objective on substrate oxide as well.

22FDX-NextGen (Mobility, Industrial, Space, but not Computing) and 12FDX (Multi-Market; Industrial/Mobility/Computing/Space)
 

moinmoin

Diamond Member
Jun 1, 2017
4,993
7,763
136
That would be really interesting to let SMT be an on-the-fly switch, perhaps even on a per-core basis.
I was thinking per-core (or rather, per-process) indeed.

Seriously unlikely for Ryzen anytime soon... Epyc? That makes a lot more sense, and as this is a new uarch I wouldn't be surprised if they increase the core's resources now to leave room for SMT4 on 5nm for Epyc. With a wider core and the lower-clocked, throughput-oriented nature of datacentres, server CPUs are ripe for something like this at the right time.
Yes, it's clearly wishful thinking on my part. It doesn't even need to be for SMT-4; even for SMT-2 it would be an improvement in all the cases that run better with SMT disabled. And that would be useful even in Ryzen.

I guess my general thinking is that AMD managed to automate the boost behavior of their chips beyond the hard-coded tables used in previous gens. As @DisEnchantment notes, they appear to be working hard on improving the whole data management to make predictions and caches work more efficiently, which involves a lot of automation logic. Handling something like SMT (at whatever size) in an automatic fashion would fit well in that line of progress.
 

Thibsie

Senior member
Apr 25, 2017
792
860
136
Exactly. Except Richie Rich will argue that till the cows come home. Look at his sig! Your post 2523 is where you are disputing his post, and I agree with you. You can't compare apples to steaks.

[Fun]
But cows and steaks are very closely related, right?
[/fun]
 

Thunder 57

Platinum Member
Aug 19, 2007
2,791
4,061
136
Exactly. Except Richie Rich will argue that till the cows come home. Look at his sig! Your post 2523 is where you are disputing his post, and I agree with you. You can't compare apples to steaks.

Mark, when you reference a post number, could you kindly link it? It makes it so much easier for us than having to go back page(s) and find it.

Example. That will bring you right to 2523.

Also, I'm sure you know how I feel about Richie Rich. SMT4! Zen 3 uses Jim Keller's (a god apparently) K12 with 6 ALUs! But at the same time x86 is garbage!

Not saying he is an idiot, just that his beliefs are misguided. And he doesn't seem to be open to discussion since he seems so certain of many things.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,720
14,747
136
Mark, when you reference a post number, could you kindly link it? It makes it so much easier for us than having to go back page(s) and find it.

Example. That will bring you right to 2523.

Also, I'm sure you know how I feel about Richie Rich. SMT4! Zen 3 uses Jim Keller's (a god apparently) K12 with 6 ALUs! But at the same time x86 is garbage!

Not saying he is an idiot, just that his beliefs are misguided. And he doesn't seem to be open to discussion since he seems so certain of many things.
Believe it or not, I have been on here 20 years, and did not know how to do that. But it's the icon two to the right of the post number, correct?
 

Thunder 57

Platinum Member
Aug 19, 2007
2,791
4,061
136
Yup, just hover over the post number, right click, and copy link. Then you can use it in your own post to allow quick access.

I doubt it's been possible for 20 years, but forums have improved and this was likely a nice little addition at some point. It's never too late to learn something new :) .
 

NobleX13

Member
Apr 7, 2020
27
18
41
These rumored changes to the CCX design and higher IPC definitely have me intrigued. I am "slumming it" on a Ryzen 5 1600 right now. Holding out for the 4000-series launch.
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
Yes, it's clearly wishful thinking on my part. It doesn't even need to be for SMT-4, even for SMT-2 it would be an improvement in all the cases that run better with SMT disabled. And that would be useful even in Ryzen.
Yes, I agree. The OS scheduler can control the SMT mode in software. If the scheduler loads only one thread per physical core (and keeps the other virtual cores empty), the CPU behaves as if SMT were off, even if it is SMT4-capable or whatever. If the scheduler loads two threads per core, it behaves like SMT2, and so on. In theory it is possible to set the SMT mode by configuring the OS scheduler. And even more: you can mix different SMT modes for different cores in the same CPU (or across multiple CPUs in a server). For example, imagine you have a 12-core Ryzen 4900 with SMT4: you could set a game to use 8 cores with SMT2 or SMT off (whatever gives the best gaming performance) and run a Blender render in the background on the remaining 4 cores using full SMT4 (so 16 threads), purely in software via the OS scheduler. Today's stupid scheduler will fill the second thread with the low-priority Blender process even if you set the game's thread priority to very high, cutting performance in half no matter what priority you set (and with SMT4 it would fall to 1/4).

The problem is not a higher number of virtual cores per physical core. The problem is that today's OS scheduler really fails to manage thread performance across a physical core. It's purely a software failure, and BIOS SMT-off is just a simple workaround. Sadly, a single-core CPU was able to control thread performance via process priority much better than modern SMT systems (Folding@Home in the background at low priority didn't hurt games at all). It's kind of a mystery why the OS scheduler cannot clear the remaining virtual cores to maximize performance for a higher-priority process. This would solve all problems related to SMT.
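
To make the 12-core example concrete, here is a rough Python sketch of how such "software SMT modes" could be expressed as CPU affinity sets. It assumes a hypothetical numbering where logical CPU id = core * threads_per_core + sibling index; real systems enumerate siblings differently, and `smt_cpu_set` is an illustrative helper, not an existing API.

```python
# Hypothetical sketch: emulate per-process "SMT modes" with CPU affinity sets.
# Assumption: logical CPU id = core * threads_per_core + sibling index.
# Real systems enumerate siblings differently; on Linux, check
# /sys/devices/system/cpu/cpu*/topology/thread_siblings_list.

def smt_cpu_set(cores, threads_per_core, smt_mode):
    """Return the logical CPUs that expose `smt_mode` threads on each core."""
    if not 1 <= smt_mode <= threads_per_core:
        raise ValueError("smt_mode must be between 1 and threads_per_core")
    return {
        core * threads_per_core + sibling
        for core in cores
        for sibling in range(smt_mode)
    }

# The 12-core SMT4 example from the post above.
THREADS_PER_CORE = 4
game_cpus = smt_cpu_set(range(0, 8), THREADS_PER_CORE, smt_mode=1)      # 8 cores, SMT off
blender_cpus = smt_cpu_set(range(8, 12), THREADS_PER_CORE, smt_mode=4)  # 4 cores, SMT4

print(len(game_cpus), len(blender_cpus))  # 8 16
```

On Linux, something like `os.sched_setaffinity(pid, game_cpus)` could then pin a process to such a set. Note that this only restricts where that one process may run; the sibling threads stay idle only if nothing else is allowed onto them.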
 

Atari2600

Golden Member
Nov 22, 2016
1,409
1,655
136
I personally wish the control over SMT would move into the processor, letting it decide for itself how many logical threads are most efficient for a given workload.

(And to be honest I'm fed up of discussing Windows as the obstacle to progress in CPU features.)

But that would require the CPU to know everything about the workload - and not just the micro op instructions.

If there were a software means of overriding, or setting preferences for, the CPU, then yeah - but if you have a problem that is only embarrassingly parallel when you have carefully formed its memory bounds, then you need the scheduler/CPU to respect that.

An example would be CFD - each process will be assigned (as much as is possible) a contiguous block of adjacent cells to process calculations for. This reduces communication between processes - i.e. communication of information between adjacent cells across different processes - and that reduction is seen all the way from DRAM through to L1 cache. It can result in significant efficiency savings.
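
As a rough illustration of why the block assignment matters, the Python sketch below counts how many neighbouring-cell pairs end up on different processes for a 1D row of cells. The cell and process counts and the helper names are made up for the example; a real CFD mesh is 2D/3D and partitioned with far more care.

```python
# Hypothetical sketch: count neighbouring-cell pairs that cross a process
# boundary for two ways of assigning a 1D row of cells to processes.

def contiguous_owner(cell, n_cells, n_procs):
    """Owner when each process gets one contiguous block of cells."""
    block = -(-n_cells // n_procs)  # ceiling division
    return cell // block

def round_robin_owner(cell, n_cells, n_procs):
    """Owner when cells are dealt out one at a time."""
    return cell % n_procs

def cross_process_pairs(owner, n_cells, n_procs):
    """Number of adjacent cell pairs (i, i + 1) owned by different processes."""
    return sum(
        owner(i, n_cells, n_procs) != owner(i + 1, n_cells, n_procs)
        for i in range(n_cells - 1)
    )

N_CELLS, N_PROCS = 1000, 8
print(cross_process_pairs(contiguous_owner, N_CELLS, N_PROCS))   # 7
print(cross_process_pairs(round_robin_owner, N_CELLS, N_PROCS))  # 999
```

With contiguous blocks only the block edges need inter-process communication, which is the locality a scheduler or CPU would have to respect.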


I'm not saying it couldn't happen. I'm not saying it shouldn't happen. I'm saying it would need significant thought perhaps beyond what you originally envisage.
 

Thunder 57

Platinum Member
Aug 19, 2007
2,791
4,061
136
Yes, I agree. The OS scheduler can control the SMT mode in software. If the scheduler loads only one thread per physical core (and keeps the other virtual cores empty), the CPU behaves as if SMT were off, even if it is SMT4-capable or whatever. If the scheduler loads two threads per core, it behaves like SMT2, and so on. In theory it is possible to set the SMT mode by configuring the OS scheduler. And even more: you can mix different SMT modes for different cores in the same CPU (or across multiple CPUs in a server). For example, imagine you have a 12-core Ryzen 4900 with SMT4: you could set a game to use 8 cores with SMT2 or SMT off (whatever gives the best gaming performance) and run a Blender render in the background on the remaining 4 cores using full SMT4 (so 16 threads), purely in software via the OS scheduler. Today's stupid scheduler will fill the second thread with the low-priority Blender process even if you set the game's thread priority to very high, cutting performance in half no matter what priority you set (and with SMT4 it would fall to 1/4).

The problem is not a higher number of virtual cores per physical core. The problem is that today's OS scheduler really fails to manage thread performance across a physical core. It's purely a software failure, and BIOS SMT-off is just a simple workaround. Sadly, a single-core CPU was able to control thread performance via process priority much better than modern SMT systems (Folding@Home in the background at low priority didn't hurt games at all). It's kind of a mystery why the OS scheduler cannot clear the remaining virtual cores to maximize performance for a higher-priority process. This would solve all problems related to SMT.

This SMT4 stuff is way past getting old. There will be no SMT4 in Zen 3. Get over it.
 

moinmoin

Diamond Member
Jun 1, 2017
4,993
7,763
136
But that would require the CPU to know everything about the workload - and not just the micro op instructions.
I think thanks to the huge caches AMD is slowly getting there. A lot of the prediction, fetch, cache and load/store optimizations that @DisEnchantment keeps posting patents about are already looking beyond the constraints of single cores, optimizing data management (and reducing stalls as a result) on what is essentially the network level (IF is technically not yet that, but getting there). Better adaptation to workloads is a central part of such optimization, which requires knowledge of them.

In some ways in Zen chips the central brain already is no longer the CPU cores but the SCF. I fully expect AMD to expand the latter's role further and further.
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
Only 1.83x improvement with 4 to 8 times the L2 and MediaTek-like "Optimization"!!:confused:
Apple's ARM chip has 4MB L2 per core compared to 512KB and 1MB per core on AMD64 chips. The whole benchmark could fit in Apple's L2 cache.
Also, was the benchmark compiled with the same compiler?? Same OS?? Same storage and RAM size and speed??

Edit: I was replying to the SPECint2006 benchmark. Looks like it is from the signature.
Did God prohibit AMD and Intel from using the same size of L2 cache as Apple? NO. Core2Duo was using a big shared L2 cache 10 years ago. So you cry well, but on the wrong shoulder here. You should write a complaint email to Apple headquarters asking them to stop developing such powerful cores, because your ego cannot digest that your brand new x86 looks like garbage in comparison to the Apple uarch. Well, the problem is that you should have complained 5 years ago, because the very old Apple A9 Twister core from 2015 already had 7% higher IPC than today's 9900K Coffee Lake and Zen 2 :cool:


Stop with the insults/confrontational postings.
Take some more time off for reflection.

AT Mod Usandthem
 

amrnuke

Golden Member
Apr 24, 2019
1,181
1,772
136
In some ways in Zen chips the central brain already is no longer the CPU cores but the SCF. I fully expect AMD to expand the latter's role further and further.
Interesting, since a lot of the gains we have seen have been so narrow - and doesn't it seem likely that we are at the very top of the curve for extracting IPC from core refinements? Now we have to focus on the surrounding stuff.

Like a car, the core is the engine but there is so much more.

Right now, Intel is like Dodge with their Challenger - keep adding horsepower to an ancient-looking design. There is no such thing as "handling" or "rear visibility", and anyone who thinks those are a thing is not living in the "real world". AMD is like Ford with the Mustang - great power, though less than the Challenger... but... it's actually faster to 60 than the power-oriented Challenger? What, did they put real tires on it? And it can turn?
 

DrMrLordX

Lifer
Apr 27, 2000
21,765
11,085
136
Did God prohibit AMD and Intel from using the same size of L2 cache as Apple?

Yes? Physics is a bitch. L2 takes more die space than L3, and as you may have noticed, having a lot of L3 with good prefetch units can do a lot to improve the performance of multicore CPUs in parallel workloads with lots of intercore communication, which is one sort of workload for which Intel and AMD have optimized their CPUs. Compare that situation to Apple, which uses its A-series SoCs exclusively in phones and tablets, where bursty, single-threaded (or sparsely-threaded) applications predominate. There you have less likelihood of core->core writes, meaning maintaining cache coherency is less important (and therefore shared L3 is less important). So Apple chose to spend a lot of die area on L2 that could have been spent elsewhere, or not spent at all (driving higher yields and/or lower costs per die). Apple has the freedom to charge insane amounts of money for their hardware, and they don't have any OEMs telling them to trim costs, since they provide all their own SoCs for their own designs from top to bottom.

Core2Duo was using a big shared L2 cache 10 years ago.

Conroe only had two cores. Cache coherency on that generation of CPU wasn't that big of a deal. With shared L2, you didn't even have to think about which core had which data in its cache since it was in a shared L2 and since Intel was using an inclusive cache hierarchy; e.g. if your CPU couldn't find the data in L1d on Core 0 but it was in L1d of Core 1, it was guaranteed to be in the L2, so you wouldn't have to do any core->core communication to read that data into the L1d of Core 0.
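
A toy Python model of that inclusion property might look like the sketch below. It is only an illustration of the idea, not Conroe's actual design: capacities, evictions, writes and coherence states are all ignored, and the class and method names are invented for the example.

```python
# Toy model (not Conroe's actual design): two private L1 caches in front of
# one shared, inclusive L2. Capacities and eviction are deliberately ignored.

class InclusiveL2System:
    def __init__(self, n_cores=2):
        self.l1 = [set() for _ in range(n_cores)]  # private L1d per core
        self.l2 = set()                            # shared, inclusive L2

    def load(self, core, addr):
        """Load `addr` on `core`; return where the data was found."""
        if addr in self.l1[core]:
            return "L1"
        # Inclusion: anything in any L1 is also in L2, so there is no need
        # to probe the other core's L1 on a read miss.
        source = "L2" if addr in self.l2 else "memory"
        self.l2.add(addr)  # fill L2 first to keep the hierarchy inclusive
        self.l1[core].add(addr)
        return source

system = InclusiveL2System()
print(system.load(1, 0x40))  # memory
print(system.load(0, 0x40))  # L2
print(system.load(0, 0x40))  # L1
```

Core 0's first read of a line that core 1 already holds is served from the shared L2, without any core->core traffic.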