Speculation: Ryzen 4000 series/Zen 3


DisEnchantment

Golden Member
Mar 3, 2017
1,603
5,789
136
SEV in use at Google. Previously SUSE was providing this service downstream.
SEV is almost done being upstreamed. SEV-SNP is being worked on downstream.

SME (Zen) --> SEV (Zen 2) --> SEV-SNP (Zen 3) --> ? (Zen 4)

MPI kfd bits also being upstreamed.
amdgpu support for TMZ is also in. Looks like the whole circle is getting bigger and covering all the necessary bits.
 

LightningZ71

Golden Member
Mar 10, 2017
1,627
1,898
136
While game engines are certainly amazing pieces of software development, they are not hyper optimized for any particular chip architecture. Who has the time or incentive to do so? There are new hardware configurations coming out from vendors every year that have different behaviors when it comes to performance profiling. You'd have to have, essentially, a modified code path in each game engine for each major revision of CPU architecture. Given the way that it seems that most games development houses work, they are literally just blasting through game development to push the next product out the door as quickly as possible in a state that they consider "good enough", optimization be damned as the customer can just buy better hardware if they want better performance.

The only thing that I see as having changed is the convergence of the two largest consoles (by volume) onto x86, and with the most recent generation using something that's going to be quite analogous to what you can get in desktop hardware (I mean, who was ever able to purchase an 8-core Jaguar APU for their home computer?). Now, you're going to get to have a CPU that matches the new Xbox configuration very closely, and purchase a video card that also likely matches its capabilities, and where you won't have as much SSD throughput in the current generation, you'll be able to make up for it with RAID configurations or having more RAM, etc. Because of that, I see the possibility that these game engines will gain optimizations for the consoles that are also applicable in large part to the desktop counterparts that you can build, as it is likely in the studios' best interests to squeeze every last drop of performance from the consoles that they can.
 

Thibsie

Senior member
Apr 25, 2017
749
801
136
While game engines are certainly amazing pieces of software development, they are not hyper optimized for any particular chip architecture. Who has the time or incentive to do so? There are new hardware configurations coming out from vendors every year that have different behaviors when it comes to performance profiling. You'd have to have, essentially, a modified code path in each game engine for each major revision of CPU architecture. Given the way that it seems that most games development houses work, they are literally just blasting through game development to push the next product out the door as quickly as possible in a state that they consider "good enough", optimization be damned as the customer can just buy better hardware if they want better performance.

I don't think that's what was meant. The *current* problem of many games is that many of them are straight ports from consoles.

The only thing that I see as having changed is the convergence of the two largest consoles (by volume) onto x86, and with the most recent generation using something that's going to be quite analogous to what you can get in desktop hardware (I mean, who was ever able to purchase an 8-core Jaguar APU for their home computer?). Now, you're going to get to have a CPU that matches the new Xbox configuration very closely, and purchase a video card that also likely matches its capabilities, and where you won't have as much SSD throughput in the current generation, you'll be able to make up for it with RAID configurations or having more RAM, etc. Because of that, I see the possibility that these game engines will gain optimizations for the consoles that are also applicable in large part to the desktop counterparts that you can build, as it is likely in the studios' best interests to squeeze every last drop of performance from the consoles that they can.

It will provide a minimum optimization and probably a max one too: one size fits all. If it runs OKish, no effort will be made to optimize (either performance or quality) for the PC version.
If you think the hardware will be a lot alike, think about the situation in 5 years or more.
 

moinmoin

Diamond Member
Jun 1, 2017
4,950
7,659
136
SEV in use at Google. Previously SUSE was providing this service downstream.
SEV is almost done being upstreamed. SEV-SNP is being worked on downstream.

SME (Zen) --> SEV (Zen 2) --> SEV-SNP (Zen 3) --> ? (Zen 4)

MPI kfd bits also being upstreamed.
amdgpu support for TMZ is also in. Looks like the whole circle is getting bigger and covering all the necessary bits.
I've mentioned it before, the development AMD is doing in this area is so overdue for the whole industry it isn't funny. Though I find it ridiculous that Google turns it into a product called "Confidential VM"; this should be standard, if not the bare minimum, in any serious cloud! I dare Google to call all their other VM-based products inconfidential/leaky.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,603
5,789
136
I've mentioned it before, the development AMD is doing in this area is so overdue for the whole industry it isn't funny. Though I find it ridiculous that Google turns it into a product called "Confidential VM"; this should be standard, if not the bare minimum, in any serious cloud! I dare Google to call all their other VM-based products inconfidential/leaky.
Can't agree more. It should be the basic requirement when not using bare metal. Granted, bare metal cannot protect you from the cloud provider, but it at least protects you from other co-hosted VMs.
Outside of compliance with local data retention/PII laws plus specific use cases, some on-prem infrastructure could move to the public cloud.

So for those small enterprises this can really offer something. Your public-facing service gateway need not reach out to some remote, privately hosted database service every time there is a request from the client.
You won't need a second "trusted" partner to host the sensitive data that is accessed by the public-facing service.

Also one thing that is really interesting is TMZ. GPU VMs have been supported for a while now by AMD. Now GPU VM encryption using TMZ is possible. So again if your instance uses GPU acceleration this can also be encrypted.
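For a concrete idea of what an encrypted guest looks like in practice, this is the general shape of an SEV launch under KVM/QEMU. A sketch only: the `cbitpos`/`reduced-phys-bits` values vary by EPYC generation, and `guest.img` is a placeholder path.

```shell
# Check that the host kernel has SEV support enabled in kvm_amd
cat /sys/module/kvm_amd/parameters/sev    # expect "Y" or "1"

# Launch a guest whose memory is encrypted with a per-VM key.
# cbitpos/reduced-phys-bits below are typical for early EPYC parts;
# verify against your CPU before relying on them.
qemu-system-x86_64 \
  -enable-kvm \
  -cpu EPYC -m 4G \
  -machine q35,memory-encryption=sev0 \
  -object sev-guest,id=sev0,cbitpos=47,reduced-phys-bits=1 \
  -drive file=guest.img,format=qcow2
```

The hypervisor can still schedule and migrate the guest, but its RAM is ciphertext from the host's point of view, which is the property the "Confidential VM" product is selling.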

In this area, however, like I have alluded to before, some Intel devs/maintainers have not really reached an agreement on introducing the GPU cgroup controller that was proposed by AMD. It was rejected multiple times.
As someone who has been waiting for this GPU cgroup feature for some time (you can use a GPU cgroup for many things, like Docker containers, or sharing compute between processes to guarantee some QoS), it is really frustrating.
 

LightningZ71

Golden Member
Mar 10, 2017
1,627
1,898
136
I don't think that's what was meant. The *current* problem of many games is that many of them are straight ports from consoles.



It will provide a minimum optimization and probably a max one too: one size fits all. If it runs OKish, no effort will be made to optimize (either performance or quality) for the PC version.
If you think the hardware will be a lot alike, think about the situation in 5 years or more.

I agree, many current games are minimum-effort ports from existing consoles which, while running x86 code, use cores that were barely present in the market (even when new) and in an arrangement that was never publicly available. They almost had to go lowest common denominator.

At least with the next-gen consoles, they're using an arrangement that IS available to end users (Zen 2 in an 8-core configuration). If they at the very least target that, then even the lowest-effort port to PC will still feature a greater degree of intrinsic optimization for at least currently shipping machines. Going forward, I don't think that a system set up for Zen 2 will be hampered by Zen 3 cores, which will cover production through the next two calendar years easily. As for years 3-5, we can only hope that anything being produced in that time frame will be more than fast enough to handle anything that was targeted at a platform that's three-plus years old by then. But, then, we'll be back where we are now, won't we?

I'm not saying that this is going to be the perfect solution, just that in the next few years we're going to be closer to well-optimized game engines and end-user games than we have been for a long time.
 
Reactions: Tlh97 and blckgrffn

blckgrffn

Diamond Member
May 1, 2003
9,126
3,066
136
www.teamjuchems.com
@LightningZ71

I think you are right in that this will be as convergent as it has *ever* been between consoles and computers.

There is also the decision to keep many Xbox releases functional on ~1.8 GHz Jaguar cores for at least two years, which in many ways is a bummer.

Beyond that, however, is that with the Switch being as successful as it has been, I think we can expect that a ~4-core, 1 GHz, older ARM CPU will continue to be the minimum CPU for many cross-platform releases. Which is a bigger bummer. Bring on a 5nm Switch 2... with maybe some 2 GHz ARM cores, probably still four of them :D

This is a massive jump in CPU power in consoles. It's pretty exciting to wonder what AAA, high-end targeted titles are going to do with all that juice in a handful of years.

In ~3 years I think we can expect a refresh that will likely be as PC-like as this one from MS, Sony or both.
 

moinmoin

Diamond Member
Jun 1, 2017
4,950
7,659
136
Beyond that, however, is that with the Switch being as successful as it has been, I think we can expect that a ~4-core, 1 GHz, older ARM CPU will continue to be the minimum CPU for many cross-platform releases.
I don't think the Switch is really considered for most current-gen games, never mind next-gen ones. The Switch very rarely sees same-day releases, and the few ports it gets are late ports with significant adaptations by external developers.
 

soresu

Platinum Member
Dec 19, 2014
2,662
1,862
136
Bring on a 5nm Switch 2
Not going to happen - Nintendo originally went with a two-year-old SoC design, with an even older CPU core, on an old 20nm process for a reason.

Even the more recent revision only uses a 16nm SoC, not even a 10nm/8nm derivative, let alone 7nm.

Likewise, I would expect the next-gen Nintendo console to be something like 8nm, or perhaps a 7nm derivative, once the industry's leading edge has passed to 5nm derivatives or even 3nm.
 

HurleyBird

Platinum Member
Apr 22, 2003
2,684
1,268
136
New Adored video claims Milan is >20% faster in single-threaded INT than Rome, dropping to 10-15% at 64c/128t.

 

blckgrffn

Diamond Member
May 1, 2003
9,126
3,066
136
www.teamjuchems.com
I don't think the Switch is really considered for most current-gen games, never mind next-gen ones. The Switch very rarely sees same-day releases, and the few ports it gets are late ports with significant adaptations by external developers.

Fair enough. I would imagine that for studios, the considerations for making a Switch version of a game viable have become ever more financially imperative given the install base, especially for games that are going to be cross-platform with the Xbox One & PS4 in the near term. Some analyst somewhere is making a case for it... there are more Switches sold than Xbox Ones, I guess? So says VGChartz... Clearly, many AAA titles are likely to never consider it as a platform if they are also using their graphics or multiplayer aspects to sell the game.

As for the use of old manufacturing tech in the next Switch... sigh. Why do you have to be such downers? Surely they *could* surprise us with a more progressive platform. I know the odds are extremely low (does nvidia currently even have a next-gen/current-gen ARM core beyond Xavier?). I realize that's a 2020-release chip, but I can't seem to find much about their next efforts... and hey, 12nm is the lithography! Seems like a great fit for Nintendo in 2022, I suppose :D Global Foundries should have a good amount of availability coming up, right? ;)
 
Last edited:

jpiniero

Lifer
Oct 1, 2010
14,591
5,214
136
(does nvidia currently even have a next-gen/current-gen ARM core beyond Xavier?) I realize that's a 2020-release chip, but I can't seem to find much about their next efforts... and hey, 12nm is the lithography! Seems like a great fit for Nintendo in 2022, I suppose :D Global Foundries should have a good amount of availability coming up, right? ;)

Rumor is they will be using a Samsung ARM design that uses RDNA as the GPU. Node unknown.
 
Reactions: blckgrffn

Saylick

Diamond Member
Sep 10, 2012
3,157
6,369
136
New Adored video claims Milan is >20% faster in single-threaded INT than Rome, dropping to 10-15% at 64c/128t.


For what it's worth, Charlie from Semi-Accurate reports the same figure (>20% ST performance gains from Rome to Milan). There's a ton of insider scoop in this call, far too much information to digest, but you can listen to the recording here:
 
Reactions: lightmanek

blckgrffn

Diamond Member
May 1, 2003
9,126
3,066
136
www.teamjuchems.com
Rumor is they will be using a Samsung ARM design that uses RDNA as the GPU. Node unknown.

That's pretty interesting. Samsung would seem to have a big portfolio to choose from, from CPU to GPU (as noted) to process node availability, even the memory and flash. That seems almost too convenient to be true.

With nvidia being the vendor it was too easy to know what was going to happen as they had so few shipping products :)

Sorry, didn't mean to derail the thread. I am still really excited about these consoles being Zen 3 2 based instead of being "safer" and using Zen+ at really conservative clocks, which seems like how it might have logically progressed from Jaguar. Even that would have been a massive step in the right direction.
 
Last edited:

soresu

Platinum Member
Dec 19, 2014
2,662
1,862
136
It would seem counterintuitive for the MT gains to be smaller than the ST ones for integer?
 

Saylick

Diamond Member
Sep 10, 2012
3,157
6,369
136
It would seem counterintuitive for the MT gains to be smaller than the ST ones for integer?
At higher core counts, the clocks won't be increased all that much over Rome. So if we assume that at 32 cores the 20% ST performance increase is 10-15% IPC and 5-10% clocks, then at 64 cores, where you don't have any clock improvement, your overall ST performance increase is just the IPC gain, i.e. only 10-15%.
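The arithmetic above can be sketched quickly. The split between IPC and clocks is an illustrative guess from this post, not a measured figure, and `st_gain` is a made-up helper name:

```python
# Rough decomposition of the rumored Rome -> Milan single-thread gain into
# IPC and clock components. All percentages are illustrative assumptions.

def st_gain(ipc_gain: float, clock_gain: float) -> float:
    """Combined single-thread speedup: the two gains multiply, not add."""
    return (1 + ipc_gain) * (1 + clock_gain) - 1

# 32-core SKU: assume ~12.5% IPC plus ~7% clock headroom -> ~20% combined
print(f"{st_gain(0.125, 0.07):.1%}")  # 20.4%

# 64-core SKU: same IPC uplift but no clock headroom -> IPC-only, 12.5%
print(f"{st_gain(0.125, 0.00):.1%}")  # 12.5%
```

Multiplying rather than adding the components is why a ~12.5% IPC gain and ~7% clock gain land slightly above 20% combined.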
 
Reactions: soresu

soresu

Platinum Member
Dec 19, 2014
2,662
1,862
136
Rumor is they will be using a Samsung ARM design that uses RDNA as the GPU. Node unknown.
Perhaps making use of their 8nm node - unless Nintendo are cash-proud from the Switch and want to really cut loose for the first time since the GameCube.
 

soresu

Platinum Member
Dec 19, 2014
2,662
1,862
136
I know the odds are extremely low (does nvidia currently even have a next-gen/current-gen ARM core beyond Xavier?)
Cortex-A78 rather than a Carmel derivative in Tegra Orin suggests that nVidia have packed it in on the custom-core front, just like Samsung and Qualcomm before them.

They may get in on the Cortex-X action to do some custom additions to X2 or X3, but otherwise I would be surprised to see anything custom coming from nVidia's Tegra team any more.
 
Reactions: Tlh97 and blckgrffn

Makaveli

Diamond Member
Feb 8, 2002
4,718
1,054
136
Sorry, didn't mean to derail the thread. I am still really excited about these consoles being Zen 3 based instead of being "safer" and using Zen+ at really conservative clocks, which seems like how it might have logically progressed from Jaguar. Even that would have been a massive step in the right direction.

The PS5 and New Xbox are Zen 2 based?
 

HurleyBird

Platinum Member
Apr 22, 2003
2,684
1,268
136
At higher core counts, the clocks won't be increased all that much over Rome. So if we assume that at 32 cores the 20% ST performance increase is 10-15% IPC and 5-10% clocks, then at 64 cores, where you don't have any clock improvement, your overall ST performance increase is just the IPC gain, i.e. only 10-15%.

Which makes sense if Zen 2 and Zen 3 have the same, or at least very similar, power curves. Power might be the constraint, or it might not. Assuming these figures are true, it's likely a combination of things.

A major factor could be SMT yield. If you get, say, a 20% IPC improvement in 1c/1t, then that doesn't necessarily extend to 1c/2t if greater resource utilisation is leaving less room for SMT to fill in the gaps.

That doesn't explain the gap from 32c/64t to 64c/128t, but it might be a factor when going from 1c/1t to 32c/64t.
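That SMT-yield argument can be put into a toy model. The fixed-throughput-ceiling assumption, the utilization numbers, and the `smt_yield` helper are all mine for illustration, not anything AMD has published:

```python
# Toy model of "SMT yield": treat the core backend as a fixed throughput
# ceiling. The more of it a single thread already uses, the less slack is
# left for a second SMT thread to fill in.

def smt_yield(util_1t: float, ceiling: float = 1.0) -> float:
    """Relative extra throughput a second SMT thread can add on top of one."""
    return max(0.0, ceiling - util_1t) / util_1t

# Illustrative numbers only: if one thread fills 60% of a Zen 2-like
# backend, SMT can add up to ~67% more throughput; if a higher-IPC
# Zen 3-like core fills 72%, the SMT ceiling drops to ~39%.
print(round(smt_yield(0.60), 2))  # 0.67
print(round(smt_yield(0.72), 2))  # 0.39
```

So an IPC gain that shows up fully at 1c/1t can partly "eat" the headroom SMT used to exploit, shrinking the apparent uplift at high thread counts.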

Another explanation for the smaller performance improvement at 64 cores than at 32 could be cache-related, depending on which benchmarks the performance data are derived from. Given a hypothetical benchmark where little to no data is shared between threads and each thread wants >= 2MB of L3, then @64t Milan has 2MB L3/t compared with 1MB L3/t for Rome, while @128t both have 1MB L3/t.

Milan doubles effective L3 over Rome in most cases, but in a scenario where each thread's data is practically distinct and there's little core-to-core communication, Rome's cache structure is moderately superior thanks to lower latency.
 
Last edited:

itsmydamnation

Platinum Member
Feb 6, 2011
2,769
3,144
136
Assuming that this rumor is true, SMT "yield" decreasing because 1T can make better use of existing resources makes the most sense. Rome isn't power limited when running 128T of int code, and I wouldn't expect Milan to be either.
I don't think the amount of L3 cache per core matters much, unless AMD changes the L3 to be more than just an eviction cache. If they start stream prefetching/predicting into it, then L3 per core might matter more.
 

amd6502

Senior member
Apr 21, 2017
971
360
136
For what it's worth, Charlie from Semi-Accurate reports the same figure (>20% ST performance gains from Rome to Milan). There's a ton of insider scoop in this call, far too much information to digest, but you can listen to the recording here:

If they really have achieved a 15% IPC improvement over Zen 2 then I think the odds are they have slightly widened the core.

If the source (dubious track record, to say the least) is accurate this time, then L3 latency jumped fairly significantly, from ~40 to ~47 cycles (almost 20%). The L3 accessible to a core, however, has now doubled. Such a tradeoff would have an advantage pretty much only for single-threaded loads.

Assuming that this rumor is true, SMT "yield" decreasing because 1T can make better use of existing resources makes the most sense

That's a good point. Maybe there was this focus on changing resource allocation to boost single thread performance.
 

HurleyBird

Platinum Member
Apr 22, 2003
2,684
1,268
136
Such a tradeoff would have an advantage pretty much only for single-threaded loads.

That's not how it works. A single-threaded load that fits inside a 16MB L3 would still benefit from Rome's cache structure, while a 16-thread load that wants more than 16MB of L3, with mostly shared data, will work substantially better with Milan's cache structure.

By itself, how threaded a load is has nothing to do with anything. What matters is the size of the problem and how much data is shared between threads. The best case for Rome is something like running one VM per 4-core CCX, but the vast majority of workloads, from single-threaded to embarrassingly parallel, are going to benefit more from Milan's cache structure.
 
Reactions: dr1337

amd6502

Senior member
Apr 21, 2017
971
360
136
That's not how it works.

I'm considering the simpler case here of a single 8c/16t-CCD AM4 chip. Consider a single thread running on it.

In the Zen 2 case we have 2x16MB of L3; in the Zen 3 case we have a unified 1x32MB L3.

Assume the thread does not jump between CCXs (where applicable).

Now, in the Zen 2 case the thread is limited to filling at most one of the 16MB L3 slices.

In the Zen 3 case the thread can fill the whole 32MB L3 cache.

This means a potentially significantly greater hit rate (though at the supposed cost of an almost 20% latency hit).
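That hit-rate-vs-latency tradeoff can be sketched with a toy average-latency model. The ~40 vs ~47 cycle L3 figures are the rumored ones from upthread; the ~250-cycle DRAM penalty, the uniform-access assumption, and the `avg_access_cycles` helper are illustrative guesses of mine:

```python
# Toy model: one thread reaching 16 MB of L3 at ~40 cycles (Zen 2-like)
# vs 32 MB at ~47 cycles (Zen 3-like), with misses going to DRAM.

def avg_access_cycles(working_set_mb, l3_mb, l3_cycles, dram_cycles=250):
    """Average cycles per access for a uniformly touched working set."""
    hit_rate = min(1.0, l3_mb / working_set_mb)
    return hit_rate * l3_cycles + (1 - hit_rate) * dram_cycles

for ws in (8, 16, 24, 32):
    zen2 = avg_access_cycles(ws, 16, 40)
    zen3 = avg_access_cycles(ws, 32, 47)
    print(f"{ws:>2} MB working set: Zen 2 ~{zen2:.0f} cy, Zen 3 ~{zen3:.0f} cy")
```

Below 16 MB the lower-latency slice wins; once the working set spills past 16 MB, the doubled reach dominates, which is the hit-rate advantage described above.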