Speculation: Ryzen 4000 series/Zen 3

soresu

Golden Member
Dec 19, 2014
1,324
520
136
Apart from worst-case scenarios like Far Cry, I can't tell the difference between my R5 3600 and 8700K @ 5GHz either. Then again, I choose to game with IQ settings maxed or close to it, so the GPU (5700XT) is the main bottleneck in almost all cases.

I'm not gonna put my head in the sand though and pretend that if I upgraded to Big Navi or a GeForce 3000 series card with +50% performance over current GPUs, it would be the same scenario.

AMD has done a fine job to this point in improving gaming on each Zen iteration, but if Ryzen 4000 actually overtakes my 8700K for gaming I would upgrade in a heartbeat.

Truthfully, I'm not holding my breath on that. If GN's latest CPU-bound gaming tests of the 3600XT vs the 10600K are anything to go by, AMD is likely more than another iteration away from dethroning Intel at gaming.

The problem is that by the time AMD actually has Skylake-beating gaming performance, you'd think Intel would be done milking their dead 14nm cow...

Yes, I'm aware that in 'blind test' scenarios like you described the difference might be negligible in most cases, at least with current gen GPUs. But that's a rather simplistic stance to take IMO, as the balance between being CPU bound and GPU bound can easily shift depending on the settings you use, and as faster GPUs come out in the coming years we'll ideally have 'faster-than-Skylake' CPUs as well to keep up the pace.
IMHO the very fact that any games are still CPU bound at all is a testament to just how badly optimised PC games are in that regard.

You would hope that a mostly standardised feature set with only a handful of well optimised common game engines (Unreal/Unity/CryEngine) would have erased this problem by now.

It seems like those engines need some kind of LTS version which freezes feature set and concentrates on nothing but optimisation - a strategy I believe MS could also benefit from following with Windows 10 for that matter.
 

DisEnchantment

Senior member
Mar 3, 2017
578
1,077
106
SEV is in use at Google. Previously SUSE was providing this support downstream.
SEV is almost done being upstreamed. SEV-SNP is being worked on downstream.

SME (Zen) --> SEV (Zen) --> SEV-ES (Zen 2) --> SEV-SNP (Zen 3) --> ? (Zen 4)

The MPI kfd bits are also being upstreamed.
amdgpu support for TMZ is also in. Looks like the whole circle is getting bigger and covering all the necessary bits.
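For anyone who wants to poke at their own box: the memory encryption features are advertised through CPUID leaf 0x8000001F, so a quick check from userspace is easy. A minimal sketch in C - the bit positions below are from my reading of the public docs, so verify against AMD's PPR before relying on them:

#include <stdio.h>
#include <cpuid.h>

/* Query AMD's memory encryption leaf (0x8000001F). Bit positions are
 * assumed from the public CPUID documentation; double-check before use. */
int main(void)
{
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    if (!__get_cpuid(0x8000001f, &eax, &ebx, &ecx, &edx)) {
        printf("CPUID leaf 0x8000001F not available on this CPU\n");
        return 1;
    }

    printf("SME:     %s\n", (eax & (1u << 0)) ? "yes" : "no");
    printf("SEV:     %s\n", (eax & (1u << 1)) ? "yes" : "no");
    printf("SEV-ES:  %s\n", (eax & (1u << 3)) ? "yes" : "no");
    printf("SEV-SNP: %s\n", (eax & (1u << 4)) ? "yes" : "no");
    return 0;
}

Of course that only tells you what the silicon advertises; whether the kernel, firmware and hypervisor actually have SME/SEV enabled is a separate question.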
 

LightningZ71

Senior member
Mar 10, 2017
417
314
106
While game engines are certainly amazing pieces of software development, they are not hyper optimized for any particular chip architecture. Who has the time or incentive to do so? There are new hardware configurations coming out from vendors every year that have different behaviors when it comes to performance profiling. You'd have to have, essentially, a modified code path in each game engine for each major revision of CPU architecture. Given the way that it seems that most games development houses work, they are literally just blasting through game development to push the next product out the door as quickly as possible in a state that they consider "good enough", optimization be damned as the customer can just buy better hardware if they want better performance.

The only thing that I see as having changed is the convergence of the two largest consoles (by volume) onto x86, and with the most recent generation using something that's going to be quite analogous to what you can get in desktop hardware (I mean, who was ever able to purchase an 8-core Jaguar APU for their home computer?). Now, you're going to get to have a CPU that matches the new XBOX configuration very closely, and purchase a video card that also likely matches its capabilities, and where you won't have as much SSD throughput in the current generation, you'll be able to make up for it with RAID configurations or having more RAM, etc. Because of that, I see the possibility that these game engines will gain optimizations for the consoles that are also applicable in large part to the desktop counterparts that you can build, as it is likely in the studios' best interests to squeeze every last drop of performance from the consoles that they can.
 

Thibsie

Member
Apr 25, 2017
50
52
61
While game engines are certainly amazing pieces of software development, they are not hyper optimized for any particular chip architecture. Who has the time or incentive to do so? There are new hardware configurations coming out from vendors every year that have different behaviors when it comes to performance profiling. You'd have to have, essentially, a modified code path in each game engine for each major revision of CPU architecture. Given the way that it seems that most games development houses work, they are literally just blasting through game development to push the next product out the door as quickly as possible in a state that they consider "good enough", optimization be damned as the customer can just buy better hardware if they want better performance.
I don't think that's what was meant. The *current* problem with many games is that many of them are straight ports from consoles.

The only thing that I see as having changed is the convergence of the two largest consoles (by volume) onto x86, and with the most recent generation using something that's going to be quite analogous to what you can get in desktop hardware (I mean, who was ever able to purchase an 8-core Jaguar APU for their home computer?). Now, you're going to get to have a CPU that matches the new XBOX configuration very closely, and purchase a video card that also likely matches its capabilities, and where you won't have as much SSD throughput in the current generation, you'll be able to make up for it with RAID configurations or having more RAM, etc. Because of that, I see the possibility that these game engines will gain optimizations for the consoles that are also applicable in large part to the desktop counterparts that you can build, as it is likely in the studios' best interests to squeeze every last drop of performance from the consoles that they can.
It will provide a minimum optimization and probably a max one too: one size fits all. If it runs OKish, no effort will be made to optimize (either performance or quality) for the PC version.
If you think the hardware will be a lot alike, think about the situation in 5 years or more.
 

moinmoin

Golden Member
Jun 1, 2017
1,657
1,594
106
SEV is in use at Google. Previously SUSE was providing this support downstream.
SEV is almost done being upstreamed. SEV-SNP is being worked on downstream.

SME (Zen) --> SEV (Zen) --> SEV-ES (Zen 2) --> SEV-SNP (Zen 3) --> ? (Zen 4)

The MPI kfd bits are also being upstreamed.
amdgpu support for TMZ is also in. Looks like the whole circle is getting bigger and covering all the necessary bits.
I've mentioned it before: the development AMD is doing in this area is so overdue for the whole industry it isn't funny. Though I find it ridiculous that Google turns it into a product called "Confidential VM"; this should be standard, if not the bare minimum, in any serious cloud! I dare Google to call all their other VM-based products non-confidential/leaky.
 

DisEnchantment

Senior member
Mar 3, 2017
578
1,077
106
I've mentioned it before: the development AMD is doing in this area is so overdue for the whole industry it isn't funny. Though I find it ridiculous that Google turns it into a product called "Confidential VM"; this should be standard, if not the bare minimum, in any serious cloud! I dare Google to call all their other VM-based products non-confidential/leaky.
Couldn't agree more. It should be the basic requirement when not using bare metal. Granted, bare metal can't protect you from the cloud provider either, but it does at least protect you from other co-hosted VMs.
Outside of compliance with local data retention/PII laws plus specific use cases, some on-prem infra could move to the public cloud.

So for those small enterprises this can really offer something. Your public-facing service gateway no longer needs to reach out to some remote, privately hosted database service every time there is a request from the client.
You won't need a second "trusted" partner to host the sensitive data that the public-facing service accesses.

Also, one thing that is really interesting is TMZ. AMD has supported GPU VMs for a while now, and GPU VM encryption using TMZ is now possible. So if your instance uses GPU acceleration, that can be encrypted as well.

In this area, however, as I have alluded to before, some Intel devs/maintainers have not really reached agreement on introducing the GPU cgroup controller proposed by AMD; it has been rejected multiple times.
As someone who has been waiting for this GPU cgroup feature for some time, it is really frustrating, because you could use a GPU cgroup for many things, like Docker containers, or sharing compute between processes to guarantee some QoS.
 

LightningZ71

Senior member
Mar 10, 2017
417
314
106
I don't think that's what was meant. The *current* problem with many games is that many of them are straight ports from consoles.



It will provide a minimum optimization and probably a max one too: one size fits all. If it runs OKish, no effort will be made to optimize (either performance or quality) for the PC version.
If you think the hardware will be a lot alike, think about the situation in 5 years or more.
I agree, many current games are minimum effort ports from existing consoles, which, while running x86 code, are using cores that are barely present in the market (even when new) and in an arrangement that was never publicly available. They almost had to go lowest common denominator.

At least, with the next-gen consoles, they're using an arrangement that IS available to end users (Zen 2 in an 8-core configuration). If they at the very least target that, then even the lowest-effort port to PC will still feature a greater degree of intrinsic optimization for at least currently shipping machines. Going forward, I don't think that a system set up for Zen 2 will be hampered by Zen 3 cores, which will cover production through the next two calendar years easily. As for years 3-5, we can only hope that anything being produced in that time frame will be more than fast enough to handle anything that was targeted at a platform that's three+ years old at that time. But, then, we'll be back where we are now, won't we?

I'm not saying that this is going to be the perfect solution, just that over the next few years we're going to be closer than we have been for a long time to well-optimized game engines and end-user games.
 
  • Like
Reactions: Tlh97 and blckgrffn

blckgrffn

Diamond Member
May 1, 2003
6,859
158
106
www.teamjuchems.com
@LightningZ71

I think you are right in that this will be as convergent as it has *ever* been between consoles and computers.

There is also the decision to keep many Xbox releases functional on ~1.8 GHz Jaguar cores for at least two years, which in many ways is a bummer.

Beyond that, however, with the Switch being as successful as it has been, I think we can expect an older ~4-core, 1 GHz ARM CPU to remain the minimum CPU for many cross-platform releases. Which is a bigger bummer. Bring on a 5nm Switch 2... with maybe some 2 GHz ARM cores, probably still four of them :D

This is a massive jump in CPU power for consoles. It's pretty exciting to wonder what AAA, high-end targeted titles are going to do with all that juice in a handful of years.

In ~3 years I think we can expect a refresh that will likely be as PC-like as this one from MS, Sony or both.
 

moinmoin

Golden Member
Jun 1, 2017
1,657
1,594
106
Beyond that, however, with the Switch being as successful as it has been, I think we can expect an older ~4-core, 1 GHz ARM CPU to remain the minimum CPU for many cross-platform releases.
I don't think Switch is really considered for most current gen games, never mind next gen ones. Switch very rarely sees same-day releases, and the few ports it gets are late ports with significant adaptations by external developers.
 

soresu

Golden Member
Dec 19, 2014
1,324
520
136
Bring on a 5nm Switch 2
Not going to happen - Nintendo originally went with a two-year-old SoC design with an even older CPU core on an old 20nm process for a reason.

Even the more recent version only uses a 16nm SoC, not even a 10nm/8nm derivative, let alone 7nm.

Likewise, I would expect the next-gen Nintendo console to use something like 8nm, or perhaps a 7nm derivative, once the industry's leading edge has moved on to 5nm derivatives or even 3nm.
 

HurleyBird

Platinum Member
Apr 22, 2003
2,097
481
136
New Adored video claims Milan is >20% faster in single threaded INT than Rome, dropping to 10-15% on 64c/128t.

 

blckgrffn

Diamond Member
May 1, 2003
6,859
158
106
www.teamjuchems.com
I don't think Switch is really considered for most current gen games, never mind next gen ones. Switch very rarely sees same-day releases, and the few ports it gets are late ports with significant adaptations by external developers.
Fair enough. I would imagine that for studios, considerations for making a Switch version of the game viable have become ever more financially imperative given the install base, especially for games that are going to be cross-platform with Xbox One & PS4 in the near term. Some analyst somewhere is making a case for it... there are more Switches sold than Xbox Ones, I guess? So says VGChartz... Clearly, many AAA titles are likely to never consider it as a platform if they are also using their graphics or multiplayer aspects to sell the game.

As for the use of old manufacturing tech in the next Switch... sigh. Why do you have to be such downers? Surely they *could* surprise us with a more progressive platform. I know the odds are extremely low (does nvidia currently even have a next-gen/current-gen ARM core beyond Xavier?). I realize that's a 2020 release chip but I don't seem to be able to find much about their next efforts... and hey, 12nm is the lithography! Seems like a great fit for Nintendo in 2022, I suppose :D GlobalFoundries should have a good amount of availability coming up, right? ;)
 
Last edited:

jpiniero

Diamond Member
Oct 1, 2010
7,924
1,206
126
(does nvidia currently even have a next-gen/current-gen ARM core beyond Xavier?) I realize that's a 2020 release chip but I don't seem to be able to find much about their next efforts... and hey, 12nm is the lithography! Seems like a great fit for Nintendo in 2022, I suppose :D GlobalFoundries should have a good amount of availability coming up, right? ;)
Rumor is they will be using a Samsung ARM design that uses RDNA as the GPU. Node unknown.
 
  • Like
Reactions: blckgrffn

Saylick

Senior member
Sep 10, 2012
698
209
116
New Adored video claims Milan is >20% faster in single threaded INT than Rome, dropping to 10-15% on 64c/128t.

For what it's worth, Charlie from Semi-Accurate reports the same figure (>20% ST performance gains from Rome to Milan). There's a ton of insider scoop in this call, far too much information to digest, but you can listen to the recording here:
 
  • Like
Reactions: lightmanek

blckgrffn

Diamond Member
May 1, 2003
6,859
158
106
www.teamjuchems.com
Rumor is they will be using a Samsung ARM design that uses RDNA as the GPU. Node unknown.
That's pretty interesting. Samsung would seem to have a big portfolio to choose from, from CPU to GPU (as noted) to process node availability, even the memory and flash. That seems almost too convenient to be true.

With nvidia being the vendor it was too easy to know what was going to happen as they had so few shipping products :)

Sorry, didn't mean to derail the thread. I am still really excited about these consoles being Zen 2 based instead of being "safer" and using Zen+ at really conservative clocks, which seems like how it might have logically progressed from Jaguar. Even that would have been a massive step in the right direction.
 
Last edited:

Saylick

Senior member
Sep 10, 2012
698
209
116
It would seem to be counterintuitive for MT gains to be less than ST for integer?
At higher core counts, the clocks won't increase all that much over Rome. So if we assume that at 32 cores the 20% ST performance increase is 10-15% IPC and 5-10% clocks, then at 64 cores, where you don't have any clock improvement, your overall ST performance increase is just the IPC gain, or only 10-15%.
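The numbers roughly multiply out. A toy calculation - the IPC/clock split here is my own guess, purely to illustrate the point:

#include <stdio.h>

int main(void)
{
    /* Hypothetical split of the rumoured ~20% ST gain; not confirmed figures. */
    double ipc   = 0.125;               /* assumed IPC uplift                  */
    double clk32 = 0.07, clk64 = 0.00;  /* assumed clock uplift at 32c / 64c   */

    printf("32c per-thread gain: %.1f%%\n", ((1 + ipc) * (1 + clk32) - 1) * 100); /* ~20%   */
    printf("64c per-thread gain: %.1f%%\n", ((1 + ipc) * (1 + clk64) - 1) * 100); /* ~12.5% */
    return 0;
}

So a ~12.5% IPC uplift with no extra clock headroom at 64 cores lands right in that 10-15% band.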
 
  • Like
Reactions: soresu

soresu

Golden Member
Dec 19, 2014
1,324
520
136
Rumor is they will be using a Samsung ARM design that uses RDNA as the GPU. Node unknown.
Perhaps making use of their 8nm node - unless the Nin are cash proud from Switch and want to really cut loose for the first time since Gamecube.
 

soresu

Golden Member
Dec 19, 2014
1,324
520
136
I know the odds are extremely low (does nvidia currently even have a next-gen/current-gen ARM core beyond Xavier?).
Cortex-A78 rather than a Carmel derivative in Tegra Orin suggests that nVidia have packed it in on the custom core front, just like Samsung and Qualcomm before them.

They may get in on the Cortex-X action to do some custom additions to X2 or X3, but otherwise I would be surprised to see anything custom coming from nVidia's Tegra team any more.
 
  • Like
Reactions: Tlh97 and blckgrffn

Makaveli

Diamond Member
Feb 8, 2002
4,060
128
106
Sorry, didn't mean to derail the thread. I am still really excited about these consoles being Zen 3 based instead of being "safer" and using Zen+ at really conservative clocks, which seems like how it might have logically progressed from Jaguar. Even that would have been a massive step in the right direction.
The PS5 and New Xbox are Zen 2 based?
 

HurleyBird

Platinum Member
Apr 22, 2003
2,097
481
136
At higher core counts, the clocks won't increase all that much over Rome. So if we assume that at 32 cores the 20% ST performance increase is 10-15% IPC and 5-10% clocks, then at 64 cores, where you don't have any clock improvement, your overall ST performance increase is just the IPC gain, or only 10-15%.
Which makes sense if Zen 2 and 3 have the same, or at least very similar, power curves. Power might be the constraint, or it might not. Assuming these figures are true, it's likely a number of things.

A major factor could be SMT yield. If you get, say, a 20% IPC improvement in 1c/1t, then that doesn't necessarily extend to 1c/2t if greater resource utilisation is leaving less room for SMT to fill in the gaps.

That doesn't explain the gap from 32c/64t to 64c/128t, but it might be a factor when going from 1c/1t to 32c/64t.

Another explanation for the regression of performance improvement from 32 to 64 cores could be cache related, depending on what benchmarks the performance data are derived from. Given a hypothetical benchmark where little to no data is shared between threads and each thread wants >= 2MB L3, then @64t Milan has 2MB L3/t compared with 1MB L3/t for Rome, while @128t both have 1MB L3/t.

Milan doubles effective L3 over Rome in most cases, but in a scenario where each thread's data is practically distinct and there's little core-to-core communication, Rome's cache structure is moderately superior thanks to lower latency.
 
Last edited:

itsmydamnation

Platinum Member
Feb 6, 2011
2,019
1,310
136
Assuming that this rumor is true, SMT "yield" decreasing because 1T can make better use of existing resources makes the most sense. Rome isn't power limited when running 128T of int code, and I wouldn't expect Milan to be either.
I don't think the amount of L3 cache per core matters much, unless AMD changes the L3 to be more than just a victim cache. If they start stream prefetching/predicting into it, then L3 per core might matter more.
 

amd6502

Senior member
Apr 21, 2017
796
255
106
For what it's worth, Charlie from Semi-Accurate reports the same figure (>20% ST performance gains from Rome to Milan). There's a ton of insider scoop in this call, far too much information to digest, but you can listen to the recording here:
If they really have achieved a 15% IPC improvement over Zen 2, then I think the odds are they have slightly widened the core.

If the source (dubious track record, to say the least) is accurate this time, then L3 latency jumped fairly significantly, from ~40 to 47 cycles (almost 20%). The L3 accessible to a core, however, has now doubled. Such a tradeoff would have an advantage pretty much only for single-threaded loads.

Assuming that this rumor is true, SMT "yield" decreasing because 1T can make better use of existing resources makes the most sense.
That's a good point. Maybe there was a focus on changing resource allocation to boost single-thread performance.
 

HurleyBird

Platinum Member
Apr 22, 2003
2,097
481
136
Such a tradeoff would have an advantage pretty much only for single-threaded loads.
That's not how it works. A single-threaded load that fits inside a 16MB L3 would still benefit from Rome's cache structure, while a 16-thread load that wants more than 16MB of L3, with mostly shared data, will work substantially better on Milan's cache structure.

By itself, how threaded a load is has nothing to do with anything. It's the size of the problem and how much data is shared between threads. The best case for Rome is something like running one VM per 4-core CCX, but the vast majority of workloads, from single-threaded to embarrassingly parallel, are going to benefit more from Milan's cache structure.
 
  • Like
Reactions: dr1337
