Speculation: Ryzen 4000 series/Zen 3


DisEnchantment

Golden Member
Mar 3, 2017
1,603
5,789
136
SEV in use at Google. Previously SUSE was providing this service downstream.
SEV is almost done being upstreamed. SEV-SNP is being worked on downstream.

SME (Zen) --> SEV (Zen 2) --> SEV-SNP (Zen 3) --> ? (Zen 4)

MPI kfd bits also being upstreamed.
amdgpu support for TMZ is also in. Looks like the whole circle is getting bigger and covering all the necessary bits.
 

LightningZ71

Golden Member
Mar 10, 2017
1,627
1,898
136
While game engines are certainly amazing pieces of software development, they are not hyper optimized for any particular chip architecture. Who has the time or incentive to do so? There are new hardware configurations coming out from vendors every year that have different behaviors when it comes to performance profiling. You'd have to have, essentially, a modified code path in each game engine for each major revision of CPU architecture. Given the way that it seems that most games development houses work, they are literally just blasting through game development to push the next product out the door as quickly as possible in a state that they consider "good enough", optimization be damned as the customer can just buy better hardware if they want better performance.

The only thing that I see as having changed is the convergence of the two largest consoles (by volume) onto x86, and with the most recent generation using something that's going to be quite analogous to what you can get in desktop hardware (I mean, who was ever able to purchase an 8-core Jaguar APU for their home computer?). Now, you're going to get to have a CPU that matches the new Xbox configuration very closely, and purchase a video card that also likely matches its capabilities, and where you won't have as much SSD throughput in the current generation, you'll be able to make up for it with RAID configurations or having more RAM, etc. Because of that, I see the possibility that these game engines will gain optimizations for the consoles that are also applicable in large part to the desktop counterparts that you can build, as it is likely in the studios' best interests to squeeze every last drop of performance from the consoles that they can.
 

Thibsie

Senior member
Apr 25, 2017
749
801
136
While game engines are certainly amazing pieces of software development, they are not hyper optimized for any particular chip architecture. Who has the time or incentive to do so? There are new hardware configurations coming out from vendors every year that have different behaviors when it comes to performance profiling. You'd have to have, essentially, a modified code path in each game engine for each major revision of CPU architecture. Given the way that it seems that most games development houses work, they are literally just blasting through game development to push the next product out the door as quickly as possible in a state that they consider "good enough", optimization be damned as the customer can just buy better hardware if they want better performance.

I don't think that's what was meant. The *current* problem of many games is that many of them are straight ports from consoles.

The only thing that I see as having changed is the convergence of the two largest consoles (by volume) onto x86, and with the most recent generation using something that's going to be quite analogous to what you can get in desktop hardware (I mean, who was ever able to purchase an 8-core Jaguar APU for their home computer?). Now, you're going to get to have a CPU that matches the new Xbox configuration very closely, and purchase a video card that also likely matches its capabilities, and where you won't have as much SSD throughput in the current generation, you'll be able to make up for it with RAID configurations or having more RAM, etc. Because of that, I see the possibility that these game engines will gain optimizations for the consoles that are also applicable in large part to the desktop counterparts that you can build, as it is likely in the studios' best interests to squeeze every last drop of performance from the consoles that they can.

It will provide a minimum optimization and probably a max one too: one size fits all. If it runs OKish, no effort will be made to optimize (either performance or quality) for the PC version.
If you think the hardware will be a lot alike, think about the situation in 5 years or more.
 

moinmoin

Diamond Member
Jun 1, 2017
4,950
7,659
136
SEV in use at Google. Previously SUSE was providing this service downstream.
SEV is almost done being upstreamed. SEV-SNP is being worked on downstream.

SME (Zen) --> SEV (Zen 2) --> SEV-SNP (Zen 3) --> ? (Zen 4)

MPI kfd bits also being upstreamed.
amdgpu support for TMZ is also in. Looks like the whole circle is getting bigger and covering all the necessary bits.
I've mentioned it before, the development AMD is doing in this area is so overdue for the whole industry it isn't funny. Though I find it ridiculous that Google turns it into a product called "Confidential VM"; this should be standard, if not the bare minimum, in any serious cloud! I dare Google to call all their other VM-based products inconfidential/leaky.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,603
5,789
136
I've mentioned it before, the development AMD is doing in this area is so overdue for the whole industry it isn't funny. Though I find it ridiculous that Google turns it into a product called "Confidential VM"; this should be standard, if not the bare minimum, in any serious cloud! I dare Google to call all their other VM-based products inconfidential/leaky.
Can't agree more. It should be the basic requirement when not using bare metal. Granted, bare metal cannot protect you from the cloud provider, but it at least protects you from other co-hosted VMs.
Outside of compliance with local data retention/PII laws plus specific use cases, some on-prem infrastructure could move to the public cloud.

So for those small enterprises this can really offer something. Your public-facing service gateway need not reach out to some remote, privately hosted database service every time there is a request from the client.
You won't need a second "trusted" partner to host the sensitive data that is accessed by the public-facing service.

Also one thing that is really interesting is TMZ. GPU VMs have been supported for a while now by AMD. Now GPU VM encryption using TMZ is possible. So again if your instance uses GPU acceleration this can also be encrypted.
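For a concrete idea of what an encrypted guest looks like in practice, this is the general shape of an SEV launch under KVM/QEMU. A sketch only: the `cbitpos`/`reduced-phys-bits` values vary by EPYC generation, and `guest.img` is a placeholder path.

```shell
# Check that the host kernel has SEV support enabled in kvm_amd
cat /sys/module/kvm_amd/parameters/sev    # expect "Y" or "1"

# Launch a guest whose memory is encrypted with a per-VM key.
# cbitpos/reduced-phys-bits below are typical for early EPYC parts;
# verify against your CPU before relying on them.
qemu-system-x86_64 \
  -enable-kvm \
  -cpu EPYC -m 4G \
  -machine q35,memory-encryption=sev0 \
  -object sev-guest,id=sev0,cbitpos=47,reduced-phys-bits=1 \
  -drive file=guest.img,format=qcow2
```

The hypervisor can still schedule and migrate the guest, but its RAM is ciphertext from the host's point of view, which is the property the "Confidential VM" product is selling.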

In this area, however, like I have alluded to before, some Intel devs/maintainers have not really reached an agreement on introducing the GPU cgroup controller that was proposed by AMD. It was rejected multiple times.
As someone who has been waiting for this GPU cgroup feature for some time (you can use a GPU cgroup for many things, like Docker containers, or sharing compute between processes to guarantee some QoS), it is really frustrating.
 

LightningZ71

Golden Member
Mar 10, 2017
1,627
1,898
136
I don't think that's what was meant. The *current* problem of many games is that many of them are straight ports from consoles.



It will provide a minimum optimization and probably a max one too: one size fits all. If it runs OKish, no effort will be made to optimize (either performance or quality) for the PC version.
If you think the hardware will be a lot alike, think about the situation in 5 years or more.

I agree, many current games are minimum-effort ports from existing consoles which, while running x86 code, use cores that were barely present in the market (even when new) and in an arrangement that was never publicly available. They almost had to go lowest common denominator.

At least with the next-gen consoles, they're using an arrangement that IS available to end users (Zen 2 in an 8-core configuration). If they at the very least target that, then even the lowest-effort port to PC will still feature a greater degree of intrinsic optimization for at least currently shipping machines. Going forward, I don't think that a system set up for Zen 2 will be hampered by Zen 3 cores, which will cover production through the next two calendar years easily. As for years 3-5, we can only hope that anything being produced in that time frame will be more than fast enough to handle anything that was targeted at a platform that's three-plus years old by then. But, then, we'll be back where we are now, won't we?

I'm not saying that this is going to be the perfect solution, just that in the next few years we're going to be closer to well-optimized game engines and end-user games than we have been for a long time.
 
Reactions: Tlh97 and blckgrffn

blckgrffn

Diamond Member
May 1, 2003
9,126
3,066
136
www.teamjuchems.com
@LightningZ71

I think you are right in that this will be as convergent as it has *ever* been between consoles and computers.

There is also the decision to keep many Xbox releases functional on ~1.8 GHz Jaguar cores for at least two years, which in many ways is a bummer.

Beyond that, however, is that with the Switch being as successful as it has been, I think we can expect that a ~4-core, 1 GHz, older ARM CPU will continue to be the minimum CPU for many cross-platform releases. Which is a bigger bummer. Bring on a 5nm Switch 2... with maybe some 2 GHz ARM cores, probably still four of them :D

This is a massive jump in CPU power in consoles. It's pretty exciting to wonder what AAA, high-end targeted titles are going to do with all that juice in a handful of years.

In ~3 years I think we can expect a refresh that will likely be as PC-like as this one from MS, Sony or both.
 

moinmoin

Diamond Member
Jun 1, 2017
4,950
7,659
136
Beyond that, however, is that with the Switch being as successful as it has been, I think we can expect that a ~4-core, 1 GHz, older ARM CPU will continue to be the minimum CPU for many cross-platform releases.
I don't think the Switch is really considered for most current-gen games, never mind next-gen ones. The Switch very rarely sees same-day releases, and the few ports it gets are late ports with significant adaptations by external developers.
 

soresu

Platinum Member
Dec 19, 2014
2,662
1,862
136
Bring on a 5nm Switch 2
Not going to happen - Nintendo originally went with a two-year-old SoC design, with an even older CPU core, on an old 20nm process for a reason.

Even the more recent revision only uses a 16nm SoC, not even a 10nm/8nm derivative, let alone 7nm.

Likewise, I would expect the next-gen Nintendo console to be something like 8nm, or perhaps a 7nm derivative, once the industry's leading edge has passed to 5nm derivatives or even 3nm.
 

HurleyBird

Platinum Member
Apr 22, 2003
2,684
1,268
136
New Adored video claims Milan is >20% faster in single-threaded INT than Rome, dropping to 10-15% at 64c/128t.

 

blckgrffn

Diamond Member
May 1, 2003
9,126
3,066
136
www.teamjuchems.com
I don't think the Switch is really considered for most current-gen games, never mind next-gen ones. The Switch very rarely sees same-day releases, and the few ports it gets are late ports with significant adaptations by external developers.

Fair enough. I would imagine that for studios, the considerations for making a Switch version of a game viable have become ever more financially imperative given the install base, especially for games that are going to be cross-platform with the Xbox One & PS4 in the near term. Some analyst somewhere is making a case for it... there are more Switches sold than Xbox Ones, I guess? So says VGChartz... Clearly, many AAA titles are likely to never consider it as a platform if they are also using their graphics or multiplayer aspects to sell the game.

As for the use of old manufacturing tech in the next Switch... sigh. Why do you have to be such downers? Surely they *could* surprise us with a more progressive platform. I know the odds are extremely low (does nvidia currently even have a next-gen/current-gen ARM core beyond Xavier?). I realize that's a 2020-release chip, but I can't seem to find much about their next efforts... and hey, 12nm is the lithography! Seems like a great fit for Nintendo in 2022, I suppose :D Global Foundries should have a good amount of availability coming up, right? ;)
 
Last edited:

jpiniero

Lifer
Oct 1, 2010
14,591
5,214
136
(does nvidia currently even have a next-gen/current-gen ARM core beyond Xavier?) I realize that's a 2020-release chip, but I can't seem to find much about their next efforts... and hey, 12nm is the lithography! Seems like a great fit for Nintendo in 2022, I suppose :D Global Foundries should have a good amount of availability coming up, right? ;)

Rumor is they will be using a Samsung ARM design that uses RDNA as the GPU. Node unknown.
 
Reactions: blckgrffn

Saylick

Diamond Member
Sep 10, 2012
3,157
6,369
136
New Adored video claims Milan is >20% faster in single-threaded INT than Rome, dropping to 10-15% at 64c/128t.


For what it's worth, Charlie from Semi-Accurate reports the same figure (>20% ST performance gains from Rome to Milan). There's a ton of insider scoop in this call, far too much information to digest, but you can listen to the recording here:
 
Reactions: lightmanek

blckgrffn

Diamond Member
May 1, 2003
9,126
3,066
136
www.teamjuchems.com
Rumor is they will be using a Samsung ARM design that uses RDNA as the GPU. Node unknown.

That's pretty interesting. Samsung would seem to have a big portfolio to choose from, from CPU to GPU (as noted) to process node availability, even the memory and flash. That seems almost too convenient to be true.

With nvidia being the vendor it was too easy to know what was going to happen as they had so few shipping products :)

Sorry, didn't mean to derail the thread. I am still really excited about these consoles being Zen 3 2 based instead of being "safer" and using Zen+ at really conservative clocks, which seems like how it might have logically progressed from Jaguar. Even that would have been a massive step in the right direction.
 
Last edited:

soresu

Platinum Member
Dec 19, 2014
2,662
1,862
136
It would seem counterintuitive for the MT gains to be smaller than the ST ones for integer?
 

Saylick

Diamond Member
Sep 10, 2012
3,157
6,369
136
It would seem counterintuitive for the MT gains to be smaller than the ST ones for integer?
At higher core counts, the clocks won't be increased all that much over Rome. So if we assume that at 32 cores the 20% ST performance increase is 10-15% IPC and 5-10% clocks, then at 64 cores, where you don't have any clock improvement, your overall ST performance increase is just the IPC gain, i.e. only 10-15%.
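The arithmetic above can be sketched quickly. The split between IPC and clocks is an illustrative guess from this post, not a measured figure, and `st_gain` is a made-up helper name:

```python
# Rough decomposition of the rumored Rome -> Milan single-thread gain into
# IPC and clock components. All percentages are illustrative assumptions.

def st_gain(ipc_gain: float, clock_gain: float) -> float:
    """Combined single-thread speedup: the two gains multiply, not add."""
    return (1 + ipc_gain) * (1 + clock_gain) - 1

# 32-core SKU: assume ~12.5% IPC plus ~7% clock headroom -> ~20% combined
print(f"{st_gain(0.125, 0.07):.1%}")  # 20.4%

# 64-core SKU: same IPC uplift but no clock headroom -> IPC-only, 12.5%
print(f"{st_gain(0.125, 0.00):.1%}")  # 12.5%
```

Multiplying rather than adding the components is why a ~12.5% IPC gain and ~7% clock gain land slightly above 20% combined.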
 
Reactions: soresu

soresu

Platinum Member
Dec 19, 2014
2,662
1,862
136
Rumor is they will be using a Samsung ARM design that uses RDNA as the GPU. Node unknown.
Perhaps making use of their 8nm node - unless Nintendo are cash-proud from the Switch and want to really cut loose for the first time since the GameCube.
 

soresu

Platinum Member
Dec 19, 2014
2,662
1,862
136
I know the odds are extremely low (does nvidia currently even have a next-gen/current-gen ARM core beyond Xavier?)
Cortex-A78 rather than a Carmel derivative in Tegra Orin suggests that nVidia have packed it in on the custom-core front, just like Samsung and Qualcomm before them.

They may get in on the Cortex-X action to do some custom additions to X2 or X3, but otherwise I would be surprised to see anything custom coming from nVidia's Tegra team any more.
 
Reactions: Tlh97 and blckgrffn

Makaveli

Diamond Member
Feb 8, 2002
4,718
1,054
136
Sorry, didn't mean to derail the thread. I am still really excited about these consoles being Zen 3 based instead of being "safer" and using Zen+ at really conservative clocks, which seems like how it might have logically progressed from Jaguar. Even that would have been a massive step in the right direction.

The PS5 and New Xbox are Zen 2 based?
 

HurleyBird

Platinum Member
Apr 22, 2003
2,684
1,268
136
At higher core counts, the clocks won't be increased all that much over Rome. So if we assume that at 32 cores the 20% ST performance increase is 10-15% IPC and 5-10% clocks, then at 64 cores, where you don't have any clock improvement, your overall ST performance increase is just the IPC gain, i.e. only 10-15%.

Which makes sense if Zen 2 and Zen 3 have the same, or at least very similar, power curves. Power might be the constraint, or it might not. Assuming these figures are true, it's likely a combination of things.

A major factor could be SMT yield. If you get, say, a 20% IPC improvement in 1c/1t, then that doesn't necessarily extend to 1c/2t if greater resource utilisation is leaving less room for SMT to fill in the gaps.

That doesn't explain the gap from 32c/64t to 64c/128t, but it might be a factor when going from 1c/1t to 32c/64t.
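That SMT-yield argument can be put into a toy model. The fixed-throughput-ceiling assumption, the utilization numbers, and the `smt_yield` helper are all mine for illustration, not anything AMD has published:

```python
# Toy model of "SMT yield": treat the core backend as a fixed throughput
# ceiling. The more of it a single thread already uses, the less slack is
# left for a second SMT thread to fill in.

def smt_yield(util_1t: float, ceiling: float = 1.0) -> float:
    """Relative extra throughput a second SMT thread can add on top of one."""
    return max(0.0, ceiling - util_1t) / util_1t

# Illustrative numbers only: if one thread fills 60% of a Zen 2-like
# backend, SMT can add up to ~67% more throughput; if a higher-IPC
# Zen 3-like core fills 72%, the SMT ceiling drops to ~39%.
print(round(smt_yield(0.60), 2))  # 0.67
print(round(smt_yield(0.72), 2))  # 0.39
```

So an IPC gain that shows up fully at 1c/1t can partly "eat" the headroom SMT used to exploit, shrinking the apparent uplift at high thread counts.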

Another explanation for the smaller performance improvement at 64 cores than at 32 could be cache-related, depending on which benchmarks the performance data are derived from. Given a hypothetical benchmark where little to no data is shared between threads and each thread wants >= 2MB of L3, then @64t Milan has 2MB L3/t compared with 1MB L3/t for Rome, while @128t both have 1MB L3/t.

Milan doubles effective L3 over Rome in most cases, but in a scenario where each thread's data is practically distinct and there's little core-to-core communication, Rome's cache structure is moderately superior thanks to lower latency.
 
Last edited:

itsmydamnation

Platinum Member
Feb 6, 2011
2,769
3,144
136
Assuming that this rumor is true, SMT "yield" decreasing because 1T can make better use of existing resources makes the most sense. Rome isn't power limited when running 128T of int code, and I wouldn't expect Milan to be either.
I don't think the amount of L3 cache per core matters much, unless AMD changes the L3 to be more than just an eviction cache. If they start stream prefetching/predicting into it, then L3 per core might matter more.
 

amd6502

Senior member
Apr 21, 2017
971
360
136
For what it's worth, Charlie from Semi-Accurate reports the same figure (>20% ST performance gains from Rome to Milan). There's a ton of insider scoop in this call, far too much information to digest, but you can listen to the recording here:

If they really have achieved a 15% IPC improvement over Zen 2 then I think the odds are they have slightly widened the core.

If the source (dubious track record, to say the least) is accurate this time, then L3 latency jumped fairly significantly, from ~40 to ~47 cycles (almost 20%). The L3 accessible to a core, however, has now doubled. Such a tradeoff would have an advantage pretty much only for single-threaded loads.

Assuming that this rumor is true, SMT "yield" decreasing because 1T can make better use of existing resources makes the most sense

That's a good point. Maybe there was this focus on changing resource allocation to boost single thread performance.
 

HurleyBird

Platinum Member
Apr 22, 2003
2,684
1,268
136
Such a tradeoff would have an advantage pretty much only for single-threaded loads.

That's not how it works. A single-threaded load that fits inside a 16MB L3 would still benefit from Rome's cache structure, while a 16-thread load that wants more than 16MB of L3, with mostly shared data, will work substantially better with Milan's cache structure.

By itself, how threaded a load is has nothing to do with anything. What matters is the size of the problem and how much data is shared between threads. The best case for Rome is something like running one VM per 4-core CCX, but the vast majority of workloads, from single-threaded to embarrassingly parallel, are going to benefit more from Milan's cache structure.
 
Reactions: dr1337

amd6502

Senior member
Apr 21, 2017
971
360
136
That's not how it works.

I'm considering the simpler case here of a single 8c/16t-CCD AM4 chip. Consider a single thread running on it.

In the Zen 2 case we have 2x16MB of L3; in the Zen 3 case we have a unified 1x32MB L3.

Assume the thread does not jump between CCXs (where applicable).

Now, in the Zen 2 case the thread is limited to filling at most one of the 16MB L3 slices.

In the Zen 3 case the thread can fill the whole 32MB L3 cache.

This means a potentially significantly greater hit rate (though at the supposed cost of an almost 20% latency hit).
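That hit-rate-vs-latency tradeoff can be sketched with a toy average-latency model. The ~40 vs ~47 cycle L3 figures are the rumored ones from upthread; the ~250-cycle DRAM penalty, the uniform-access assumption, and the `avg_access_cycles` helper are illustrative guesses of mine:

```python
# Toy model: one thread reaching 16 MB of L3 at ~40 cycles (Zen 2-like)
# vs 32 MB at ~47 cycles (Zen 3-like), with misses going to DRAM.

def avg_access_cycles(working_set_mb, l3_mb, l3_cycles, dram_cycles=250):
    """Average cycles per access for a uniformly touched working set."""
    hit_rate = min(1.0, l3_mb / working_set_mb)
    return hit_rate * l3_cycles + (1 - hit_rate) * dram_cycles

for ws in (8, 16, 24, 32):
    zen2 = avg_access_cycles(ws, 16, 40)
    zen3 = avg_access_cycles(ws, 32, 47)
    print(f"{ws:>2} MB working set: Zen 2 ~{zen2:.0f} cy, Zen 3 ~{zen3:.0f} cy")
```

Below 16 MB the lower-latency slice wins; once the working set spills past 16 MB, the doubled reach dominates, which is the hit-rate advantage described above.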