Speculation: Ryzen 4000 series/Zen 3

Greyguy1948 · Oct 25, 2020

If you look at this Block Diagram for Zen 2:
Zen 2 at Wikichip

Any ideas about Zen 3?
More exec pipes?
More decoders? Intel have 5

soresu · Oct 25, 2020

Greyguy1948 said:
If you look at this Block Diagram for Zen 2:
Zen 2 at Wikichip

Any ideas about Zen 3?
More exec pipes?
More decoders? Intel have 5

We can expect an official full deep dive fairly soon I think.

I'd be surprised if it comes out significantly later than the Nov 5th release date for the actual CPU's.

Ajay · Oct 25, 2020

itsmydamnation said:
But if they increased load and store width and queue depths, one of the biggest SMT bottlenecks will be relieved. So i wouldn't be surprised to see SMT yeild increase or stay the same. I also wouldn't be surprised to see memory bandwdith , IO die etc to be a bottleneck when scaling workloads.

The interesting rumors arethat AMD are doing both 12nm and 7nm IOD for EPYC, if so that would be a very interesting comparison and give good insight to what warhol/DDR5 might bring.

Well, then we have another problem, if in fact, IO bottle necks the load/store improvements - that and mem bandwidth didn't change. Like I said, I look forward to Ian's deep dive. Realworld thread on Zen 3 release (not the current thread) should be pretty epic.

lobz · Oct 25, 2020

uzzi38 said:
https://twitter.com/x/status/1318369486580248576

Zen 3 on GB5.

Noticable points. Compared to the fastest 1185G7 on Windows here (unfortunately, there are no TGL-U benches with 5.1.1 like the 5900X here, so this will have to do for now):

Micro-Star International Co., Ltd. Please change product name - Geekbench

Benchmark results for a Micro-Star International Co., Ltd. Please change product name with a 11th Gen Intel Core i7-1185G7 processor.

browser.geekbench.com

The 5900X falls 5 points below in the averaged out single-threaded score whilst clocking between 4.775GHz and 4.95GHz. looking at the score breakdowns, the 5900X loses heavily in crypto, (2757 vs 4095), the two effecticely tie in the integer workloads (1409 vs 1405) and the 5900X takes a noticable lead in FP workloads (1837 vs 1640).

The 5950X run is using 5.2.3 but overall talking points from me remain the same for the most part. The 5950X loses some points by scoring 2707 in crypto, 1400 in integer and 1764 in floating point, but comparisons vs the 1185g7 otherwise remain the same. Heavy loss in crypto, virtually the same score in Integer with a lead in floating point.

Noticeable points: Geekbench is a HORRIFICLY TRASH benchmark, has been for years, I don't really care how unpopular am I going to be with this opinion of mine. They should team up with userbenchmark, since they are both consistent in being confidently representative of everything but real world performance.

amd6502 · Oct 25, 2020

Ajay said:
This tends to happen when the mis-predict rate goes down (fewer pipeline flushes). There are fewer stalls and threads must compete more competitively for resource use. Net throughput goes up, but the gains from SMT go down. Reduced memory/cache latency would also reduced thread stalls (mem waits). There could be other reasons, once Ian gets to do a deep dive on Zen3, we'll get a better idea.

The addition of an ALU might offset that effect. We may end up with a similar or possibly even higher SMT yield.

However, the unified L3 is going to work to the advantage of the single thread performance, thus having a negative effect on the SMT yield. For benchmarks that are not affected by a very large cache, the yield could still rise.

LightningZ71 · Oct 26, 2020

The unified L3 shouldn’t have a notable effect on the SMT gains percentage. Remember that SMT isn’t just about hiding latency due to thread stalls. It’s also about increasing core throughout by being able to dispatch micro ops from two threads simultaneously. While having a single thread stall due to a cache miss will free up execution resources for the second thread during the context switch of the first thread for another thread that is waiting for execution, that only really helps if the second thread actually needs those resources during that timeframe.

SMT likes wide cores. SMT throughout shouldn’t change a whole lot if the core hasn’t changed effective width. The only thing that MIGHT give somewhat of a hit to SMT performance is a deliberate retuning of the dispatch logic to favor the primary thread over the secondary thread. That’s a very general way of saying that tuning can be done to increase single thread throughout at the expense of multithreaded performance. Given AMD’s improvements to the processors as stated, they may have felt that it was worth the trade off given the competition.

DrMrLordX · Oct 26, 2020

lobz said:
Noticeable points: Geekbench is a HORRIFICLY TRASH benchmark, has been for years, I don't really care how unpopular am I going to be with this opinion of mine. They should team up with userbenchmark, since they are both consistent in being confidently representative of everything but real world performance.

Agreed.

Bigos · Oct 26, 2020

LightningZ71 said:
SMT likes wide cores. SMT throughout shouldn’t change a whole lot if the core hasn’t changed effective width. The only thing that MIGHT give somewhat of a hit to SMT performance is a deliberate retuning of the dispatch logic to favor the primary thread over the secondary thread. That’s a very general way of saying that tuning can be done to increase single thread throughout at the expense of multithreaded performance. Given AMD’s improvements to the processors as stated, they may have felt that it was worth the trade off given the competition.

There is no such things as "primary thread" and "secondary thread". Both threads in an SMT-enabled core are equal. When only one runs it takes almost all of the core resources (some might still be reserved for the other thread, depending on implementation). When both run their throughput is reduced, but if each can run at over 50% rate you will see a net performance gain. E.g. if each runs at 60% then you will see +20% SMT gain (this doesn't take into account scalability of the given workload).

As you already mentioned, the unified cache doesn't benefit multi-thread workloads when each thread accesses its own set of data, showing smaller MT gain than ST gain with Zen 3. We might also be seeing insufficient memory bandwidth being a limiter in some situations, though that would favor 5600X and 5800X regarding MT gain over 5900X+ and 5950X. The reviews should tell us.

jeanlain · Oct 26, 2020

lobz said:
Noticeable points: Geekbench is a HORRIFICLY TRASH benchmark, has been for years, I don't really care how unpopular am I going to be with this opinion of mine. They should team up with userbenchmark, since they are both consistent in being confidently representative of everything but real world performance.

Is geekbench trash or are tests simply not conducted in controlled conditions? Since anyone can post their results, you will find weird numbers in the database. Some launch geekbench while other tasks are running, for instance.
This analysis shows that geekbench integer results correlates extremely well with SPEC_INT (2006 and 2017) results. This is not a fully-fledge research paper, but this analysis is more thorough than most of what can be found online.
Of course, results from a particular app like blender may not reflect the results of synthetic benchmark tools, but does this indicate a problem with these tools? Since synthetic benchmark tools make averages from results obtained from an array of "real world" tasks, they cannot fully represent any particular scenario, but they are certainly more representative of overall CPU performance than any particular app.

coercitiv · Oct 26, 2020

jeanlain said:
This analysis shows that geekbench integer results correlates extremely well with SPEC_INT (2006 and 2017) results. This is not a fully-fledge research paper, but this analysis is more thorough than most of what can be found online.

The same analysis points out correlation between GB and SPEC is not an inherent property and can break under a number of scenarios. It essentially reinforces what many object when it comes to GB performance estimates across platforms - controlled testing is mandatory to ensure correlation between GB and established industry benchmarks. And yet "controlled testing" is essentially the opposite of what GB offers: testing for ALL!

The irony of all this is some supporters of the benchmark acknowledge this fault only to profit from it even more through selective result selection. We've seen this tactic in full bloom on the forums, and one may argue we can see it to a lesser degree in Nuvia marketing. Even if we were to give Nuvia the full benefit of the doubt, using an arguably less reliable benchmark to convey their message is a bad idea when the message itself is under intense scrutiny. If you want to claim absolute performance and perf/wattt supremacy with no working demo in hand, the least you can do is use estimates for industry standard benchmarks. Why reinvent the wheel to sell a new steed?

jeanlain · Oct 26, 2020

coercitiv said:
And yet "controlled testing" is essentially the opposite of what GB offers: testing for ALL!

That is true for any piece of software that can be obtained by anyone, even SPEC (provided you have the money).

coercitiv · Oct 26, 2020

jeanlain said:
That is true for any piece of software that can be obtained by anyone, even SPEC (provided you have the money).

You claimed that analysis proves corellation between GB and SPEC. The same analysis offers proof to the contrary.

jeanlain · Oct 26, 2020

coercitiv said:
You claimed that analysis proves corellation between GB and SPEC. The same analysis offers proof to the contrary.

How?
R^2 > 0.99 is extremely high.

jeanlain · Oct 26, 2020

coercitiv said:
If you want to claim absolute performance and perf/wattt supremacy with no working demo in hand, the least you can do is use estimates for industry standard benchmarks. Why reinvent the wheel to sell a new steed?

Using SPEC on mobile platforms (has Nuvia wanted to include Apple's and qualcomm's) is significantly harder.
But that's not the point. We're not discussing Nuvia's decision, we're discussing whether geekbench is intrinsically flawed. I haven't seen clear evidence that it is. Weird scores can results from poor testing procedures, and differences between workloads are expected. Geekbench is not supposed to reflect cinebench, neither is SPEC.

coercitiv · Oct 26, 2020

jeanlain said:
We're not discussing Nuvia's decision, we're discussing whether geekbench is intrinsically flawed. I haven't seen clear evidence that it is.

A minute ago you were willing to submit Nuvia's analysis as proof, now you place the burden of proof on the opposing side.

jeanlain said:
Using SPEC on mobile platforms (has Nuvia wanted to include Apple's and qualcomm's) is significantly harder.

They're trying to convince the world they're about to shake the entire computing industry. "Significantly harder" should be the norm for them.

jeanlain · Oct 26, 2020

coercitiv said:
A minute ago you were willing to submit Nuvia's analysis as proof, now you place the burden of proof on the opposing side.

I questioned the claim about geekbench being trash. Isn't my nor anyone's job to provide proof that geekbench is not trash. No one can, as it is impossible to exclude the existence of some defect, however small. IOW, geekbench being trash is not a workable null hypothesis that can/must be disproven. One should instead provide evidence contradicting the null hypothesis that geekbench is not trash. Individual geekbench results found on the web do not constitute convincing evidence, nor do contradictions with results from particular apps. At best, these may show that some tests are poorly conducted and that geekbench algorithms are not representative of a unique use case. Since primate labs never claimed that geekbench should correct for user errors or be representative of any particular workload, I'm not yet convinced that their tool is trash.

Nuvia's analysis and future products are not my main interest. I still find your claim of absence of correlation between the SPEC and geekbench scores outlandish. Again, can you clarify?

coercitiv · Oct 26, 2020

jeanlain said:
Nuvia's analysis and future products are not my main interest. I still find your claim of absence of correlation between the SPEC and geekbench scores outlandish. Again, can you clarify?

From the document itself:

While this observation is interesting from a benchmarking standpoint, Geekbench is generally less demanding of the micro-architecture than SPEC CPU is. For a subset of the micro-architectural features, Figure 3 shows the relative metric value for CPU2006 and CPU2017 normalized to a baseline of 1.0 for Geekbench 5. These were generated from detailed performance simulations of a modern CPU. It shows that the branch mispredicts and data cache (D-Cache), data TLB (D-TLB) misses are 1.1x — 2x higher in SPEC CPU compared to that seen in Geekbench 5. For this reason, chip architects tend to study a wide variety of benchmarks including SPEC CPU and Geekbench (among many others) to optimize the architecture for performance.

It is important to note that the observed correlation is not a fundamental property and can break under several scenarios.

One example is thermal effects. Geekbench typically runs quickly (in minutes) and especially so in our testing where the default workload gaps are removed, whereas SPEC CPU typically runs for hours. The net effect of this is that Geekbench 5 may achieve a higher average frequency because it is able to exploit the system’s thermal mass due to its short runtime. However SPEC CPU will be governed by the long term power dissipation capability of the system due to its long run-time. This is something to watch out for when applying such correlation techniques to systems that see significant thermal throttling or power-capping while running these benchmarks.

Another scenario where the correlation can break is non-linear jumps in performance that one benchmark suite sees but not the other. The interplay between the active data foot-print of a test and the CPU caches is a classic source of such non-linearities. For example, a future CPU’s cache may be large enough that many sub-tests of one benchmark suite may fully fit in cache boosting performance many fold. However, the other benchmark suite may not see such a benefit if none of its tests fit in cache. In such cases, the correlation will not hold.

coercitiv · Oct 26, 2020

APISAK pointed to another 5600X GB5 run.

Higher score, using faster RAM and more consistent boost of 4600+ Mhz.

jeanlain · Oct 26, 2020

@coercitiv, Nuvia's statement is directed at those who'd want to extrapolate from their results, as commonly done in a discussion section. It's good practice and I think they remain cautious, perhaps because they've been criticised for their previous blog post.
When they say that "the correlation will not hold", they certainly mean that it could be lower than what they observed. I don't see how to get an R^2 of zero in any realistic scenario.
If the CPU is downclocked due to overheating in SPEC and not in geekbench, then sure, the correlation will decrease. I don't see that as an issue, as I don't consider that geekbench should take overheating into account in its score.

richierich1212 · Oct 26, 2020

5800X CPU-Z 650 Single-thread & 6593 Multi-thread scores

Hitman928 · Oct 27, 2020

richierich1212 said:
5800X CPU-Z 650 Single-thread & 6593 Multi-thread scores

Looks like a static overclock of ~4.5 GHz, am I seeing that correct?

RTX2080 · Oct 27, 2020

5950x CPUZ scores ~690 ST and ~13300 MT, it seems that PBO is on.

itsmydamnation · Oct 27, 2020

richierich1212 said:
5800X CPU-Z 650 Single-thread & 6593 Multi-thread scores

What is that horrible memory /if clock... Yuk

lobz · Oct 27, 2020

itsmydamnation said:
What is that horrible memory /if clock... Yuk

That's your average wannabe influencer, errr....I mean your average tech tuber in action right there.

lightmanek · Oct 27, 2020

itsmydamnation said:
What is that horrible memory /if clock... Yuk

They are keeping true performance numbers till product lands in our hands. With collective knowledge between true hardware enthusiasts we will take these scores to Over 9000!?!

Speculation: Ryzen 4000 series/Zen 3

Member

Diamond Member

Lifer

Platinum Member

Senior member

Platinum Member

Lifer

Senior member

Member

Diamond Member

Member

Diamond Member

Member

Member

Diamond Member

Member

Diamond Member

Diamond Member

Member

Platinum Member

Diamond Member

Senior member

Diamond Member

Platinum Member

Senior member