I found some leaked Geekbench results for Intel's Arrow Lake-S series CPUs. Zen 5 beat Arrow Lake significantly in single-thread and got smoked in the multi-thread results. I can almost guarantee you 100% that AMD's problem is in their AGESA BIOS. I will not even hazard a guess as to what the real Zen 5 performance should/will be once they sort out their mess.

AMD's response to this inquiry will be very telling. I doubt they will admit what is really going on. Is it a regression that was necessary due to architectural design choices, a result of halting design at a specific point to meet an internal launch-date goal, or is it a silicon-level bug that may or may not be fixable by a new stepping or microcode?
More and more, it looks like desktop Zen 5 should just have been delayed, even if it meant a six-month-plus delay, to get this and the other performance anomalies ironed out. I can't wait to see the core latencies on the 3 nm Zen 5c Turin, which is rumored to have the fabled 16-core CCX.
> AMD really needs to fix their DDR5 bandwidth limits. They should have upgraded their Infinity Fabric controller with Zen 4, and certainly by Zen 5. That is holding back the significant performance gains faster RAM would provide.

While I completely agree, the issue here isn't the fabric or the IMC; it's something that happens when one CCD accesses the non-local L3 cache of the second CCD. As Ryan noted, it's slower than a memory access and on par with the typical access time to non-local memory in a 2S/4S/8S system, which should never happen considering this is the same IO die AMD had with Zen 4 (and, as we see, this "far cache" access time was roughly the same for all CPUs since Zen 2).
> While I completely agree, the issue here isn't the fabric or the IMC; it's something that happens when one CCD accesses the non-local L3 cache of the second CCD. …

Wouldn't AMD have caught this in their internal testing?
> While I completely agree, the issue here isn't the fabric or the IMC; it's something that happens when one CCD accesses the non-local L3 cache of the second CCD. …

From my understanding, Zen 4 CPUs are only stable up to DDR5-6000, in some cases DDR5-6400. With Zen 5 it's more of the same. Newegg has memory kits that go up to DDR5-8400. I am adding the DDR5 memory issues with Ryzen to the Zen 5 problem; I figured they would have solved that by now. Lisa said that Zen 4 was going to be a memory OCer's dream, and they said the sweet spot was 6400. All the Ryzen builders are focusing on DDR5-6000 sticks with CAS 30 timings or better.
Zen 5 cannot even support DDR5-7000 or better. There should be no limitation on memory speeds in a Zen 5 or a Zen 4 system; the sky should be the limit.
> Maybe Linux slants more towards the performance side of power management (more aggressive with clock ramping, which would help Geekbench) and Windows slants more towards saving power.

First of all, this is highly configurable, both under Windows and under Linux. But there may well be a slant, in that while the Windows scheduler is about the worst one out there, energy-saving features are usually supported earlier under Windows than under Linux. Still, I doubt Ryzen 9000's performance oddities can be ascribed to (correctly working) energy-saving features.
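For what it's worth, here is roughly what "configurable under Linux" looks like in practice: a minimal sketch that writes a cpufreq governor for every core through the standard sysfs interface (the path pattern and governor names are the stock kernel ones; needs root).

```c
/* Sketch: set the cpufreq scaling governor for all CPUs on Linux.
 * Uses the standard sysfs cpufreq interface; run as root.
 * Usage: ./setgov performance   (or powersave, schedutil, ...) */
#include <glob.h>
#include <stdio.h>

int main(int argc, char **argv) {
    if (argc != 2) {
        fprintf(stderr, "usage: %s <governor>\n", argv[0]);
        return 1;
    }
    glob_t g;
    if (glob("/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor",
             0, NULL, &g) != 0) {
        perror("glob");
        return 1;
    }
    for (size_t i = 0; i < g.gl_pathc; i++) {
        FILE *f = fopen(g.gl_pathv[i], "w");
        if (!f) { perror(g.gl_pathv[i]); continue; }
        fprintf(f, "%s\n", argv[1]);  /* the kernel rejects unknown governors */
        fclose(f);
    }
    globfree(&g);
    return 0;
}
```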
> In their Zen 5-specific gaming review, Computerbase used RAM at up to 8000 MT/s and say that it ran smoothly at this frequency, and that's with a 2 x 24 GB kit; even the 7950X3D got up to 7200.
>
> Ryzen 9 9950X & 9900X: Gaming-Benchmarks (www.computerbase.de)
> This test analyzes the gaming performance of the Ryzen 9 9950X and the Ryzen 9 9900X, including benchmarks with RAM OC up to DDR5-8000.

Hopefully all of the current Zen 5 issues and the DDR5 memory issues will be worked out in the upcoming AGESA BIOS updates.
> All the Ryzen builders are focusing on DDR5-6000 sticks with CAS 30 timings or better.

Not all, and not really. Also, you can't really compare AMD's 1:1 mode to anything Intel does, as Intel always has 1:2 mode enabled (the memory controller running at half the memory clock, so DDR5-8000 means a 2000 MHz controller clock rather than 4000 MHz). And in 1:2 mode it's not hard to get to 7800-8000, sometimes even with a 2DPC board. In any case, it's not relevant to this inter-chiplet communication issue; slower or faster RAM shouldn't affect it at all, as far as I can understand.
> Wouldn't AMD have caught this in their internal testing?

Well, as I've noted earlier, this might have been some sort of conscious tradeoff made by AMD, maybe related to the fact that the IO die for the next-gen EPYC is new and not the same as the one in SOHO Zen 5.
> Hopefully all of the current Zen 5 issues and the DDR5 memory issues will be worked out in the upcoming AGESA BIOS updates.

For now there are no memory issues, since the maximum RAM frequency has been extended significantly; the only issue so far is the inter-CCD latency, which is unexpected and still unexplained.
If benchmarks were to include joules consumed, one could see whether power usage varies along with the performance, or whether something else must be amiss.
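On Linux, the joules can be read without extra hardware via the kernel's RAPL powercap interface. A rough sketch follows; the intel-rapl:0 package-domain path is an assumption (on Zen the kernel rapl driver usually exposes it there), and counter wrap-around is ignored for brevity.

```c
/* Sketch: measure package energy (joules) around a workload using the
 * Linux powercap/RAPL sysfs interface. The intel-rapl:0 path is an
 * assumption; on AMD Zen the kernel rapl driver usually exposes the
 * package domain there. Needs permission to read energy_uj. */
#include <stdio.h>
#include <stdlib.h>

static long long read_energy_uj(void) {
    FILE *f = fopen("/sys/class/powercap/intel-rapl:0/energy_uj", "r");
    long long uj = -1;
    if (f) {
        fscanf(f, "%lld", &uj);
        fclose(f);
    }
    return uj;
}

int main(int argc, char **argv) {
    long long before = read_energy_uj();
    system(argc > 1 ? argv[1] : "sleep 1");  /* the workload under test */
    long long after = read_energy_uj();
    /* The counter wraps at max_energy_range_uj; ignored here for brevity. */
    printf("package energy: %.3f J\n", (after - before) / 1e6);
    return 0;
}
```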
> In any case, it's not relevant to this inter-chiplet communication issue; slower or faster RAM shouldn't affect it at all, as far as I can understand.

Actually, somebody with a 2-CCD Zen 5 chip could try varying the RAM timings while keeping the IF frequency constant, and run a core-to-core latency test. If the curve weren't flat, that would mean the CCDs are synced through RAM, which would be surprising to say the least.
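Such a test is only a few dozen lines. Here is a minimal sketch of a core-to-core "ping-pong" latency probe for Linux; the core IDs are placeholders (pick one core per CCD from lscpu -e or lstopo), and you would run it once per RAM-timing setting with the IF clock held constant.

```c
/* Minimal core-to-core latency sketch (Linux): two threads pinned to the
 * chosen cores bounce an atomic flag; the measured round trip divided by
 * two approximates the one-way cache-line handoff latency.
 * Build: gcc -O2 -pthread c2c.c    Run: ./a.out <coreA> <coreB> */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define ITERS 1000000

static _Atomic int flag = 0;  /* the cache line being bounced */

static void pin_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

static void *pong(void *arg) {
    pin_to_core((int)(long)arg);
    for (int i = 0; i < ITERS; i++) {
        while (atomic_load_explicit(&flag, memory_order_acquire) != 1)
            ;  /* spin until ping raises the flag */
        atomic_store_explicit(&flag, 0, memory_order_release);
    }
    return NULL;
}

int main(int argc, char **argv) {
    int core_a = argc > 1 ? atoi(argv[1]) : 0;  /* placeholder: a CCD0 core */
    int core_b = argc > 2 ? atoi(argv[2]) : 8;  /* placeholder: a CCD1 core */
    pthread_t t;
    pthread_create(&t, NULL, pong, (void *)(long)core_b);
    pin_to_core(core_a);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < ITERS; i++) {
        atomic_store_explicit(&flag, 1, memory_order_release);
        while (atomic_load_explicit(&flag, memory_order_acquire) != 0)
            ;  /* spin until pong acknowledges */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    pthread_join(t, NULL);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("cores %d<->%d: %.1f ns per round trip (halve for one-way)\n",
           core_a, core_b, ns / ITERS);
    return 0;
}
```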
> First of all, this is highly configurable, both under Windows and under Linux. …

It could also be the scheduler never giving a single process the maximum cycles available, so that QoS is better for multitasking/GUI responsiveness.
> In their Zen 5-specific gaming review, Computerbase used RAM at up to 8000 MT/s and say that it ran smoothly at this frequency. …

Seems like mainstream reviewers have finally started running the higher speeds that "we overclockers" have been running for like 1.5 years already 👍
No. LoL
> Seems like mainstream reviewers have finally started running the higher speeds that "we overclockers" have been running for like 1.5 years already 👍

That's the timings he apparently used: DDR5-8000 with timings 38-48-48-98.

Just too bad it's slow EXPO timings at the moment.
> Why, are the 7000 series manufactured and sold by Intel?

tRFC and tREFI are the timings that will give the highest performance gain; sub-timings matter too! ;-)
He also said that the 9600X/9700X/9950X worked flawlessly at that speed, but not the 9900X. He had no time to tweak the timings, as he was hard-pressed by the short window before the review's release, so he used only out-of-the-box timings; he added that the 9900X could eventually work as well with some tweaks.
> That's the timings he apparently used: …

I'd say that tRFC, tREFI, tRRD_L/tRRD_S/tFAW and the SCLs are more important for gaming performance than the primaries.
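To put rough numbers on why tRFC/tREFI dominate: the share of time a DRAM rank is unavailable because it is refreshing is approximately tRFC/tREFI. A toy calculation follows; the clock counts are illustrative placeholders, not anyone's actual settings.

```c
/* Toy calculation: the refresh overhead of a DDR5 stick is roughly
 * tRFC / tREFI (both in memory clocks). Values are illustrative. */
#include <stdio.h>

int main(void) {
    double memclk_mhz  = 4000.0;   /* DDR5-8000 -> 4000 MHz MEMCLK */
    double trfc        = 884.0;    /* tRFC in clocks (illustrative) */
    double trefi_stock = 15600.0;  /* ~3.9 us at 4000 MHz           */
    double trefi_tuned = 65535.0;  /* a commonly used maxed value   */

    printf("tRFC  = %.0f ns\n", trfc / memclk_mhz * 1000.0);
    printf("stock: %.2f%% of time spent refreshing\n",
           100.0 * trfc / trefi_stock);
    printf("tuned: %.2f%% of time spent refreshing\n",
           100.0 * trfc / trefi_tuned);
    return 0;
}
```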
> These are the preliminary ARL benchmarks, from Jaykihn on Xwitter: https://x.com/jaykihn0
>
> As a comparison, I pulled Uniko's testing of the recent 9950X. Looking at GB 5.4.5, the QS scored 2455, which trails the 9950X by around 5%. In multicore, however, the QS beats the 9950X by almost 8%, though keep in mind that GB doesn't scale well with more cores (correct me if I'm wrong). It's impressive considering the score was obtained without HT, but for people looking for a generational improvement, this is exactly "Zen 5%" lol.

Actually, by the numbers you posted there, it's Zen 13%.
> If the reviewers keep the chips for some time, we'll surely have revised reviews with overclocked-RAM tests; the current reviews were rushed. At Computerbase, the CPU performance reviewer, Volker Rißka, said on their forum that he got only three days to make all the performance measurements; I guess it was no different for other outlets.

Yeah, it's kinda strange: I got two months to play with my sample before release, and reviewers got ~1 week 🤔
Isn't "core parking" an interaction between Windows process scheduler, power management driver, and CPU? (That is, the process scheduler avoids putting tasks onto a subset of logical CPUs, idle CPUs go into low power state. The latter was the whole reason why "core parking" was invented back in the day; now it its used/ abused for quite different purposes, as processors have become increasingly complex, making it ever more difficult for OSs' process schedulers and power managers to make good decisions.)From what we have so far, the driver does not do anything for parking the cores, that behavior is controlled by the CPU/firmware itself. The driver is there to prevent thread migration from the active CCD to the inactive CCD unnecessarily.
> The 7950X wouldn't park cores on the 2nd CCD.

Although maybe it could, if one wanted to(?).
> Isn't "core parking" an interaction between the Windows process scheduler, the power-management driver, and the CPU? …

So what you are saying, in a nutshell: a lot of people are getting fired at AMD over Zen 5 in September.
Benefits of spreading a lightly-to-moderately parallel workload onto both CCXs of dual-CCX Ryzens:
- More cache is available.
- More GMI link width is available.

The latter is important in such cases as increasing low-res/low-detail video game FPS from 450 to 600, or efficiently running multithreaded FFTs to search for large primes.

Benefits of concentrating a lightly-to-moderately parallel workload onto one of the CCXs of dual-CCX Ryzens:
- Threads which happen to interact share cache lines.
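A quick way to A/B these two placements on Linux is to launch the same workload under different affinity masks. A sketch follows; the core ranges assume an 8+8-core part with SMT off (CCX0 = cores 0-7, CCX1 = cores 8-15), so check lscpu -e or lstopo for the real topology.

```c
/* Sketch: run a command either concentrated on one CCX or spread over
 * both, to compare the placements discussed above. Core ranges are
 * assumptions for an 8+8 part without SMT; adjust to your topology.
 * Usage: ./place one|both <command> [args...] */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv) {
    if (argc < 3) {
        fprintf(stderr, "usage: %s one|both <command> [args...]\n", argv[0]);
        return 1;
    }
    cpu_set_t set;
    CPU_ZERO(&set);
    int last = (strcmp(argv[1], "both") == 0) ? 15 : 7;
    for (int c = 0; c <= last; c++)
        CPU_SET(c, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    /* Affinity is inherited across exec(), so the workload stays put. */
    execvp(argv[2], &argv[2]);
    perror("execvp");
    return 1;
}
```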
> What I would like is for AMD to come out and explain how they got the application performance numbers they showed in their charts (which nobody else can reproduce right now). There is no way they achieved those numbers and everyone else is doing something wrong.

First of all, it was borderline impossible to reproduce any of this for launch-day reviews: in the tight time frame from sample reception until the end of the review embargo, reviewers must have had their hands full going through multiple BIOS updates and Windows reinstallations, perhaps going through questions and answers with AMD or other reviewers because they had a hard time making things work, and getting as much of their own planned set of benchmarks as possible done and written up.
However, now that the rush to publish is over, somebody could take the sparse details given in the infamous endnotes GNR-01…GNR-04, tell themselves "imagine you had to produce some bar charts for a presentation which your very CEO is going to give at Computex", and then see whether they could achieve what AMD subtitled with "* all results are 'up to'". There is certainly quite some room to optimize here and un-optimize there; it's mostly a question of whether you are more afraid of getting fired in May because the numbers don't look good enough, or of getting fired in August because third parties can't match those numbers on different setups.
> AMD's response to this inquiry will be very telling. I doubt they will admit what is really going on. …

One thing they appear to have done in Zen 5 relative to Zen 3/4, according to Granite Ridge die shots, is implement the L3 cache within a considerably reduced area. Right now, though, I can't see how this could affect the penalties on cross-CCX traffic.
The other obvious change in Zen 5 is that they co-designed for 8c and 16c CCXs. But again it is not obvious to me how this pertains to traffic outside a CCX.
Third, Turin has a new IOD, which notably has to support (a) more GMI links than Genoa's IOD and (b) up to 16 rather than 8 cores per CCX. Maybe this had repercussions on what they did with the CCD's GMI links or on how L3 tags are managed, all while they insisted on not updating the client IOD alongside.
> Wouldn't AMD have caught this in their internal testing?

Likely they have; the question is what they determined the impact of this on real workloads to be.