The only notable improvement that AMD can make for gaming with the 9xxx X3D parts, compared to the 7xxx implementation, is to find a way to reduce the slightly increased L3 access latency that the X3D parts have. Otherwise, it's going to show similar gains in similar areas.
They could do a 4004 dual vcache processor that could be useful in certain use cases, but it wouldn't be aimed at gaming.
ATM it does not seem like the supply side is the problem for AMD and Zen5.
Great but dude, you're just tempting game companies to take away the X3D supply from us poor little people!
Dies which fail as the 8C/16T die but are able to run as 6C should be very rare.
That is easier said than done, as X3D increases cache size dramatically, and that itself increases latency, just because of increased propagation latency. Secondly, if you want to make the cache faster, it might be a limiter for the entire chip regarding max clocks. So while you might have a faster chip per clock, they might have to reduce clocks to do so.
I agree if they can bring the L3 latency down on X3D parts and remove the clock speed drop that should bring decent gains.
Pretty sure this latency difference is pretty much down to clockspeed alone, at least a large part of it
View attachment 108417
IIRC the additional cache die suffers from 3 or 4 additional latency cycles, but that is applicable once you exhaust original L3.
Pretty sure this latency difference is pretty much down to clockspeed alone, at least a large part of it
(L3 runs at core clock)
13.84ns? ))
Besides purely architectural aspects, the 5800X3D's L3 latency decrease with higher clock (not sure about Zen4 X3D though) is limited by a fixed L3 EDC limit (unlike regular Zen3, where it increases linearly with the VddCpu EDC limit).
The absolute latency difference (in ns) will be down to clockspeed difference as you correctly identified.
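To put the cycles versus nanoseconds point into numbers, here is a minimal sketch. It only assumes that L3 runs at core clock, so one cycle lasts 1/f ns at f GHz. The cycle counts and clocks below are made-up placeholders to show the arithmetic, not measurements of any real chip.

```python
def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Latency in nanoseconds for a cycle count at a given clock.

    One cycle at f GHz lasts 1/f ns, so latency_ns = cycles / f.
    """
    return cycles / clock_ghz


# Hypothetical inputs, chosen only to illustrate the arithmetic.
BASE_CYCLES = 50       # assumed L3 latency in cycles
EXTRA_CYCLES = 4       # assumed extra cycles for the stacked cache
FAST_CLOCK_GHZ = 5.0   # assumed clock of the non-X3D part
SLOW_CLOCK_GHZ = 4.5   # assumed clock of the X3D part

baseline = cycles_to_ns(BASE_CYCLES, FAST_CLOCK_GHZ)
more_cycles_same_clock = cycles_to_ns(BASE_CYCLES + EXTRA_CYCLES, FAST_CLOCK_GHZ)
same_cycles_slower_clock = cycles_to_ns(BASE_CYCLES, SLOW_CLOCK_GHZ)

print(f"baseline:                 {baseline:.2f} ns")
print(f"+4 cycles, same clock:    {more_cycles_same_clock:.2f} ns")
print(f"same cycles, lower clock: {same_cycles_slower_clock:.2f} ns")
```

With these placeholder numbers the clock drop costs more nanoseconds than the extra cycles do, which is the shape of the argument above. Real parts would need real cycle counts and clocks plugged in.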
IIRC the additional cache die suffers from 3 or 4 additional latency cycles but that is applicable once you exhaust original L3.
True, I stand corrected. I misremembered an article I read; the 4 cycle penalty is uniform to the Vcache enabled CCD.
Not quite. All cores in the CCD use all of the cache evenly. That is, cores don't first use the original L3, but all accesses are spread out evenly over the full 96MiB.
The cache is implemented as slices. Any cache line that gets stored in L3 is hashed to a specific slice based on its physical address; this way the CPU only needs to look for matches in a single slice. Adding the vcache adds more slices to the cache.
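A toy sketch of the slice idea, in case it helps. The actual hash AMD uses is not given anywhere in this thread, so the XOR-fold below is purely an assumption for illustration, as are the function name and the slice count. The only point is that a physical address deterministically picks exactly one slice, and that lines spread evenly over however many slices exist.

```python
from collections import Counter

CACHE_LINE_BYTES = 64  # cache line size on x86-64


def slice_for_address(phys_addr: int, num_slices: int) -> int:
    """Toy hash (an assumption, not the real one): map an address to a slice."""
    line = phys_addr // CACHE_LINE_BYTES
    # XOR-fold the line number so higher address bits also affect the choice.
    folded = line ^ (line >> 7) ^ (line >> 13)
    return folded % num_slices


NUM_SLICES = 8  # hypothetical slice count, chosen only for the demo

# Every byte within one cache line lands in the same slice, so a lookup
# only ever has to check that one slice.
assert slice_for_address(0x1000, NUM_SLICES) == slice_for_address(0x103F, NUM_SLICES)

# Walk 1024 consecutive cache lines and count how many land in each slice.
counts = Counter(
    slice_for_address(line * CACHE_LINE_BYTES, NUM_SLICES) for line in range(1024)
)
for slice_id in sorted(counts):
    print(f"slice {slice_id}: {counts[slice_id]} lines")
```

Changing `NUM_SLICES` in this toy model just spreads the same lines over a different number of buckets. No core gets a "first" region to fill up before the rest is used.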
I doubt they'll manage to clock this as high as the 9700X as there are physical limits with the current packaging. And two layers of silicon just don't dissipate heat that well.
The real question is, will the "designed for increased frequencies" mean +100 or +200 MHz. No way in hell it will match the 5.5GHz of 9700X. This is AMD we are talking about. They will find a way to disappoint.
It's not a heat issue (MI300X can have far, far toastier hotspots in PHY-heavy loads).
And two layers of silicon just don't dissipate heat that well
Yeah it has a better v/f and they generally dialed the Vmax back so V$ parts should have way less of a delta now.
Zen5 cores only need ~1150mv get for around 5500mhz allcore in Cinebench R23 (around 48k points for the 9950X)
Aha!! How can you possibly know this?
And just like with Zen4, the X3D parts have a much better V/F curve than vanilla
How can I share all these screenshots on release day of Zen5? 🙃
Aha!! How can you possibly know this?
Perhaps I should be more frank - if you have it in hand, throw us some more bones my dude!
How can I share all these screenshots on release day of Zen5? 🙃
And be sure i have played enough with the 7800X3D and 7950X3D v-cache models to know their V/F curve compared to regular chips
Are both your ccx's equally good? 😯 And do you by chance know what clocks you can roughly expect from an average retail 9950x? (@1200mv)
This is a highly binned 9950X @ same 1200mv set static = 5500/5500 max allcore clocks
That's why I get my 6th retail 9950X sample tomorrow, trying to find silicon that can match my ES sample.
Are both your ccx's equally good? 😯 And do you by chance know what clocks you can roughly expect from an average retail 9950x? (@1200mv)
Looks like the ES is quite strong. I have had the same experience with server CPUs. The ES chips (especially late samples) are often stronger than "retail" in general, since they are going to be smaller batches and the best bins.
That's why I get my 6th retail 9950X sample tomorrow, trying to find silicon that can match my ES sample.
Need a good retail sample so futuremark can stop hiding my results because ES --> "Processor is not recognized"
View attachment 108528
www.3dmark.com
I haven't updated the "chart" with my soon-to-be 6th sample yet, but these are my first 3 retail samples
- The goal was to complete 3x MT runs in a row in Cinebench R23
- Static CPU voltage was set at 1.200 vcore --> around 1.156 vcore get while running (since it was set in Windows, only the HWiNFO motherboard vcore reading is correct)
- Static clockspeeds were raised on each individual CCD in 25mhz steps until it could not pass 3x runs back to back
- I tried to keep the idle temp of the watercooled GPU in the same loop as close to 20 degrees as possible for all runs (most reliable for me)
- None of the runs were done at stock 5600MT/s speeds, but not maxed out either --> it was just what I was running at the time of testing
- Never mind the R23 MT scores themselves, as I had lots of stuff running, lowering the scores. Point of interest was clockspeeds
- HWiNFO got reset before each run; all runs seem to be about +- ~260w power usage while running Cinebench R23 MT
- Stupid Windows 11 printscreen would only take the screenshot the same millisecond the benchmark ended on all but the ES, so the values in HWiNFO hadn't had time to update.
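For anyone wanting to repeat this, the stepping procedure in the list above can be sketched roughly like this. It is a minimal sketch, not the script actually used: `passes_run` is a hypothetical callback standing in for "set the static clock, run Cinebench R23 MT once, report whether it completed", and no real benchmark or overclocking API is called here.

```python
from typing import Callable, Optional


def max_stable_clock(
    start_mhz: int,
    passes_run: Callable[[int], bool],
    step_mhz: int = 25,
    runs_required: int = 3,
    ceiling_mhz: int = 7000,
) -> Optional[int]:
    """Raise the clock in fixed steps until a clock fails a run.

    Returns the highest clock that passed `runs_required` runs back to back,
    or None if even the starting clock failed.
    """
    best = None
    clock = start_mhz
    while clock <= ceiling_mhz:
        # all() stops at the first failed run, like aborting a test series.
        if not all(passes_run(clock) for _ in range(runs_required)):
            break
        best = clock
        clock += step_mhz
    return best


def fake_ccd(limit_mhz: int) -> Callable[[int], bool]:
    """Hypothetical stand-in for one CCD: passes up to `limit_mhz`."""
    return lambda clock_mhz: clock_mhz <= limit_mhz


# Pretend CCD0 holds 5500 and CCD1 holds 5450 (placeholder limits).
ccd0 = max_stable_clock(5400, fake_ccd(5500))
ccd1 = max_stable_clock(5400, fake_ccd(5450))
print(f"Max completed MT clockspeeds = {ccd0}mhz CCD0 / {ccd1}mhz CCD1")
```

Each CCD is searched on its own, matching the per-CCD numbers reported below. A real harness would also need to hold voltage and cooling constant between runs, as described in the list.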
Retail #1 9950X sample
Max stable FCLK = 2167mhz
Max static R23 ST clockspeed @ set 1.4core = 5825mhz
Max completed MT clockspeeds = 5500mhz CCD0 / 5450mhz CCD1
View attachment 108523
Retail #2 9950X sample
Max stable FCLK = 2200mhz
Max static R23 ST clockspeed @ set 1.4core = 5850mhz
Max completed MT clockspeeds = 5550mhz CCD0 / 5425mhz CCD1
View attachment 108524
Retail #3 9950X sample
Max stable FCLK = 2200mhz
Max static R23 ST clockspeed @ set 1.4core = 5850mhz
Max completed MT clockspeeds = 5500mhz CCD0 / 5425mhz CCD1
View attachment 108525
ES 9950X delidded sample
Max stable FCLK = 2200mhz
Max static R23 ST clockspeed @ set 1.4core = 5900mhz
Max completed MT clockspeeds = 5500mhz CCD0 / 5500mhz CCD1
View attachment 108527
Apologies for re-posting if this was already discussed here: https://www.techpowerup.com/327122/...700x-and-9600x-intercore-latency-improvements
Interesting that AMD went to the trouble of creating slides for this update. Seems we'll unfortunately have to wait for Arrow Lake reviews to get proper Zen 5 performance overview, complete with X870E+EXPO DDR5-8000 kits.
Nice, so the article contains quotes that finally identify some workloads where the ccd latency could make a difference:
"While this will show up on some core-to-core latency benchmarks, the real-world improvement is most noticeable in a very specific gaming scenario: in heavily threaded games that don't trigger core parking. Our lab tests suggest Metro, Starfield, and Borderlands 3 can show some uplift, as well as synthetic tests like 3DMark Time Spy."
Unfortunately, it does not say anything about which workloads AMD tuned for that led to this algorithm shipping in the first place, and according to Mystical it was giving better results for the workloads AMD cared about. Going by the description though:
"This was mainly due to some corner cases where it takes two transactions to both read, and write, when information is shared across cores on different parts of a Ryzen 9 9000 series processor. However, we've been working on optimizing this since the launch of the 9000 series. In the new 1.2.0.2 BIOS update, we've managed to cut the number of transactions in half for this use case, which helps reduce core-to-core latency in multi-CCD models."
It sounds to be a general improvement. Meaning it shouldn't hurt anything, oh well.
This thread is so dead, not even some X3D rumors posted hours ago have made it here yet.
AMD Ryzen 9000X3D "Rumored" Performance Figures Reveal Faster Multi-Threaded & Slightly Slower Single-Threaded Numbers Versus Non-X3D CPUs
AMD's Ryzen 9000X3D CPUs are coming and it looks like they will boast faster multi-core performance than the non-X3D chips.
wccftech.com