Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)


Makaveli

Diamond Member
Feb 8, 2002
The only notable improvement AMD can make for gaming with the 9xxx X3D parts, compared to the 7xxx implementation, is to find a way to reduce the slightly higher L3 access latency that the X3D parts have. Otherwise, they're going to show similar gains in similar areas.

They could do a 4004 dual-V-Cache processor that could be useful in certain use cases, but it wouldn't be aimed at gaming.

[screenshot]

I agree: if they can bring the L3 latency down on X3D parts and remove the clock speed drop, that should bring decent gains.
 

Mopetar

Diamond Member
Jan 31, 2011
Dies which fail as the 8C/16T die but are able to run as 6C should be very rare.

I would think AMD sets the bin targets at whatever values yield the number of parts needed to meet market demand.

In other words, if only 10% of the chiplets would wind up in 6C parts at some voltage/clock speed targets, but they need more than that to meet market demand, they would just adjust the voltage/clock targets until the required percentage falls into that bin.
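A minimal Python sketch of that idea, assuming each chiplet can be reduced to a single hypothetical number (its max stable clock at one fixed voltage) and that anything below the cutoff goes to the lower bin. Picking the cutoff for a desired share of parts is then just a quantile lookup; the function name and sample data are made up for illustration.

```python
def bin_cutoff(max_clocks_mhz, lower_bin_fraction):
    """Return the clock target (MHz) below which roughly
    lower_bin_fraction of the chiplets fall into the lower bin."""
    if not max_clocks_mhz:
        raise ValueError("need at least one chiplet")
    if not 0 <= lower_bin_fraction < 1:
        raise ValueError("lower_bin_fraction must be in [0, 1)")
    ranked = sorted(max_clocks_mhz)
    # Everything before this index misses the target; this chiplet is
    # the slowest one that still makes the upper bin.
    return ranked[int(len(ranked) * lower_bin_fraction)]


# Hypothetical max stable clocks (MHz) for ten chiplets at one voltage.
chiplets = [5450, 5600, 5300, 5700, 5500, 5350, 5650, 5400, 5550, 5250]

print(bin_cutoff(chiplets, 0.10))  # 5300: one chiplet (10%) falls below
print(bin_cutoff(chiplets, 0.30))  # 5400: raise the target and 30% fall below
```

The silicon doesn't change between the two calls; only the target moves, which is the point about adjusting bins to demand.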
 

DavidC1

Golden Member
Dec 29, 2023
I agree: if they can bring the L3 latency down on X3D parts and remove the clock speed drop, that should bring decent gains.
That is easier said than done. X3D increases cache size dramatically, and that by itself increases latency, simply because signals have farther to propagate. Secondly, making the cache faster could limit the max clocks of the entire chip. So while you might get a faster chip per clock, they might have to reduce clocks to get there.
 

MS_AT

Senior member
Jul 15, 2024
Pretty sure this latency difference is pretty much down to clock speed alone, or at least a large part of it (L3 runs at core clock).
IIRC the additional cache die adds 3 or 4 cycles of latency, but that only applies once you exhaust the original L3.

The absolute latency difference (in ns) will be down to clockspeed difference as you correctly identified.
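Because the L3 runs at core clock, a fixed latency in cycles shrinks in nanoseconds as the clock rises: latency_ns = cycles / clock_GHz. The cycle counts below are hypothetical round numbers chosen only to show the arithmetic, not measured Zen 4 or Zen 5 values.

```python
def cycles_to_ns(cycles, clock_ghz):
    """Convert a latency in core-clock cycles to nanoseconds.

    One cycle at f GHz lasts 1/f ns, so the latency is cycles / f.
    """
    if clock_ghz <= 0:
        raise ValueError("clock_ghz must be positive")
    return cycles / clock_ghz


# Hypothetical: the same 50-cycle L3 hit at two clocks, plus 4 extra cycles.
print(f"{cycles_to_ns(50, 5.5):.2f} ns")  # 9.09 ns
print(f"{cycles_to_ns(50, 5.0):.2f} ns")  # 10.00 ns
print(f"{cycles_to_ns(54, 5.0):.2f} ns")  # 10.80 ns
```

In this made-up example, a 500 MHz clock deficit and a 4-cycle penalty cost a comparable number of nanoseconds.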
 

PJVol

Senior member
May 25, 2020
[screenshot]


I agree: if they can bring the L3 latency down on X3D parts and remove the clock speed drop, that should bring decent gains.
13.84ns? ))
Is this the same well-known bonus for using the Windows OS Beta Testers Edition (sometimes referred to as Windows 11)?

The absolute latency difference (in ns) will be down to clockspeed difference as you correctly identified.
Besides purely architectural aspects, the decrease in 5800X3D L3 latency with higher clocks (not sure about Zen 4 X3D, though) is limited by a fixed L3 EDC limit, unlike regular Zen 3, where that limit increases linearly with the VddCpu EDC limit.
 

[Attachment: cachemem.png]

Tuna-Fish

Golden Member
Mar 4, 2011
IIRC the additional cache die adds 3 or 4 cycles of latency, but that only applies once you exhaust the original L3.

Not quite. All cores in the CCD use all of the cache evenly. That is, cores don't first use the original L3, but all accesses are spread out evenly over the full 96MiB.

The cache is implemented as slices. Any cache line that gets stored in L3 is hashed to a specific slice based on its physical address; this way the CPU only needs to look for a match in a single slice. Adding the V-Cache adds more slices to the cache.
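A sketch of address-hashed slice selection in Python. AMD's actual slice hash isn't given in the thread, so the XOR-fold below is a hypothetical stand-in. What it does show is that the slice depends only on the physical address (no core ID is involved), so consecutive lines spread evenly across all slices no matter which core touches them.

```python
from collections import Counter

LINE_BYTES = 64  # cache line size in bytes


def slice_for(phys_addr, num_slices):
    """Map a physical address to an L3 slice (hypothetical hash).

    Drop the byte-offset bits, XOR-fold the line address one byte at a
    time so that every address bit affects the result, then reduce
    modulo the number of slices.
    """
    line = phys_addr // LINE_BYTES
    h = 0
    while line:
        h ^= line & 0xFF
        line >>= 8
    return h % num_slices


# 4096 consecutive cache lines (256 KiB) over 8 slices.
counts = Counter(slice_for(n * LINE_BYTES, 8) for n in range(4096))
print(sorted(counts.items()))  # each of the 8 slices gets 512 lines
```

A lookup then only has to check the one slice the hash points to, rather than searching every slice.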
 

MS_AT

Senior member
Jul 15, 2024
Not quite. All cores in the CCD use all of the cache evenly. That is, cores don't first use the original L3, but all accesses are spread out evenly over the full 96MiB.

The cache is implemented as slices. Any cache line that gets stored in L3 is hashed to a specific slice based on its physical address; this way the CPU only needs to look for a match in a single slice. Adding the V-Cache adds more slices to the cache.
True, I stand corrected. I misremembered an article I read: the 4-cycle penalty applies uniformly across the V-Cache-enabled CCD.
 

Gideon

Golden Member
Nov 27, 2007
The real question is: will "designed for increased frequencies" mean +100 or +200 MHz? No way in hell it will match the 5.5 GHz of the 9700X. This is AMD we are talking about. They will find a way to disappoint.
I doubt they'll manage to clock this as high as the 9700X, as there are physical limits with the current packaging. And two layers of silicon just don't dissipate heat that well.

While even a 200 MHz boost would help, I really hope they can somehow reach 5.3-5.4 GHz. The latter is almost 8%, and while it won't set the world on fire, it would be a much better generational increase than the vanilla models got.

If they manage to raise clocks by that much, the dual-CCD X3D chips also become much more relevant for some MT tasks.
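The post doesn't say which baseline the "almost 8%" is measured against. Assuming the 7800X3D's rated 5.0 GHz max boost as the baseline (an assumption, not stated in the thread), the arithmetic for the clocks mentioned works out like this:

```python
def uplift_pct(new_ghz, old_ghz):
    """Percentage clock increase going from old_ghz to new_ghz."""
    return (new_ghz / old_ghz - 1) * 100


BASELINE_GHZ = 5.0  # assumption: 7800X3D rated max boost clock

# +200 MHz, then the 5.3 and 5.4 GHz targets from the post.
for target_ghz in (5.2, 5.3, 5.4):
    print(f"{target_ghz} GHz: +{uplift_pct(target_ghz, BASELINE_GHZ):.1f}%")
```

This prints +4.0%, +6.0% and +8.0% respectively.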
 

Det0x

Golden Member
Sep 11, 2014
Aha!! How can you possibly know this? :wink:
How can I share all these screenshots on Zen 5's release day? 🙃
And rest assured I have played enough with the 7800X3D and 7950X3D V-Cache models to know my way around their V/F curves and how they compare to regular Zen 4 chips.
You talked about 5.5 GHz earlier... I have actually done 5.8 GHz on a 7800X3D already.

Zen 4 X3D is Vcore- and temperature-limited (if you don't know secret tricks to work around it).
The Vcore limit is kinda der8auer's fault, as AMD realised they didn't want scrubs killing their CPUs.
But even if there were no Vcore limit, the Zen 4 core reaches lower clock speeds than the Zen 5 core at the same set voltage (power usage is higher on Zen 5, though).

This is a highly binned 7950X @ 1200 mV set static = 5350/5400 MHz max all-core clocks
[screenshot]

This is a highly binned 9950X @ the same 1200 mV set static = 5500/5500 MHz max all-core clocks
[screenshot]

And I have said too much about the other limit already...
 

Det0x

Golden Member
Sep 11, 2014
Are both your CCXs equally good? 😯 And do you by chance know roughly what clocks you can expect from an average retail 9950X (@ 1200 mV)?
That's why I'm getting my 6th retail 9950X sample tomorrow: I'm trying to find silicon that can match my ES sample.
I need a good retail sample so Futuremark can stop hiding my results because of the ES --> "Processor is not recognized"

[screenshot]

I haven't updated the "chart" with my soon-to-be 6th sample yet, but these are my first 3 retail samples:
  • The goal was to complete 3x MT runs in a row in Cinebench R23
  • Static CPU Vcore was set at 1.200 V --> around 1.156 V actual while running (since it was set in Windows, only the HWiNFO motherboard Vcore reading is correct)
  • Static clock speeds were raised on each individual CCD in 25 MHz steps until it could no longer pass 3x runs back to back
  • I tried to keep the idle temp of the GPU on the same water loop as close to 20 degrees as possible for all runs (the most reliable reference for me)
  • None of the runs were done at stock 5600 MT/s memory speed, but not maxed out either --> just what I was running at the time of testing
  • Never mind the R23 MT scores themselves, as I had lots of stuff running that lowered them. The point of interest was clock speeds
  • HWiNFO was reset before each run; all runs seem to draw roughly 260 W while running Cinebench R23 MT
  • Stupid Windows 11 Print Screen would only take the screenshot at the same millisecond the benchmark ended on all but the ES, so the values in HWiNFO hadn't had time to update.
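The clock search in the third bullet is a plain step search: raise the static clock by a fixed step until a test fails and keep the last passing value. A minimal sketch, where passes_runs is a hypothetical stand-in for "complete 3x Cinebench R23 MT runs back to back" and the per-CCD stability limits are invented for illustration. A real stability test is noisy, which this deterministic version ignores.

```python
def find_max_stable_clock(start_mhz, step_mhz, limit_mhz, passes_runs):
    """Raise a static clock in fixed steps until a test run fails.

    passes_runs(clock_mhz) returns True if the workload completes at
    that clock. Returns the last clock that passed, or None if even
    start_mhz failed.
    """
    best = None
    clock = start_mhz
    while clock <= limit_mhz and passes_runs(clock):
        best = clock
        clock += step_mhz
    return best


# Hypothetical CCDs: stable up to these clocks (MHz), unknown to the search.
true_limits = {"CCD0": 5480, "CCD1": 5400}

for ccd, true_limit in true_limits.items():
    found = find_max_stable_clock(
        5300, 25, 6000, lambda mhz, lim=true_limit: mhz <= lim
    )
    print(ccd, found)  # prints "CCD0 5475", then "CCD1 5400"
```

With 25 MHz steps, the reported clock can sit up to one step below the true limit, which is why CCD0 lands on 5475 here.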

Retail #1 9950X sample
Max stable FCLK = 2167 MHz
Max static R23 ST clock speed @ set 1.4 V Vcore = 5825 MHz
Max completed MT clock speeds = 5500 MHz CCD0 / 5450 MHz CCD1
[screenshot]

Retail #2 9950X sample
Max stable FCLK = 2200 MHz
Max static R23 ST clock speed @ set 1.4 V Vcore = 5850 MHz
Max completed MT clock speeds = 5550 MHz CCD0 / 5425 MHz CCD1
[screenshot]

Retail #3 9950X sample
Max stable FCLK = 2200 MHz
Max static R23 ST clock speed @ set 1.4 V Vcore = 5850 MHz
Max completed MT clock speeds = 5500 MHz CCD0 / 5425 MHz CCD1
[screenshot]

ES 9950X delidded sample
Max stable FCLK = 2200 MHz
Max static R23 ST clock speed @ set 1.4 V Vcore = 5900 MHz
Max completed MT clock speeds = 5500 MHz CCD0 / 5500 MHz CCD1
[screenshot]
 

Jimster480

Junior Member
Aug 30, 2024
That's why I'm getting my 6th retail 9950X sample tomorrow: I'm trying to find silicon that can match my ES sample.
I need a good retail sample so Futuremark can stop hiding my results because of the ES --> "Processor is not recognized"

Looks like the ES is quite strong. I have had the same experience with server CPUs: ES chips (especially late samples) are often stronger than retail chips in general, since they come from smaller batches and the best bins.
 

MS_AT

Senior member
Jul 15, 2024
Apologies for re-posting if this was already discussed here: https://www.techpowerup.com/327122/...700x-and-9600x-intercore-latency-improvements

Interesting that AMD went to the trouble of creating slides for this update. Seems we'll unfortunately have to wait for Arrow Lake reviews to get a proper Zen 5 performance overview, complete with X870E + EXPO DDR5-8000 kits.
Nice, so the article contains quotes that finally identify some workloads where the CCD latency could make a difference:
While this will show up on some core-to-core latency benchmarks, the real-world improvement is most noticeable in a very specific gaming scenario: in heavily threaded games that don't trigger core parking. Our lab tests suggest Metro, Starfield, and Borderlands 3 can show some uplift, as well as synthetic tests like 3DMark Time Spy.
Unfortunately, it does not say anything about which workloads AMD tuned for that led to this algorithm shipping in the first place; according to Mystical, it was giving better results for the workloads AMD cared about. Going by the description, though:
This was mainly due to some corner cases where it takes two transactions to both read, and write, when information is shared across cores on different parts of a Ryzen 9 9000 series processor. However, we've been working on optimizing this since the launch of the 9000 series. In the new 1.2.0.2 BIOS update, we've managed to cut the number of transactions in half for this use case, which helps reduce core-to-core latency in multi-CCD models.
It sounds like a general improvement, meaning it shouldn't hurt anything. Oh well.
 

Josh128

Senior member
Oct 14, 2022
This thread is so dead that not even some X3D rumors posted hours ago have made it here yet.

TL;DR:
9800X3D: R23 ST 2145, MT 23315
9950X3D: R23 ST 2245, MT 42375

This indicates an ST boost clock of ~5.3 GHz for the 9800X3D and ~5.6 GHz for the 9950X3D. Very impressive for the 9950 if that score was achieved on an X3D die.
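The clock estimate can be sanity-checked by scaling. Assuming the R23 ST score scales linearly with clock for the same core (an approximation), and taking the post's own ~5.3 GHz estimate for the 9800X3D as the reference point:

```python
def scale_clock(ref_clock_ghz, ref_score, score):
    """Estimate a clock from a benchmark score, assuming the score is
    proportional to clock speed for the same core (same IPC)."""
    return ref_clock_ghz * score / ref_score


# Reference: ~5.3 GHz for the 9800X3D at R23 ST 2145 (the post's estimate).
est = scale_clock(5.3, 2145, 2245)
print(f"9950X3D ST clock estimate: {est:.2f} GHz")  # 5.55 GHz
```

That lands close to the post's ~5.6 GHz, so the two estimates are consistent given that both rest on a single leaked score.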


 

Mopetar

Diamond Member
Jan 31, 2011
This thread is so dead, not even some X3D rumors posted hours ago have made it here yet.


It's a wccf article so there's always a chance it's sourcing a comment someone posted here days ago. :p

Nothing really new or surprising, though, and it's missing anything people would actually want to know, such as clock speeds.

There, I saved everyone else the trouble of reading it.