9950x / Zen 5 thread

Icecold

Golden Member
Nov 15, 2004
1,099
1,027
146
I didn't see a thread on this in this subforum yet, but I may have missed it. I picked up a 9950x yesterday and just ran a benchmark on a specific PrimeGrid Seventeen or Bust (SoB) WU to see how it compares to a 7950x. Both machines were running Linux. The 9950x is my machine, the 7950x is not. Here are the results:
Code:
Summary for AMD Ryzen 9 7950X 16-Core Processor, test cutoff: 10m
candidate          |    credit    | tasks x threads, affinity |     task duration     | tasks/day | points/day
-------------------+--------------+---------------------------+-----------------------+-----------+-----------
55459*2^42478498+1 |   129,192.52 | 2x8, ascending            |  14:54:23 =   53663 s |     3.220 |    415,999

The same WU on the 9950x:

Code:
Summary for AMD Ryzen 9 9950X 16-Core Processor AMD Ryzen 9 9950X 16-Core Processor Unknown CPU, test cutoff: 10m
candidate          |    credit    | tasks x threads, affinity |     task duration     | tasks/day | points/day
-------------------+--------------+---------------------------+-----------------------+-----------+-----------
55459*2^42478498+1 |   129,192.52 | 2x8, ascending            |  10:34:00 =   38040 s |     4.542 |    586,792

About a 41% improvement in points/day on this subproject (586,792 vs. 415,999), likely almost entirely due to the AVX-512 improvements from Zen 4 to Zen 5.
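
For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes tasks/day and points/day from the task durations in the two tables. The formula (concurrent tasks x 86,400 s / task duration, times the credit per task) is an assumption about how the benchmark script derives its columns, and the function names are made up for illustration.

```python
# Recompute the summary columns of the two 2x8 runs above.
# Assumption: tasks/day = concurrent tasks * 86400 / task duration,
# and points/day = tasks/day * credit per task.
CREDIT = 129_192.52        # credit per task, taken from the tables
SECONDS_PER_DAY = 86_400


def tasks_per_day(concurrent_tasks: int, task_seconds: float) -> float:
    """Tasks completed per day when `concurrent_tasks` run side by side."""
    return concurrent_tasks * SECONDS_PER_DAY / task_seconds


def points_per_day(concurrent_tasks: int, task_seconds: float,
                   credit: float = CREDIT) -> float:
    """Credit earned per day for the given task layout and duration."""
    return tasks_per_day(concurrent_tasks, task_seconds) * credit


zen4 = points_per_day(2, 53_663)   # 7950X, 2 tasks x 8 threads
zen5 = points_per_day(2, 38_040)   # 9950X, 2 tasks x 8 threads
speedup = zen5 / zen4              # identical to 53663 / 38040

print(f"7950X: {tasks_per_day(2, 53_663):.3f} tasks/day, {zen4:,.0f} PPD")
print(f"9950X: {tasks_per_day(2, 38_040):.3f} tasks/day, {zen5:,.0f} PPD")
print(f"9950X advantage: {100 * (speedup - 1):.1f} %")
```

The unrounded results land within a few dozen points of the table values; the remaining gap is presumably rounding inside the benchmark script.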

I've also run PPS and PPSE with settings identical to my 7900x (2-thread tasks, affinity set, no SMT, both on Windows), and the task times can be seen on the PrimeGrid host pages. 7900x - https://www.primegrid.com/results.php?hostid=1341950 and 9950x - https://www.primegrid.com/results.php?hostid=1351653

If there are any specific tests, benchmarks, or projects anybody wants run, let me know and I can try to do so. I'm currently running the latest Linux Mint on the machine.
 

Icecold

Ran some tests on ODLK. Based on an average of time/credit across 15-20 validated tasks per machine, my 5950x will do 39,855 points per day (1,245 per thread), my 7900x will do 39,838 points per day (1,660 per thread), and the 9950x will do 48,638 points per day (1,520 per thread). Interesting that on a per thread/core basis the 7900x is actually slightly higher than the 9950x. My assumption is that it holds a higher clock speed under load than the 9950x does. I'm not sure if ODLK uses any form of AVX, let alone AVX2 or AVX-512.

The 5950x and 9950x are on Linux, the 7900x is on Windows. This is all with SMT on.
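
The per-thread figures can be reproduced with a few lines of Python. This is only a sketch: the thread counts are the stock SMT-on specs of each CPU (32 for the 5950X and 9950X, 24 for the 7900X), not something measured here, and the variable names are made up.

```python
# Points per day (PPD) per hardware thread for the ODLK runs, SMT on.
# Thread counts are assumed from the stock CPU specs:
# 5950X and 9950X are 16C/32T, the 7900X is 12C/24T.
odlk = {
    "5950X": (39_855, 32),
    "7900X": (39_838, 24),
    "9950X": (48_638, 32),
}

per_thread = {cpu: ppd / threads for cpu, (ppd, threads) in odlk.items()}

for cpu, value in per_thread.items():
    print(f"{cpu}: {value:,.0f} PPD/thread")

# How far ahead the 7900X is per thread, relative to the 9950X.
ratio = per_thread["7900X"] / per_thread["9950X"]
print(f"7900X vs 9950X per thread: {100 * (ratio - 1):.0f} % higher")
```

By this arithmetic the 7900x comes out roughly 9% ahead of the 9950x per thread.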
 

StefanR5R

Elite Member
Dec 10, 2016
5,891
8,759
136
Not sure how close an all-threads (or almost-all-threads) ODLK load gets to the power limit of a top-end Ryzen, but it might come quite close. That would mean the 7900X is able to spend quite a bit more power per thread than the 9950X (unless one changed the power limits such that per-core power is matched). On top of that, the amount of L3 cache per core, Infinity Fabric bandwidth per core between CCDs and IOD, and potentially memory bandwidth per core are more favorable on the 7900X, but these would not be my first guesses as contributors to higher ODLK performance per thread.

The old Broadwell-EP runs ODLK at its generic turbo clock speed. That is, if there is any vector arithmetic in ODLK at all, it does not fulfill the criteria to trigger Broadwell-EP's AVX2 turbo clock offset.

PS, ODLK on 88-threaded Broadwell-EP: 60,000 PPD (680 PPD/thread) [@380 W -> 160 PPD/W]
 

Icecold

It would be a lot easier to provide an apples-to-apples comparison if I had a 7950x, but unfortunately when I got the 7900x it was part of a motherboard/RAM/CPU combo deal that the 7950x wasn't part of (at Microcenter, a store here). ODLK does seem to perform very well on older CPUs. I've run it quite a bit on my Haswell Xeons and was happy with the performance. It runs great on Zen 2 as well.

I'm running some NFS@home on the 9950x now and will post back with results. If there are any other tests or benchmarks you'd want run on it, just let me know.

The cooling is a Noctua NH-D15 Chromax black that I pulled off the 3900x this replaced. It seems more than sufficient: even on AVX-512 workloads the temps are not bad, and it hits the PPT limit before any thermal throttling.
 

StefanR5R

SoB 55459*2^42478498+1 on Broadwell-EP (dual-socket E5-2696 v4),
best is 2x22: 366,000 PPD [@460 W -> 800 PPD/W]

Does not look so good compared to the 416,000 and 587,000 PPD from post #1.
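
To put rough numbers on that comparison, here is a small Python sketch using only the rounded figures quoted in this post. No power draw was given for the desktop CPUs, so only the Broadwell-EP efficiency is computed; the variable names are made up.

```python
# Throughput of the desktop Ryzens relative to the dual-socket Broadwell-EP,
# using the rounded PPD values quoted above.
broadwell_ppd, broadwell_watts = 366_000, 460
desktop_ppd = {"7950X": 416_000, "9950X": 587_000}

# Efficiency is only known for the Broadwell-EP box.
broadwell_ppd_per_watt = broadwell_ppd / broadwell_watts
print(f"Broadwell-EP: {broadwell_ppd_per_watt:.0f} PPD/W")

relative = {cpu: ppd / broadwell_ppd for cpu, ppd in desktop_ppd.items()}
for cpu, factor in relative.items():
    print(f"{cpu}: {factor:.2f}x the dual-socket Broadwell-EP throughput")
```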
 

StefanR5R

LLR2 on Zen 4, both desktop and server, was observed to have slightly better throughput at same or slightly better power efficiency when using all SMT threads, as opposed to only 1 thread/core. Hence, a repeat of the SoB test with CPU_CONFIGS=('2 16 0-7,16-23 8-15,24-31') would be interesting to me, especially if the 7950X would be tested this way too.
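
For readers unfamiliar with the notation, the sketch below expands that CPU_CONFIGS entry into explicit logical CPU numbers. The field layout (number of tasks, threads per task, then one CPU list per task) is inferred from the string itself and is an assumption, as is the idea that logical CPUs N and N+16 are SMT siblings of the same core on this machine. The function names are hypothetical and not part of the benchmark script.

```python
def expand_cpu_list(spec: str) -> list[int]:
    """Expand a list such as '0-7,16-23' into individual logical CPU numbers."""
    cpus: list[int] = []
    for part in spec.split(","):
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def parse_config(config: str) -> dict:
    """Assumed layout: '<tasks> <threads per task> <CPU list per task> ...'."""
    tasks, threads, *affinities = config.split()
    return {
        "tasks": int(tasks),
        "threads": int(threads),
        "affinity": [expand_cpu_list(a) for a in affinities],
    }


cfg = parse_config("2 16 0-7,16-23 8-15,24-31")
for number, cpus in enumerate(cfg["affinity"], start=1):
    print(f"task {number}: {cfg['threads']} threads on logical CPUs {cpus}")
```

Under those assumptions, each task gets all 16 hardware threads of one CCX. On Linux, a list in this form could also be passed to `taskset -c` or, once expanded, to Python's `os.sched_setaffinity`.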
 

Icecold

StefanR5R said:
LLR2 on Zen 4, both desktop and server, was observed to have slightly better throughput at same or slightly better power efficiency when using all SMT threads, as opposed to only 1 thread/core. Hence, a repeat of the SoB test with CPU_CONFIGS=('2 16 0-7,16-23 8-15,24-31') would be interesting to me, especially if the 7950X would be tested this way too.
I'm still working on getting 7950x results, but the 9950x seemed to perform quite a bit worse with those settings:

Code:
 Summary for AMD Ryzen 9 9950X 16-Core Processor AMD Ryzen 9 9950X 16-Core Processor Unknown CPU, test cutoff: 10m
candidate          |    credit    | tasks x threads, affinity |     task duration     | tasks/day | points/day
-------------------+--------------+---------------------------+-----------------------+-----------+-----------
55459*2^42478498+1 |   129,192.52 | 2x16, 0-7,16-23 8-15,24-~ |  12:27:24 =   44844 s |     3.853 |    497,778

Ran it again to make sure it wasn't a one-off. It was a little better, but still slower than the 2x8 benchmark.

Code:
Summary for AMD Ryzen 9 9950X 16-Core Processor AMD Ryzen 9 9950X 16-Core Processor Unknown CPU, test cutoff: 10m
candidate          |    credit    | tasks x threads, affinity |     task duration     | tasks/day | points/day
-------------------+--------------+---------------------------+-----------------------+-----------+-----------
55459*2^42478498+1 |   129,192.52 | 2x16, 0-7,16-23 8-15,24-~ |  12:07:23 =   43643 s |     3.959 |    511,473

Edit - 7950x results
Code:
Summary for AMD Ryzen 9 7950X 16-Core Processor, test cutoff: 10m
candidate          |    credit    | tasks x threads, affinity |     task duration     | tasks/day | points/day
-------------------+--------------+---------------------------+-----------------------+-----------+-----------
55459*2^42478498+1 |   129,192.52 | 2x16, 0-7,16-23 8-15,24-~ |  15:54:41 =   57281 s |     3.016 |    389,644
 

StefanR5R

Thanks! So for this workunit at least, I sent you up a blind alley with my 2x16 suggestion. I'll keep that in mind; maybe I'll find out more about the x8 <--> x16 thing on Zen 4 at some other time.

This SoB workunit wants 32 MB of cache for the FMA3 implementation, and perhaps for the AVX-512 implementation too (plus the cache for what's going on alongside the pure FFT algorithm). That could explain why the Zen 4 SMT scaling which I claimed to exist is not showing here: one 7950X/9950X CCX has 8x1 MB L2 cache (private to cores) and 1x32 MB L3 cache (shared, mostly used to store lines which were evicted from L2 cache). With this large SoB WU, there is probably already some spill-over from L3$ to RAM going on. If so, letting 16 instead of 8 threads do that would only make matters worse.

Anyhow, although it turned out the other way than I expected, the relative performances might still tell us something:
  • 7950X: 2x8 perf / 2x16 perf = 416 kPPD / 390 kPPD = 1.00 / 0.94
    (that is, 6% loss if SMT is used for SoB)
  • 9950X: 2x8 perf / 2x16 perf = 587 kPPD / 505 kPPD = 1.00 / 0.86
    (14% loss if SMT is used for SoB)
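
The two ratios above can be reproduced from the points/day columns of the benchmark tables. This is a minimal Python sketch; averaging the two 9950X 2x16 runs is an assumption about where the 505 kPPD figure comes from, and the variable names are made up.

```python
from statistics import mean

# Points/day from the benchmark tables in this thread.
ppd_2x8 = {"7950X": 415_999, "9950X": 586_792}
ppd_2x16 = {
    "7950X": [389_644],
    "9950X": [497_778, 511_473],   # the 9950X 2x16 test was run twice
}

# Throughput with SMT (2x16) relative to one thread per core (2x8).
relative = {cpu: mean(ppd_2x16[cpu]) / ppd_2x8[cpu] for cpu in ppd_2x8}

for cpu, value in relative.items():
    loss = 100 * (1 - value)
    print(f"{cpu}: 2x16 reaches {value:.2f} of 2x8 throughput ({loss:.0f} % loss)")
```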

Some possible explanations:
  • If my above hypothesis about slight cache overuse has some truth to it, and the 7950X and 9950X have RAM with similar latencies and bandwidth at their disposal, then the faster computing 9950X is hurt more by these effects.
    And/or:
  • If it is correct that Zen 4 fared somewhat better with SMT in some PrimeGrid subprojects, i.e. I did not merely dream of seeing this on my Zen 4 and from others' posts, then this could have been a sign of Zen 4 having a certain frontend bottleneck in AVX workloads (despite its new AVX-512 support which, among other things, reduces frontend load), and SMT could work around this bottleneck. Zen 5 has not only a wider backend, but also drastic improvements to the frontend, and might thus be able to fully feed its big vector math related part of the backend even without SMT help (like Zen 2 already could, and I suppose Zen 3 and Zen 1 too).
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,338
4,012
75
Thanks for this! I've been thinking about a 9700x. Could you shut off one of your compute cores on the 9950x and do some benchmarks?

I suppose the 7900x is in a 6/6 configuration, not 8/4 or anything?
 

Orange Kid

Elite Member
Oct 9, 1999
4,375
2,164
146
So a question about the x670e motherboards.
Do they need a bios update for the 9950x or will they work out of the box?
As I have none of the 7000's to use to update.
 

StefanR5R

Ken g6 said:
I suppose the 7900x is in a 6/6 configuration, not 8/4 or anything?
(Or 9900X even?) I am not aware of documentation on this. But so far, there was a requirement in Zen-based CPUs that all core complexes have the same core count. Now in the Zen 5 generation, AMD departs from this for the first time with the Ryzen AI 9 HX 370 and Ryzen AI 9 365 laptop CPUs, also known as Strix Point, which are dual-CCX CPUs with 4+8 and 4+6 cores respectively. Desktop Ryzen 9000, however, inherits the IOD from Ryzen 7000, which makes it likely that these CPUs are still restricted to equal core counts across CCXs.

Orange Kid said:
Do they need a bios update for the 9950x or will they work out of the box?
As I have none of the 7000's to use to update.
You will most likely have to update the BIOS, but you are supposed to be able to do this with the new CPU already installed. This is a feature of all AM5 mainboards:
Ryan Smith & Gavin Bonshor said:
Native BIOS Flashback Support
[...] For the AM5 platform, AMD is taking matters into their own hands to make USB BIOS Flashback a universal feature. Ryzen 7000 chips will be able to support USB BIOS Flashback mode across the board and regardless of the BIOS currently installed. As a result, users will always be able to flash an updated BIOS to their AM5 boards, regardless of the CPUs supported by or the operational status of the current motherboard.
The net impact is that when AMD releases future chips on the AM5 platform – say, a Zen 5 chip in 2024 – it will be possible to install a compatible BIOS without first having to use a Zen 4 chip. [...]
(from AnandTech's Ryzen 7000 review)
 

Icecold

Orange Kid said:
So a question about the x670e motherboards.
Do they need a bios update for the 9950x or will they work out of the box?
As I have none of the 7000's to use to update.

My x670e motherboard, which I purchased together with the 9950x, had a sticker on the box saying it was 9000-series ready, and it came with a new enough BIOS to work with the 9950x. It wasn't the latest BIOS though, so I updated it regardless.

Ken g6 said:
Thanks for this! I've been thinking about a 9700x. Could you shut off one of your compute cores on the 9950x and do some benchmarks?

I suppose the 7900x is in a 6/6 configuration, not 8/4 or anything?

I definitely can. Is there a specific project or benchmark you'd like to see? A 9700x should give basically the same results as the 9950x benchmarks I've posted, just with half the output since it has half the cores. The 7900x is in a 6/6 configuration; I'm not sure about the 9900x.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
26,062
15,199
136
My current test is a 7950x3d vs a 9950x, 2 x 8C, SMP off (per lasso), on SoB using Windows 10. It's only 30-some percent done, so I will reply in 10 hours when the 7950x3d finishes. Right now the ETAs are 13 and 16 hours, but the 9950x keeps extending its lead. I will let you know then.
 

Markfw

It looks like 12.5 hours vs 14.5 hours. That's only 14% better, unless my math is backward. I will be in bed before they both finish.

Could the 7950x3d be a little faster due to one CCD having all that cache? The other task is 15.5 hours. That is 20.5% faster than the other.
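
On the "unless my math is backward" point: a percentage like this can be read two ways, and they give different numbers. A minimal Python sketch using the rounded 12.5 h vs 14.5 h ETAs from this post (the function names are made up for illustration):

```python
def time_saved(fast_hours: float, slow_hours: float) -> float:
    """Fraction of the slower machine's run time that the faster one saves."""
    return 1 - fast_hours / slow_hours


def throughput_gain(fast_hours: float, slow_hours: float) -> float:
    """How much more work per unit of time the faster machine completes."""
    return slow_hours / fast_hours - 1


fast, slow = 12.5, 14.5   # rounded ETAs in hours: 9950X vs 7950X3D
print(f"run time saved:  {100 * time_saved(fast, slow):.0f} %")
print(f"throughput gain: {100 * throughput_gain(fast, slow):.0f} %")
```

So about 14% describes the shorter run time, while the same pair of ETAs corresponds to roughly 16% more throughput.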
 

Markfw

I made it until the 9950x finished (well, almost): 12:20 and 12:25 task times. The 7950x3d tasks are 2 hours and 3 hours behind.
 

cellarnoise

Senior member
Mar 22, 2017
744
403
136
I wonder if the soon-to-be-released(?) Windows 11 update will widen the gap between Zen 4 and Zen 5, as it appears to do in gaming?
 

Markfw

cellarnoise said:
I wonder if the soon-to-be-released(?) Windows 11 update will widen the gap between Zen 4 and Zen 5, as it appears to do in gaming?
Very possible. I have had a nightmare with Linux lately, but also with F@H for Windows. And now Win 10 (on all my boxes) will NOT get this performance boost on Zen 4. No good options right now. And my new Turin is supposedly NOT going to run on any Supermicro motherboard? Still waiting on confirmation of that. I might try it anyway, and be down a Genoa for a while.
 

StefanR5R

cellarnoise said:
I wonder if the soon-to-be-released(?) Windows 11 update will widen the gap between Zen 4 and Zen 5, as it appears to do in gaming?
This update will be beneficial to scientific applications too, but to varying degrees. From what I understand about it, programs with frequently occurring branches are especially affected. I am expecting it to make very little difference to vector-math heavy applications like PrimeGrid's.

Edit: Andreas Schilling of Hardwareluxx wrote that Cinebench, Blender, V-Ray, Corona, Handbrake, 7-Zip and Geekbench did not show changes beyond the error of measurement; only several of their game tests profited from the performance bug fix. — Also, according to a post in the CPU subforum, the fix basically only affects syscalls = switches between usermode and kernelmode. If this is true, then my above assumption is invalid, and scientific computation will practically see no gains from this particular bugfix.
 