AMD Ryzen 5000 Builders Thread

Page 45

B-Riz

Golden Member
Feb 15, 2011
1,482
612
136
Last edited:

cortexa99

Senior member
Jul 2, 2018
319
505
136
New ASRock X370 BIOS builds with Zen 3 support and some bug fixes:

download:

WARNING: NOT OFFICIAL BIOS, use at your own risk

X370 Gaming K4
X370 Gaming X
X370 Killer SLI
X370 Killer SLI ac
X370 TAICHI
X370 Professional Gaming
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
Some thoughts on the 5950X system I have put together:

This system is a direct "descendant" of the 3950X build described in this post:


I ended up running a 4.6 GHz static OC on the 5950X. Memory runs at 3600C15:
1619717853717.png

I have spent a TON of time trying to stabilize it above 3600, all in vain; it seems four dual-rank DIMMs and 64GB of RAM are too much for this MB+CPU combo. The same type of memory ran at roughly the same speed on the 3950X, so no big advance in memory clocks.

But memory performance is a WOW: 58 ns latency in AIDA64 is just sweet.

Last year I noted that at the same 4 GHz clock the 9980XE was about 40% slower than the 3950X. This year I did not repeat that 4 GHz apples-to-oranges test, because this CPU clocks some 15% higher, and:

The 5950X destroys my reference tests, and my jaw dropped when it went below 1 s completion time. In fact it completes in 930 ms, while for example my desktop 10900K at 5.1 GHz does the same load in ~1560 ms. I don't have numbers from the 3950X (it is busy in production), but AMD has made HUGE strides here.
No need to speculate why: an 8-core CCX, a great core, 32MB of L3, and a memory subsystem that is no longer holding that great core back. The end result is utter domination in one of our workloads (and future EPYC sales, to augment and replace some of the Xeons we are running).
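(As a rough sanity check on those numbers, here is a minimal back-of-the-envelope sketch. It uses only the run times and clocks quoted above and assumes the job is similarly clock-bound on both machines, which may not hold exactly.)

# Back-of-the-envelope from the numbers above: 930 ms on the 5950X @ 4.6 GHz
# vs ~1560 ms on the 10900K @ 5.1 GHz for the same workload.
t_5950x, f_5950x = 0.930, 4.6      # runtime in seconds, clock in GHz
t_10900k, f_10900k = 1.560, 5.1

speedup = t_10900k / t_5950x                                 # ~1.68x in wall-clock time
cycles_ratio = (t_10900k * f_10900k) / (t_5950x * f_5950x)   # ~1.86x fewer cycles spent
print(f"wall-clock speedup: {speedup:.2f}x, per-clock advantage: {cycles_ratio:.2f}x")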

The BAD: infrastructure. Not only has the AMD-specific version of our vendor's software existed for only about four months, it is also very immature. For example, startup and warmup take TWICE as long as on the reference Intel system. I am sure things will get ironed out, but right now it eats into the performance margin and overall viability. And obviously, given the market situation, vendors are not exactly rushing to optimize for AMD, but the performance is clearly there and things are changing for the better.

Compared to last year, the situation has also improved a great deal here. Even on the 3950X things got ironed out over time, and the 5950X is just cruising through the workload.

Job well done AMD!
 

Kenmitch

Diamond Member
Oct 10, 1999
8,505
2,249
136
I ended up running a 4.6 GHz static OC on the 5950X.

Nice!

Why did you go the static route on the clocks?

I opted for the best of both worlds and put in the time to tweak the UEFI on my 5900X. I'm getting 4.6 GHz all-core with single-core boosts up around 5 GHz or so. It did take some effort playing around with the per-core Curve Optimizer offsets on a few of the cores, but in the end it was a fun adventure.
 
  • Like
Reactions: Drazick

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
Why did you go the static route on the clocks?

If I were to use it as my personal desktop, I'd go a different route (and different memory too, to hunt for those 1:1 clocks). But for this special purpose, performance consistency and power efficiency are of utmost importance. These systems sit in my work room, cooled by a Dark Rock Pro 4, so noise and power-usage levels matter a lot.
The 3950X runs at 4.05 GHz, the 5950X is now at 4.6 GHz, and the 9980XE is at 4 GHz flat.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
That would be 64MB of L3 cache, since that is a two-CCD chip. 32MB per CCD.

Good point. Even if "only" 32MB is accessible to a single thread (which does affect that 930 ms vs 1560 ms comparison), our workload scales to many cores: threads scheduled on different CCDs do not compete for cache, and every cache hit anywhere means less memory traffic for the whole chip.
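(For anyone who wants to experiment with the CCD/cache locality angle, here is a minimal sketch that keeps a process on a single CCD so its threads share one 32MB L3. It assumes SMT is off and that the OS maps logical CPUs 0-7 to CCD0 and 8-15 to CCD1; verify that mapping on your own system before relying on it.)

# Pin the current process to one CCD so all of its threads share a single 32MB L3.
# Assumes SMT is off and logical CPUs 0-7 = CCD0, 8-15 = CCD1; verify the mapping first.
import psutil

CCD0 = list(range(0, 8))
CCD1 = list(range(8, 16))

proc = psutil.Process()        # current process; pass a PID to target another one
proc.cpu_affinity(CCD0)        # the scheduler will now keep all threads on CCD0
print("affinity:", proc.cpu_affinity())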
 

krumme

Diamond Member
Oct 9, 2009
5,952
1,585
136
And the results show in high-fps esports.
As far as I can tell, this is even without dual-rank memory?
The problem with Zen 3 is that it kind of needs dual rank to get the most out of it.
I'm playing OW at 280 fps with my 5600X and it keeps the minimums up there.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
5950X @ 4.6 GHz on a Dark Rock Pro? Wow. Nice.

The Dark Rock Pro 4 is huge; I think they market it as a 250W-class air cooler or something.
But on the topic of 4.6 GHz: it is without HT, and it is not Prime95 stable (or whatever full-throttle FMA super-torture test is in fashion nowadays).

Still, the jump from 4.05 GHz static on the 3950X to 4.6 GHz on the 5950X was a very welcome surprise; I kept increasing the frequency and the CPU kept being stable (CB20 + AIDA stability test + OCCT without AVX).

EDIT: I have read horror stories on the web about chips throwing WHEA errors at stock, one CCD clocking 200 MHz worse than the other, and so on. This chip is rock solid so far; maybe AMD ran out of bad dies for new CPUs, as these chips were literally the first ones ever sold in my country.
 
  • Like
Reactions: lightmanek

DrMrLordX

Lifer
Apr 27, 2000
21,620
10,829
136
But on the topic of 4.6 GHz: it is without HT, and it is not Prime95 stable

Oh okay. What can you get Cinebench R20/R23 stable at with HT enabled? Prime95 is in a class all by itself. As much as I hate to admit it, for anything AVX2-enabled I usually turn off my OC and just let the default boost algorithm handle things, with some creative undervolting. That's on Zen 2, but I would think it applies to Zen 3 as well.
 

Det0x

Golden Member
Sep 11, 2014
1,028
2,953
136
New CTR release and new clocks. These are hopefully my new 24/7 settings, at least stable through 5x runs of the IBT Very High AVX workload.
Playing around with the overboost now; it behaves just like PBO CO: set the value too high and the CPU is not stable in idle/light workloads.

Thread scaling in Cinebench R20

CTR 2.1 RC5 v18.
Latest BIOS 3501 for these runs, which I did all back-to-back (could gain a few points with restarts between runs); a quick scaling-efficiency readout of these scores follows after the clock settings below.
  • 1 thread = 662 points
  • 2 threads = 1309 points
  • 4 threads = 2544 points
  • 6 threads = 3796 points
  • 8 threads = 4991 points
  • 10 threads = 6083 points
  • 12 threads = 7212 points
  • 14 threads = 8139 points
  • 16 threads = 8708 points
  • 20 threads = 9400 points
  • 24 threads = 10183 points
  • 28 threads = 11123 points
  • 32 threads = 12158 points
LLC4 = up to 2% vdroop

PX high = from 1 to 2 threads @ 5025mhz
PX mid = from 3 to 4 threads @ 4900mhz
PX low = from 5 to 9 threads @ 4825mhz
P2 = from 10 to 24 threads @ 4800/4675mhz
P1 = from 25 to 32 threads @ 4750/4625mhz
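(As a quick way to read the scaling numbers, here is a small sketch that converts the headline CB20 scores above into scaling efficiency relative to the 1-thread result. Keep in mind that beyond 16 threads the extra threads are SMT siblings, so sub-linear scaling is expected there.)

# Scaling efficiency from the CTR 2.1 RC5 v18 CB20 scores listed above.
scores = {1: 662, 2: 1309, 4: 2544, 6: 3796, 8: 4991, 10: 6083, 12: 7212,
          14: 8139, 16: 8708, 20: 9400, 24: 10183, 28: 11123, 32: 12158}

base = scores[1]
for threads, pts in scores.items():
    efficiency = pts / (base * threads)    # 1.0 would be perfect linear scaling
    print(f"{threads:>2} threads: {pts:>5} pts -> {efficiency:.0%} of linear")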

PBO CO bench mode: (ambient ~20°C)
BIOS 3003, which has the best PBO CO boosting behavior of all the ASUS BIOSes
  • 1 thread = 662 points
  • 2 threads = 1303 points
  • 4 threads = 2444 points
  • 6 threads = 3706 points
  • 8 threads = 4887 points
  • 10 threads = 5974 points
  • 12 threads = 7022 points
  • 14 threads = 7906 points
  • 16 threads = 8645 points
  • 20 threads = 9583 points
  • ...seems like I didn't save the 24-thread screenshot, but the score was 105xx
  • 32 threads = 12238 points
CTR 2.04 hotfix: (ambient ~24°C)
BIOS 3003, but it doesn't matter since CTR is in use
  • 1 thread = 652 points
  • 2 threads = 1295 points
  • 4 threads = 2525 points
  • 6 threads = 3752 points
  • 8 threads = 4979 points
  • 10 threads = 6016 points
  • 12 threads = 7171 points
  • 14 threads = 8287 points
  • 16 threads = 8831 points
  • 20 threads = 9539 points
  • 24 threads = 10217 points
  • 28 threads = 11117 points
  • 32 threads = 12032 points
Results from CTR 2.1 RC5
Latest BIOS 3501 for these runs, all done back-to-back (could gain a few points with restarts between runs)
  • 1 thread = 668 points
  • 2 threads = 1302 points
  • 4 threads = 2528 points
  • 6 threads = 3800 points
  • 8 threads = 4999 points
  • 10 threads = 6081 points
  • 12 threads = 7187 points
  • 14 threads = 8185 points
  • 16 threads = 8963 points
  • 20 threads = 9540 points
  • 24 threads = 10031 points
  • 28 threads = 11044 points
  • 32 threads = 12064 points
1-settings.png
Screenshots @
 
Last edited:

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
16 threads = 8708 points

The reason I don't run HT is right here: the Windows scheduler is dumb as nails and won't properly schedule a load as simple as Cinebench, which scales almost 100% and uses few memory resources. The task of keeping 16 threads, each at 100%, on a physical core of its own for the entire benchmark seems to be too much for Windows:

1619861440625.png


Getting ~9150 with HT disabled, so enabling HT would cost nearly 500 points of peak throughput.
On paper, disabling HT means giving up 30% of capacity? Maybe, but from my testing of the 9980XE and 3950X, the following happens: once the CPU is at ~66% load (so maybe 22-24 threads total), task-completion-time consistency goes down the drain; one task might complete in 1.5 s, another in 2+ s.
Obviously some tasks will have to share a core with another HT thread, but the problem of scheduling and rescheduling optimally is just too hard.
The 3950X used to fall off really hard, due to the complications of 4-core CCXes under heavy load, while the 9980XE fared better thanks to its monolithic design and probably its single L3 cache helping threads move between cores. If I have time I will retest the 5950X in this heavy-load environment with HT enabled.

P.S. 9200 points in CB20 is close to the score the 3950X with HT enabled was getting without going too crazy on the OC; an awesome advance in IPC.
 

B-Riz

Golden Member
Feb 15, 2011
1,482
612
136
The reason I don't run HT is right here: the Windows scheduler is dumb as nails and won't properly schedule a load as simple as Cinebench, which scales almost 100% and uses few memory resources. The task of keeping 16 threads, each at 100%, on a physical core of its own for the entire benchmark seems to be too much for Windows:

View attachment 43802


Getting ~9150 with HT disabled, so enabling HT would cost nearly 500 points of peak throughput.
On paper, disabling HT means giving up 30% of capacity? Maybe, but from my testing of the 9980XE and 3950X, the following happens: once the CPU is at ~66% load (so maybe 22-24 threads total), task-completion-time consistency goes down the drain; one task might complete in 1.5 s, another in 2+ s.
Obviously some tasks will have to share a core with another HT thread, but the problem of scheduling and rescheduling optimally is just too hard.
The 3950X used to fall off really hard, due to the complications of 4-core CCXes under heavy load, while the 9980XE fared better thanks to its monolithic design and probably its single L3 cache helping threads move between cores. If I have time I will retest the 5950X in this heavy-load environment with HT enabled.

P.S. 9200 points in CB20 is close to the score the 3950X with HT enabled was getting without going too crazy on the OC; an awesome advance in IPC.

Well, it's not so much a negative as just the way the chip was designed: it performs better with HT on. Also, if that is what you want to test, you would need to pin/restrict CB to your 16 physical cores in Windows Task Manager: leave HT on and restrict CB to the physical cores only.

CB runs always get shuffled around by Windows due to high-frequency scheduling and whatnot; I would not get too hung up on it as a testing metric.
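(For reference, here is a minimal sketch of doing that kind of pinning from a script instead of Task Manager. It assumes Windows enumerates SMT siblings as adjacent pairs, so the even-numbered logical CPUs give one thread per physical core, and the process name is only an example; verify both on your own system.)

# Restrict a running process (e.g. Cinebench) to one logical CPU per physical core.
# Assumes SMT siblings are adjacent pairs (0/1, 2/3, ...); check the mapping first.
import psutil

TARGET = "cinebench.exe"    # example process name, adjust to whatever you are testing
one_per_core = list(range(0, psutil.cpu_count(logical=True), 2))

for proc in psutil.process_iter(["name"]):
    if (proc.info["name"] or "").lower() == TARGET:
        proc.cpu_affinity(one_per_core)
        print(f"pinned PID {proc.pid} to {one_per_core}")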
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
Well, it's not so much a negative as just the way the chip was designed: it performs better with HT on. Also, if that is what you want to test, you would need to pin/restrict CB to your 16 physical cores in Windows Task Manager: leave HT on and restrict CB to the physical cores only.

CB runs always get shuffled around by Windows due to high-frequency scheduling and whatnot; I would not get too hung up on it as a testing metric.

Yeah, I am well aware of that. In fact I was doing exactly that with CB15 and Xeon testing in a thread on this forum several years ago, where selecting the first physical cores on each socket, up to CB15's limit (32 threads, I think), gave the best performance.

But the problem is the same with any task, really; CB is just a perfect proxy because it does not care about the memory subsystem and makes the scaling penalty with HT fully visible.

The task ahead of the scheduler is hard to solve. Imagine a fully loaded 32-thread system, with both HT threads busy on every core. Some time later, half of the tasks, which all happen to be on a single CCD, complete, so there are now 8 physical cores and 8 more HT threads free. Should tasks from the still-busy CCD get rescheduled to the other CCD, suffering cold L2 and L3 while the chip's out-of-order machinery spins back up, with clocks maybe needing to change as well?
Does the answer change if the tasks run for minutes? What if they complete in 1 ms and a reschedule makes them complete in 5 ms instead, losing performance here and now?

Mission impossible. Linux is currently better, but not perfect, and MS is lagging and not really ready for such "scenarios". Losing 5-7% of potential performance in a task as simple as CB is a bad result.
 
  • Like
Reactions: lightmanek

Timur Born

Senior member
Feb 14, 2016
277
139
116
The Windows 10 (1909) thread scheduler is at least smart enough to know which cores of my 5900X are the fastest/strongest.

On CCD 1, these two cores (C03 + C04) are the last to be sent to sleep with core parking disabled, and they are kept awake with core parking enabled.

On CCD 2, Windows behaves a bit strangely, though. With core parking enabled, the strong cores (C08 + C10) are kept awake (depending on the CP percentage), but with core parking disabled, core C11 is the only core on either CCD that never sleeps. I will have to check whether some software forces its thread(s) onto that core.

Overall, the comparison of the 5900X system vs. the 9900K system is a bit of a letdown after only a few days of testing. My heaviest multi-threaded load (Topaz Gigapixel AI) only improves by 28%. And the one single-threaded task that keeps me waiting the longest is not improved at all (30+ second load times for Lua addons after a WoW login, or loading Lua modules in Fantasy Grounds VTT).

System power draw also increased significantly, which is especially dramatic when running WoW at 60 fps (25-30% more power draw for displaying the same content).
 

B-Riz

Golden Member
Feb 15, 2011
1,482
612
136
The Windows 10 (1909) thread scheduler is at least smart enough to know which cores of my 5900X are the fastest/strongest.

On CCD 1, these two cores (C03 + C04) are the last to be sent to sleep with core parking disabled, and they are kept awake with core parking enabled.

On CCD 2, Windows behaves a bit strangely, though. With core parking enabled, the strong cores (C08 + C10) are kept awake (depending on the CP percentage), but with core parking disabled, core C11 is the only core on either CCD that never sleeps. I will have to check whether some software forces its thread(s) onto that core.

Overall, the comparison of the 5900X system vs. the 9900K system is a bit of a letdown after only a few days of testing. My heaviest multi-threaded load (Topaz Gigapixel AI) only improves by 28%. And the one single-threaded task that keeps me waiting the longest is not improved at all (30+ second load times for Lua addons after a WoW login, or loading Lua modules in Fantasy Grounds VTT).

System power draw also increased significantly, which is especially dramatic when running WoW at 60 fps (25-30% more power draw for displaying the same content).

Load times depend on your data drive, I would think; what are you using for that?

Also, you went from 8C/16T to 12C/24T; I would expect it to use a little more power. Did you go through all the written reviews? AnandTech does a very good job highlighting the pluses and minuses of new CPUs and platforms.

Although your Zen 3 upgrade uses more electrical power, that power buys more performance; whether you see it in particular workloads is a different matter.

If you set your PPT in the BIOS to 120-130 W, it will cut power usage.
 

Timur Born

Senior member
Feb 14, 2016
277
139
116
Load times depend on your data drive, I would think; what are you using for that?
ADATA SX 8200 Pro, but we are talking about a pure single-threaded Lua bottleneck here, even with data coming straight out of the disk cache (i.e. memory).

Also, you went from 8C/16T to 12C/24T; I would expect it to use a little more power.
Indeed, but when cores are sent to deep sleep (C6) I don't expect them to add much wattage. I suspect that the I/O die + chipset (basically the same chip) draw relatively much (hence the need for active cooling on the chipset).

Did you go through all the written reviews? AnandTech does a very good job highlighting the pluses and minuses of new CPUs and platforms.
Yes, but I need a Ryzen 5000 for driver testing anyway (like the USB problems). I only found out about the lack of extra Lua performance now, though. Basically, it seems that all CPUs are separated only by clock rate when it comes to Lua.

Although your Zen 3 upgrade uses more electrical power, that power buys more performance; whether you see it in particular workloads is a different matter.
The whole deep-sleep machinery is there to turn off things that are not needed. And using 25-30% more power (40+ watts) to do the same thing at the same time is still quite a lot.

Try enabling ECO mode in BIOS and then compare again.
I did; it dropped power by about 10 watts under my WoW test load, which still corresponds to 20-25% more than the 9900K doing the same thing.
 

B-Riz

Golden Member
Feb 15, 2011
1,482
612
136
ADATA SX 8200 Pro, but we are talking about a pure single-threaded Lua bottleneck here, even with data coming straight out of the disk cache (i.e. memory).


Indeed, but when cores are sent to deep sleep (C6) I don't expect them to add much wattage. I suspect that the I/O die + chipset (basically the same chip) draw relatively much (hence the need for active cooling on the chipset).


Yes, but I need a Ryzen 5000 for driver testing anyway (like the USB problems). I only found out about the lack of extra Lua performance now, though. Basically, it seems that all CPUs are separated only by clock rate when it comes to Lua.


The whole deep-sleep machinery is there to turn off things that are not needed. And using 25-30% more power (40+ watts) to do the same thing at the same time is still quite a lot.


I did; it dropped power by about 10 watts under my WoW test load, which still corresponds to 20-25% more than the 9900K doing the same thing.

Non-core power went up 16 W to 21 W; I dunno, is it really that big a deal? Maybe WoW is just an inefficient power hog? :laughing:

Maybe a 5800X vs. a 9900K in WoW would be a better test.


At peak, the 9900K still uses more power than the 5900X at its peak.

AT Bench, 9900K vs 5900X: https://www.anandtech.com/bench/product/2784?vs=2674
 
  • Like
Reactions: Tlh97 and Makaveli

Timur Born

Senior member
Feb 14, 2016
277
139
116
Non-core power went up 16 W to 21 W; I dunno, is it really that big a deal? Maybe WoW is just an inefficient power hog? :laughing:
40 watts more power for displaying the very same scene at 60 fps with vertical sync. Same GPU, same software installation (cloned drive), same settings. That is indeed quite a lot for doing the same thing.

Don't forget that the frame rate is fixed at 60 fps, so there is no benefit in-game in return for the additional power draw. There should be fewer dips to lower fps, but those are often caused by Lua addons when the CPU is the limiting factor (rather than the GPU).

I will check power consumption during a Topaz Gigapixel AI run to see how much more power is used in return for finishing 28% earlier (vs. the 9900K).
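(That comparison boils down to energy per job. A minimal sketch of the arithmetic follows; the wattages are placeholders to be replaced with measured values, and only the ~28% runtime advantage comes from the numbers above.)

# Energy for a fixed job = average power draw * runtime.
# The wattages below are placeholders; only the ~28% runtime advantage is from above.
def energy_wh(avg_power_w, runtime_s):
    return avg_power_w * runtime_s / 3600.0

t_9900k = 100.0               # example runtime in seconds (placeholder)
t_5900x = t_9900k / 1.28      # ~28% faster, so roughly 78% of the 9900K's runtime

print("9900K:", round(energy_wh(250.0, t_9900k), 1), "Wh")   # placeholder average power
print("5900X:", round(energy_wh(290.0, t_5900x), 1), "Wh")   # placeholder average power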
 
Last edited:

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
40 watts more power for displaying the very same scene at 60 fps with vertical sync. Same GPU, same software installation (cloned drive), same settings. That is indeed quite a lot for doing the same thing.

Don't forget that the frame rate is fixed at 60 fps, so there is no benefit in-game in return for the additional power draw. There should be fewer dips to lower fps, but those are often caused by Lua addons when the CPU is the limiting factor (rather than the GPU).

I will check power consumption during a Topaz Gigapixel AI run to see how much more power is used in return for finishing 28% earlier (vs. the 9900K).

My experience with the 3950X and 5950X is pretty much the same: if left on "auto", these CPUs are very inefficient in low-load regimes, and their stock efficiency only kicks in once many cores are loaded and per-core wattage drops. And it is easy to make things worse by enabling PBO or overclocking memory.

Even with manual tuning, things don't really get better. For example, my desktop machine is a 10900K at 5.1 GHz, no downclocking, DDR4-3900C15, 100% high-performance power plan with C-states enabled, at 1.32 V:
at idle it uses 3-5 W of package power.

When tuning the 5950X I found it runs 4.4 GHz static at 1.1375 V core, DDR4-3600C15, ~1.1 V SOC, 100% high-performance power plan with C-states enabled:
at idle it uses 30 W of package power, while reporting that the CPU spends >98% of its time in the C6 power state. WTH, really.

Where AMD's much-coveted efficiency kicks in is when the CPU is actually loaded: the 10900K at my settings uses 200 W running CB20, the 5950X 120 W, with scores of 5K vs 8.7K (both without HT).

(Yes, I know that at BIOS defaults idle package power is 20 W because the CPU downclocks; 30 W at a fixed 4.4 GHz and 1.1375 V is just bad.)
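(Putting those loaded figures side by side, a quick sketch using only the numbers quoted above, both runs without HT/SMT:)

# CB20 efficiency from the figures above: ~200 W for ~5000 pts (10900K @ 5.1 GHz)
# vs ~120 W for ~8700 pts (5950X @ 4.4 GHz static), both with HT/SMT disabled.
systems = {
    "10900K @ 5.1 GHz": (5000, 200),   # (CB20 points, package watts)
    "5950X @ 4.4 GHz": (8700, 120),
}
for name, (points, watts) in systems.items():
    print(f"{name}: {points / watts:.1f} CB20 points per watt")
# Roughly 25 vs 72 points per watt, about a 2.9x efficiency gap in this particular test.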
 
  • Like
Reactions: lightmanek