MilkyWay@H - Benchmark thread Winter 2016 on (updated 1-2021) - GPU & CPU times wanted for new WUs

Page 11

StefanR5R

Elite Member
Dec 10, 2016
5,515
7,820
136
Do you have any AMD material that calls out "factory OC'd" boost values being unstable and unsuitable for serious compute? Also, isn't distributed computing based on the idea of using idle consumer hardware and getting some benefit out of it? It would be a pretty serious change in direction to say that only server class hardware was desirable for these projects.
Errm, earlier I merely expressed my confusion about why a CPU at this clock configuration was undervolted and then used for sustained computation. [#239]

I don't know what server CPUs have to do with this. I did not bring server CPUs into this discussion.

So… is there, by any chance, some AMD material on the application of their new Curve Optimizer in the area of scientific computing and engineering?


Edit: added back references
 

Endgame124

Senior member
Feb 11, 2008
955
669
136
Here are the work unit times for the A10-7870K host, with both its integrated GPU and the R250X GPU. The R250X is just a little bit faster than the integrated GPU (around 80 seconds faster, give or take). Interestingly, I'm seeing validation errors off the GPUs as well.

 

Endgame124

Senior member
Feb 11, 2008
955
669
136
So what's the average time for at least 5 WUs?
A10-7870K iGPU (DDR3-2400, GPU boost set to Extreme):

GPU Time (s)    CPU Time (s)
472.15          22.81
472.13          23.34
472.12          22.22
472.34          22.44
472.13          23.48
472.07          23.06
472.21          22.44
472.1           23.36
Average
472.1571        22.90571

R250X:

GPU Time (s)    CPU Time (s)
404.99          28.05
404.91          27.56
404.99          28.05
405.05          26.78
405.01          27.53
406             29.13
404.08          26.05
Average
405.0043        27.59286
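
For anyone wanting to reproduce the averages from the times posted above, here is a minimal sketch. The variable and function names are just for illustration. One thing it shows: the posted iGPU averages match the mean of the last seven rows, while averaging all eight rows gives roughly 472.156 and 22.894 instead. The R250X averages match all seven rows as posted.

```python
from statistics import mean

# (GPU time, CPU time) in seconds, copied from the post above.
igpu = [
    (472.15, 22.81), (472.13, 23.34), (472.12, 22.22), (472.34, 22.44),
    (472.13, 23.48), (472.07, 23.06), (472.21, 22.44), (472.10, 23.36),
]
r250x = [
    (404.99, 28.05), (404.91, 27.56), (404.99, 28.05), (405.05, 26.78),
    (405.01, 27.53), (406.00, 29.13), (404.08, 26.05),
]


def averages(rows):
    """Return (mean GPU time, mean CPU time) for a list of (gpu, cpu) pairs."""
    gpu_times, cpu_times = zip(*rows)
    return mean(gpu_times), mean(cpu_times)


for name, rows in (("A10-7870K iGPU", igpu), ("R250X", r250x)):
    gpu_avg, cpu_avg = averages(rows)
    print(f"{name}: GPU {gpu_avg:.3f} s, CPU {cpu_avg:.3f} s over {len(rows)} WUs")
```

Calling `averages(igpu[1:])` reproduces the iGPU figures as posted.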
 

Assimilator1

Elite Member
Nov 4, 1999
24,120
507
126
Times added to the OP for Endgame124's R7 250X, iGPU and RX 6700 XT (from his thread).
Thanks for the times mate :) (sorry for the delayed post of the 1st 2!).
 
  • Like
Reactions: Endgame124

Assimilator1

Elite Member
Nov 4, 1999
24,120
507
126
Quoting Endgame124 here from another thread to keep the info in 1 place.

5950X in eco mode running 30 threads of Rosetta and one idle CPU to handle the video card.

The video card itself is at power target -6 and -25 mV, but it still stays at 2500 MHz pretty much all the time.

All of the MW work Units were run with this configuration.
 

Assimilator1

Elite Member
Nov 4, 1999
24,120
507
126
Just a little update from my rigs so I can compare my outgoing RX 580 to my new RTX 3060 Ti (which apparently is 70% down on FP64 FLOPs vs the RX 580).

So it seems atm that the most common credit WU is the 227.1x variants.
Looking at 1 page of results for my RX 580 (16x 227.11, v1.46 separation) they range from 170-175s running 2 WUs in parallel (drivers 22.5.1), GPU clocked at 1266 MHz (underclocked to 480 levels to cut power and heat).
For my HD 7870 XT they range from 67-72s :) running single WUs (20x mostly 227.14/15 WUs), it has nearly double the FP64 of the RX 580. Concurrent WUs are mostly running at 140-142s.

Looking at the table in the OP, the times seem about the same, except for concurrent WUs on the 7870 XT, which are now double the single WU time and up ~50% on the old concurrent times. Odd. Although originally it was doing them in about 140s! Wth?? I'm going to change the time back to 140s, as that seems more consistent.
 
  • Like
Reactions: cellarnoise

Assimilator1

Elite Member
Nov 4, 1999
24,120
507
126
For my 3060 Ti it seems most 227.1x WUs are being done in about 255-260s (for 2 concurrently), so about 49% slower than the RX 580 (less than the DP/FP64 loss! :)).
Seems the Nvidia card has much higher CPU run times than the AMD card, what's that about?

An interesting point, the total system draw with the RX 580 was ~233w running MW@H (& Universe or WCG), with the 3060 Ti it's ~186w :cool:
Not sure what that works out to in W/credit atm...
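
A rough way to put numbers on both points, as a sketch only. It takes the midpoints of the ranges quoted above for two concurrent 227.1x WUs and charges the whole system draw to MW@H. That is an assumption that overstates the GPU's share, since the CPU was running other projects at the same time. Because the WUs are the same type, energy per WU stands in for energy per credit here. The function names are made up for the example.

```python
def midpoint(lo, hi):
    """Midpoint of a quoted time range, in seconds."""
    return (lo + hi) / 2


# Seconds to finish a pair of concurrent 227.1x WUs (ranges from the posts above).
rx580_s = midpoint(170, 175)
rtx3060ti_s = midpoint(255, 260)

# Total system draw in watts, as posted (assumption: all of it is charged to MW@H).
rx580_w = 233
rtx3060ti_w = 186


def extra_time_pct(slow_s, fast_s):
    """How much longer the slower card takes, as a percentage."""
    return (slow_s / fast_s - 1) * 100


def joules_per_wu(system_watts, seconds_per_batch, wus_per_batch=2):
    """Energy per WU when wus_per_batch WUs run concurrently."""
    return system_watts * seconds_per_batch / wus_per_batch


print(f"3060 Ti takes {extra_time_pct(rtx3060ti_s, rx580_s):.0f}% longer per WU")
print(f"RX 580 rig:  {joules_per_wu(rx580_w, rx580_s) / 1000:.1f} kJ per WU")
print(f"3060 Ti rig: {joules_per_wu(rtx3060ti_w, rtx3060ti_s) / 1000:.1f} kJ per WU")
```

On these figures the 3060 Ti rig draws less power but takes enough longer that it comes out at roughly 24 kJ per WU against roughly 20 kJ for the RX 580 rig.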
 

Assimilator1

Elite Member
Nov 4, 1999
24,120
507
126
Ah I see, so I already left a free physical core (2 threads) for MW anyway, so that'll be fine for MW with Nvidia then?
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,564
14,516
136
Not sure about all the questions/answers, but here are quite a few finished units on a 7452 EPYC with 32 threads doing WCG and 32 threads free for 3 Titan Vs.
[Attached screenshot: 1671557370897.png]
 
  • Like
Reactions: Assimilator1

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,564
14,516
136
I have one more Titan V running on a 5950X with only 10% or less CPU used; all the tasks take about 1:01.
 

Assimilator1

Elite Member
Nov 4, 1999
24,120
507
126
Wow! So great output from those then! :D

Re the 5950+Titan V system, as in ~10% CPU left free for MW?
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,564
14,516
136
Wow! So great output from those then! :D

Re the 5950+Titan V system, as in ~10% CPU left free for MW?
No, the 5950X system has only 7-12% CPU used. MW is all I am running on that box, plus monitoring software for my other boxes. I can't get WCG or Rosetta units, so only 5% of my boxes are loaded now.
 

Assimilator1

Elite Member
Nov 4, 1999
24,120
507
126
Got ya.
Btw, I don't suppose you recall the power draw difference between similarly clocked (and core numbers) 3xxx Ryzen vs 5xxx?
Tempted to get a 5600X or 5800X in the new year.
 

Assimilator1

Elite Member
Nov 4, 1999
24,120
507
126
MW's Separation project ended this month, so only the CPU project is running now.

Anyone fancy posting benchmarks for that? (I'll post some too at some point).
@Endgame124 did you ever post benchmarks for the 5950 you had a couple of years ago? As per this post. (I can't see any).
 

StefanR5R

Elite Member
Dec 10, 2016
5,515
7,820
136
I always had in mind to come up with a script for repeatable benchmark runs for the Separation project, but then never went through with it. Partially because of lack of spare time, partially because I never bought a GPU which would have performed decently at MilkyWay.

Now that there is only the multithreaded-CPU N-Body application left, it would be interesting to come up with a benchmarking script for that one, e.g. in order to find a CPU's sweet spot for the thread count per task. But I haven't even looked yet whether the input data for this application is small enough to build a benchmark tool around it.
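
A minimal sketch of what such a thread-count sweep could look like, assuming the task can be launched standalone from a command line. The command builder below is a hypothetical placeholder (it just starts a Python interpreter), not the real N-Body invocation, and the throughput estimate assumes side-by-side tasks do not slow each other down, which real hardware will not fully honour.

```python
import subprocess
import sys
import time
from statistics import median


def time_command(cmd, repeats=3):
    """Run cmd `repeats` times and return the median wall-clock seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return median(samples)


def sweep(build_cmd, thread_counts, total_threads, repeats=3):
    """Time one task per thread count and estimate whole-CPU throughput.

    Assumes total_threads // n tasks could run concurrently at the same speed.
    """
    results = {}
    for n in thread_counts:
        seconds = time_command(build_cmd(n), repeats)
        concurrent_tasks = max(1, total_threads // n)
        results[n] = {
            "seconds": seconds,
            "tasks_per_hour": 3600 / seconds * concurrent_tasks,
        }
    return results


def placeholder_cmd(n_threads):
    # Hypothetical stand-in: swap in the real application and whatever
    # option it actually uses to set the thread count.
    return [sys.executable, "-c", f"print({n_threads})"]


if __name__ == "__main__":
    table = sweep(placeholder_cmd, [1, 2, 4, 8, 16], total_threads=16)
    for n, r in table.items():
        print(f"{n:>2} threads: {r['seconds']:.2f} s/task, "
              f"~{r['tasks_per_hour']:.0f} tasks/h")
```

The sweet spot would be the thread count with the highest estimated tasks per hour, though that estimate should be confirmed by actually running the tasks concurrently.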
 
  • Like
Reactions: Assimilator1

Assimilator1

Elite Member
Nov 4, 1999
24,120
507
126
I finally got BOINC to accept MW CPU WUs earlier (somehow the app was unticked in MW preferences); no times yet.
I've noticed though that MW is only using ~80% of the CPU (BOINC preferences are set to 100%), any ideas why? (nothing else is taking the spare CPU power).
I've also noticed that WUs aren't deployed to each thread, but 1 WU for the whole CPU.
MW CPU not properly optimised for SMT?
 