MilkyWay@H - Benchmark thread Winter 2016 on (updated 1-2021) - GPU & CPU times wanted for new WUs

StefanR5R · Jan 21, 2021

Endgame124 said:
Do you have any AMD material that calls out "factory OC'd" boost values being unstable and unsuitable for serious compute? Also, isn't distributed computing based on the idea of using idle consumer hardware and getting some benefit out of it? It would be a pretty serious change in direction to say that only server class hardware was desirable for these projects.

Errm, earlier I merely expressed my confusion about why a CPU at this clock configuration was undervolted and then used for sustained computation. [#239]

I don't know what server CPUs have to do with this. I did not bring server CPUs into this discussion.

So… is there, by any chance, some AMD material on the application of their new Curve Optimizer in the area of scientific computing and engineering?

Edit: added back references

Endgame124 · Jan 21, 2021

Here is the A10-7870K host with work units from both the integrated GPU and the R250X GPU. The R250X is just a little bit faster than the integrated GPU (around 80 seconds faster, give or take). Interestingly, I’m seeing validation errors off the GPUs as well.

Valid tasks for computer 848090

milkyway.cs.rpi.edu

Assimilator1 · Jan 22, 2021

So what's the average time for at least 5 WUs?

Endgame124 · Jan 23, 2021

Assimilator1 said:
So what's the average time for at least 5 WUs?

A10-7870K iGPU (DDR3-2400, GPU boost set to Extreme):

	GPU Time	CPU Time
	472.15	22.81
	472.13	23.34
	472.12	22.22
	472.34	22.44
	472.13	23.48
	472.07	23.06
	472.21	22.44
	472.1	23.36
Average	472.1563	22.89375

R250x:

	GPU Time	CPU Time
	404.99	28.05
	404.91	27.56
	404.99	28.05
	405.05	26.78
	405.01	27.53
	406	29.13
	404.08	26.05
Average	405.0043	27.59286
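
For anyone who wants to redo the averages from a copied results page, here is a minimal Python sketch. The numbers are the R250x times listed above; the helper name `summarize` is made up for illustration and is not part of any BOINC tool.

```python
from statistics import mean

# (GPU time, CPU time) in seconds, copied from the R250x list above.
r250x = [
    (404.99, 28.05),
    (404.91, 27.56),
    (404.99, 28.05),
    (405.05, 26.78),
    (405.01, 27.53),
    (406.00, 29.13),
    (404.08, 26.05),
]

def summarize(rows):
    """Return mean GPU time, mean CPU time and the GPU-time spread."""
    gpu = [g for g, _ in rows]
    cpu = [c for _, c in rows]
    return {
        "gpu_mean": mean(gpu),
        "cpu_mean": mean(cpu),
        "gpu_spread": max(gpu) - min(gpu),
    }

summary = summarize(r250x)
print(f"GPU {summary['gpu_mean']:.2f}s, CPU {summary['cpu_mean']:.2f}s")
```

The spread is included because a tight spread (under 2 s here) is a quick sign that the WUs in the sample are the same type.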

Assimilator1 · Jan 24, 2021

Ta, 472s and 405s then, all 227.5x WUs, right?

Endgame124 · Jan 24, 2021

Assimilator1 said:
Ta, 472s and 405s then, all 227.5x WUs, right?

Yes, all 227.5x WUs. After the F@H race, I’ll grab numbers for the cpu too.

Assimilator1 · Apr 22, 2021

Times added to the OP for Endgame124's R7 250X, iGPU and RX 6700 XT (from his thread).
Thanks for the times, mate (sorry for the delayed post of the first 2!).

Assimilator1 · Apr 23, 2021

Quoting Endgame124 here from another thread to keep the info in 1 place.

Endgame124 said:
5950X in Eco Mode running 30 threads of Rosetta, with one CPU core left idle to handle the video card.

The video card itself is at power target -6 and -25 mV, but it still stays at 2500 MHz pretty much all the time.

All of the MW work Units were run with this configuration.

Endgame124 · Apr 23, 2021

Assimilator1 said:
Quoting Endgame124 here from another thread to keep the info in 1 place.

To translate Eco Mode to actual clock speed, the 5950X is pretty much locked at 2975 MHz while running MW + 30 threads of Rosetta.

Assimilator1 · Apr 23, 2021

Nice CPU, and thanks for the info.

Assimilator1 · Dec 18, 2022

Just a little update from my rigs so I can compare my outgoing RX 580 to my new RTX 3060 Ti (which apparently is 70% down on FP64 FLOPS vs the RX 580).

So it seems atm that the most common credit WU is the 227.1x variants.
Looking at 1 page of results for my RX 580 (16x 227.11, v1.46 separation) they range from 170-175s running 2 WUs in parallel (drivers 22.5.1), GPU clocked at 1266 MHz (underclocked to 480 levels to cut power and heat).
For my HD 7870 XT they range from 67-72s running single WUs (20x mostly 227.14/15 WUs); it has nearly double the FP64 of the RX 580. Concurrent WUs are mostly running at 140-142s.

Looking at the table in the op, it seems the times are about the same, except for concurrent WUs on the 7870 XT which are now double the single WU time, up ~50% on the old concurrent times, odd. Although originally it was doing them in about 140s! Wth?? I'm going to change the time back to 140s, seems more consistent.

Assimilator1 · Dec 19, 2022

For my 3060 Ti it seems most 227.1x WUs are being done in about 255-260s (for 2 concurrently), so they take about 49% longer than on the RX 580, i.e. roughly a third less throughput (less than the DP/FP64 loss!).
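
The comparison between the two cards can be checked with a few lines of Python. This is only a rough sketch: it uses the midpoints of the time ranges quoted above and assumes the 227.1x WUs on both cards are equivalent. The function name is made up for illustration.

```python
def effective_seconds_per_wu(wall_seconds, concurrent):
    """Wall time of a batch divided by how many WUs ran side by side."""
    return wall_seconds / concurrent

# Midpoints of the ranges quoted above, both with 2 WUs in parallel.
rx580 = effective_seconds_per_wu((170 + 175) / 2, 2)      # 86.25 s per WU
rtx3060ti = effective_seconds_per_wu((255 + 260) / 2, 2)  # 128.75 s per WU

extra_time = rtx3060ti / rx580 - 1        # how much longer each WU takes
throughput_loss = 1 - rx580 / rtx3060ti   # how many fewer WUs per hour

print(f"{extra_time:.0%} longer per WU, {throughput_loss:.0%} less throughput")
# prints "49% longer per WU, 33% less throughput"
```

"49% slower" and "33% less throughput" describe the same measurement; only the throughput figure is directly comparable to a percentage drop in FP64 FLOPS.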
Seems the Nvidia card has much higher CPU run times than the AMD card, what's that about?

An interesting point: the total system draw with the RX 580 was ~233 W running MW@H (& Universe or WCG); with the 3060 Ti it's ~186 W.

Not sure what that works out to in W/credit atm....
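
A rough way to put a number on that is energy per finished WU rather than watts per credit. The sketch below is Python and rests on several assumptions: it uses the whole-system draw figures above (which also include the CPU projects running at the time), the midpoints of the quoted time ranges, and 2 WUs in parallel on both cards. If both cards earn the same credit per 227.1x WU, the ratio per credit would be the same.

```python
def watt_hours_per_wu(system_watts, wall_seconds, concurrent):
    """Whole-system energy spent per finished WU, in watt-hours."""
    joules = system_watts * wall_seconds / concurrent
    return joules / 3600

# System draw in watts, midpoint wall time in seconds, 2 WUs in parallel.
rx580_wh = watt_hours_per_wu(233, 172.5, 2)
rtx3060ti_wh = watt_hours_per_wu(186, 257.5, 2)

print(f"RX 580: {rx580_wh:.2f} Wh/WU, 3060 Ti: {rtx3060ti_wh:.2f} Wh/WU")
# prints "RX 580: 5.58 Wh/WU, 3060 Ti: 6.65 Wh/WU"
```

On these figures the lower system draw does not make up for the longer run times, though attributing the whole system's power to MW overstates both numbers.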

mmonnin03 · Dec 19, 2022

That's how OpenCL works on NV cards. They always use a CPU thread.

Assimilator1 · Dec 20, 2022

Ah I see, so I already left a free physical core (2 threads) for MW anyway, so that'll be fine for MW with Nvidia then?

Markfw · Dec 20, 2022

Not sure about all the questions/answers. But this is quite a few finished units on an EPYC 7452 with 32 threads doing WCG and 32 threads free for 3 Titan V's.

Markfw · Dec 20, 2022

I have one more Titan V running on a 5950x with only 10% or less CPU used, all the tasks take about 1:01.

crashtech · Dec 20, 2022

@Mark, I assume the results above are from doing a certain number of tasks simultaneously?

Markfw · Dec 20, 2022

crashtech said:
@Mark, I assume the results above are from doing a certain number of tasks simultaneously?

Yes, 7 at a time per card on 4 Titan V's in total, so 28 at a time.

Assimilator1 · Dec 20, 2022

Wow! So great output from those then!

Re the 5950+Titan V system, as in ~10% CPU left free for MW?

Markfw · Dec 20, 2022

Assimilator1 said:
Wow! So great output from those then!

Re the 5950+Titan V system, as in ~10% CPU left free for MW?

No, the 5950X system has only 7-12% CPU used. MW is all I am running on that box, plus monitoring software for my other boxes. I can't get WCG or Rosetta units, so only 5% of my boxes are loaded now.

Assimilator1 · Dec 20, 2022

Got ya.
Btw, I don't suppose you recall the power draw difference between similarly clocked (and core numbers) 3xxx Ryzen vs 5xxx?
Tempted to get a 5600X or 5800X in the new year.

Assimilator1 · Jul 8, 2023

MW's Separation project ended this month, so only the CPU project is running now.

Anyone fancy posting benchmarks for that? (I'll post some too at some point).
@Endgame124 did you ever post benchmarks for the 5950 you had a couple of years ago? As per this post. (I can't see any).

StefanR5R · Jul 8, 2023

I always had in mind to come up with a script for repeatable benchmark runs for the Separation project, but then never went through with it. Partially because of lack of spare time, partially because I never bought a GPU which would have performed decently at MilkyWay.

Now that there is only the multithreaded-CPU N-Body application left, it would be interesting to come up with a benchmarking script for that one, e.g. in order to find a CPU's sweet spot for the thread count per task. But I haven't even looked yet whether the input data for this application is small enough to build a benchmark tool around it.
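
As a starting point, such a thread-count sweep could look roughly like the Python sketch below. Everything in it is an assumption: the function names are made up, the command line for the real N-Body application is not filled in (the `stand_in` workload is only a placeholder), and the throughput estimate simply assumes that `total_threads // n` tasks run side by side without slowing each other down.

```python
import subprocess
import sys
import time

def time_command(cmd):
    """Run cmd to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start

def sweep_threads(make_cmd, total_threads, thread_counts, repeats=3):
    """Estimate tasks/hour for each threads-per-task setting.

    Assumes total_threads // n tasks would run side by side and each would
    take as long as a task timed alone, which ignores contention.
    """
    results = {}
    for n in thread_counts:
        best = min(time_command(make_cmd(n)) for _ in range(repeats))
        results[n] = (total_threads // n) * 3600 / best
    return results

def stand_in(n):
    # Placeholder workload: replace with the real application command line
    # built for n threads. The thread count is ignored here.
    return [sys.executable, "-c", "sum(range(200000))"]

if __name__ == "__main__":
    for n, rate in sweep_threads(stand_in, 16, [1, 2, 4, 8, 16]).items():
        print(f"{n:2d} threads/task: ~{rate:.0f} tasks/hour")
```

Taking the best of several repeats reduces noise from background load. A more honest version would actually launch the concurrent tasks together, since memory bandwidth and cache contention are exactly what decides the sweet spot.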

Assimilator1 · Jul 8, 2023

Well, let us know if you do!

Assimilator1 · Jul 9, 2023

I finally got BOINC to accept MW CPU WUs earlier (somehow the app was unticked in MW preferences), not got any times yet.
I've noticed though that MW is only using ~80% of the CPU (BOINC preferences are set to 100%), any ideas why? (nothing else is taking the spare CPU power).
I've also noticed that WUs aren't deployed to each thread, but 1 WU for the whole CPU.
MW CPU not properly optimised for SMT?
