MilkyWay@H - Benchmark thread Winter 2016 on (updated 1-2019) - GPU & CPU times wanted for new WUs

Assimilator1

Elite Member
Nov 4, 1999
22,942
19
91
#1
So please share your new scores for old & new GPUs & CPUs alike!

[update 11/12/18] At some point in or near the summer, MW changed the WUs again. Currently the 227.62 & 203.92 credit WUs seem to be the common ones, so here's a new table for the 227.62 credit WU. The app is still v1.46.
I've added a table for running concurrent WUs, as many people do that & Nvidia cards in particular benefit from it. Note though that times from running concurrently can be more erratic than from running WUs singly.

Requirements for the benchmark :-

Average of at least 5 WU times (not cherry picked please! ;)).

A dedicated physical CPU core for each GPU (for optimal MW WU times). If you're only using BOINC for CPU tasks, & you have an HT capable CPU, then the only way to be certain of this (bar disabling HT) is to set the BOINC computing preferences (in advanced mode>options) so that you have 1 less CPU thread running than you have physical cores. Don't panic too much about lost CPU ppd, it doesn't take long to run MW GPU WUs ;) (see table).

Please state what speed & type CPU you have, as it now has a significant effect on GPU WU times!

Please state GPU clock speeds if overclocked (including factory overclocks) or state 'stock'.

Please only crunch 1 WU at a time per GPU, preferably. Or if you are running concurrent WUs, state how many & I'll put your time in the 2nd GPU table.

For CPU times please state whether Hyper Threading (or equivalent) is enabled or not, times for both states welcomed.

It would also be useful if you could state your BOINC & driver version, & OS, in case it does make any difference.

If you find your WU times are fluctuating by more than a couple of % for singly run WUs, then use GPU-Z or your graphics card driver tools to check that your GPU is able to hit near 100% load (although I'm not sure that Nvidia cards can hit that for MW). Note that even when crunching normally, the GPU load will be on/off with the current MW app, so the GPU load graph should look like a series of blocks. Just looking at my RX 580, it was dropping to zero load roughly every 27s.
Also check using Task Manager that your CPU actually has the spare capacity to give to MW (& btw, GPU crunching won't show up in Task Manager).
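
As a rough illustration of the first two requirements above, here is a minimal Python sketch. The function names and the sample times are made up for the example; the only rules taken from this post are "average at least 5 WU times" and "run 1 less CPU thread than you have physical cores" per GPU. Extending that to several GPUs (one physical core held back per GPU) is my assumption, as is expressing the result as a percentage of logical CPUs to enter in the BOINC computing preferences.

```python
from statistics import mean


def benchmark_average(times_s, minimum=5):
    """Average WU run time in seconds, refusing fewer than `minimum` samples."""
    if len(times_s) < minimum:
        raise ValueError(f"need at least {minimum} WU times, got {len(times_s)}")
    return round(mean(times_s), 1)


def cpu_threads_to_run(physical_cores, gpus=1):
    """CPU tasks to allow so each GPU keeps one physical core to itself."""
    return max(physical_cores - gpus, 0)


def cpu_percent_setting(physical_cores, logical_threads, gpus=1):
    """The thread limit above as a percentage of logical CPUs."""
    return 100.0 * cpu_threads_to_run(physical_cores, gpus) / logical_threads


if __name__ == "__main__":
    sample_times = [56.1, 57.3, 56.8, 56.5, 57.2]  # hypothetical run times (s)
    print("average:", benchmark_average(sample_times), "s")
    # Example: a 6 core / 12 thread CPU feeding one GPU
    print("CPU tasks to run:", cpu_threads_to_run(6))
    print("percent of CPUs: %.1f" % cpu_percent_setting(6, 12))
```

On a 6 core / 12 thread CPU with one GPU this gives 5 CPU tasks, i.e. a little under 42% of the logical CPUs.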


Current GPU statistics ~ Average Run Time to Complete 1 MW v1.46 227.62 credit WU :-

HD 7970, GPU 1200 MHz(!) (CPU, Xeon E5 ES 10 core @2.7 GHz, HT off) ..... 38.2s .... tictoc
R9 290, GPU 1000 MHz, (CPU, Xeon E5 ES 10 core @2.7 GHz, HT off) ........... 70.9s .... tictoc
HD 7870 XT 3GB(DS), GPU 925 MHz (CPU, C2 Q9550 @3.58 GHz) ................ 73.2s .... Assimilator1
RX 580 GB, GPU 1350 MHz (CPU, i7 4930k @4.1 GHz) .................................... 97.3s .... Assimilator1
RTX 2080 Ti, GPU ???? MHz (CPU, i7-8700K @4.7 GHz no AVX) ................... 110.6s .... IEC
R7 iGPU on an AMD A12-9800 APU (CPU, 4.2 GHz) ....................................... 120.3s .... hoppisaur
RX 570, GPU stock (CPU, i7-4771 ?? GHz) ....................................................... 121s ....... Jim1348
Tesla T4, (CPU, ????) ......................................................................................... 151s ........ vseven


Current GPU statistics ~ Average Run Time to Complete multiple concurrent MW v1.46 227.62 credit WUs :-

RX 570, GPU stock (CPU, i7-4771 ?? GHz) (2 concurrent WUs) ..................... 194s ....... Jim1348
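
To compare a concurrent entry with the single WU table, the time can be converted to an effective time per WU. This is only a sketch: it assumes the 194s figure is the run time of each WU while 2 are running side by side (so 2 WUs finish roughly every 194s), and the function names are made up for the example. The 121s & 194s figures are Jim1348's RX 570 entries from the two tables.

```python
def effective_time_per_wu(run_time_s, concurrent):
    """Seconds per completed WU when `concurrent` WUs share one GPU."""
    if concurrent < 1:
        raise ValueError("concurrent must be at least 1")
    return run_time_s / concurrent


def throughput_gain(single_time_s, concurrent_time_s, concurrent):
    """How many times more WUs per hour the concurrent setup completes."""
    return single_time_s / effective_time_per_wu(concurrent_time_s, concurrent)


if __name__ == "__main__":
    # RX 570: 121s for a single WU, 194s with 2 concurrent WUs
    print("effective s/WU:", effective_time_per_wu(194, 2))
    print("gain: %.2fx" % throughput_gain(121, 194, 2))
```

Under that assumption the RX 570 works out at 97s per WU when running 2 at once, about 25% more throughput than running them singly.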

Current CPU statistics ~ Average Run Time to Complete 1 MW v1.74 227.62 credit WU :-



**********************************************************************************************************************************
Since v1.46 was released on 1/5/17 (UK date format :p), the WU times & credits have changed. Times are apparently 'slightly longer', & the main WUs (99%+), & thus the new benchmark WU, were 227.23 credits. See this post for more info. See below the v1.46 table for the other benchmark requirements.
Btw, watch out for the 227.26 credit WUs, they are very rare (approx. 1% of WUs atm), but despite their tiny increase in credit they take about 5% longer, at least on my HD 7970, ~56s vs 53s.

GPU statistics ~ Average Run Time to Complete 1 MW v1.46 227.23 credit WU :-

R9 280X, GPU 1030 MHz (CPU, ???) ................................................................... 50.4s .... JoeM
HD 7970, GPU 1000 MHz (CPU, i7 4930k @4.1 GHz) ........................................ 53s ....... Assimilator1
Vega 56, stock (CPU, 2500k @4.3 GHz) .............................................................. 63s ....... Chooka
HD 6970, GPU 890 MHz (CPU, Phenom II X6 1090T, stock) .............................. 94s ....... Hassan Shebli
HD 6970, stock (CPU, ???????) .......................................................................... 107s ....... JoeM
RX 480 8GB, GPU o/c to? (CPU, Phenom II X6 1100T @?) .............................. 110s ....... Darrell
HD 5870, GPU 900 MHz, (CPU, ?????? ) ............................................................ 116s ....... JoeM
RX 470 4GB, GPU 1205 MHz (CPU, Phenom II 1100T, stock) .......................... 127s ....... [AF>HFR] Seeds
GTX 1070 Ti, GPU 2 GHz (CPU, Ryzen 1700X @3.9 GHz) ................................ 170s ...... Keith Myers
GTX 1060, stock (CPU, Pentium G3900) ........................................................... 250s ...... DVDL
HD 7750, stock (CPU, ? ) ..................................................................................... 647s ..... JoeM

CPU statistics ~ Average Run Time to Complete 1 MW v1.46 227.23 credit WU :-

Ryzen R7 1700X (8C, stock 3.4 GHz, RAM o/c 2667 MHz) .................................. 3315s no HT ... JoeM
Ryzen R7 1700X (8C, stock 3.4 GHz, RAM o/c 2667 MHz) .................................. 4428s HT on ... JoeM
8350 (7C, ?????) ...................................................................................................... 5105s .............. JoeM
8350 (7C, ?????) ...................................................................................................... 5388s .............. JoeM

**********************************************************************************************************

Old app GPU statistics ~ Average Run Time to Complete 1 MW v1.42 133.66 credit WU :-

HD 7970, GPU 1250 MHz (CPU, AMD R7 1700 @3.8 GHz) ................................ 32.1s ... tictoc
R9 280X, GPU 1080 MHz (CPU, Pentium G3220 @3 GHz) ................................. 40.1s ... Tennessee Tony
HD 7970, GPU 1000 MHz (CPU, i7 4930k @4.1 GHz) ......................................... 42s ...... Assimilator1
R9 280X, Stock (CPU, C2D E6550, stock) ............................................................. 54.3s ... iwajabitw
R9 280X, GPU 1020 MHz (CPU, AMD FX8320E @3.47 GHz) .............................. 54.8s ... Tennessee Tony
HD 7950, GPU 860 MHz (CPU, i7 3770k, stock) .................................................. 56.5s ... salvorhardin
HD 7870 XT 3GB(DS), GPU 925 MHz (CPU, C2 Q9550 @3.58 GHz) .................. 56.8s ... Assimilator1
R9 390, GPU 1015 MHz (CPU, i7 3770k, stock) ................................................... 60.7s ... salvorhardin
R9 Fury, GPU 1050 MHz (i7 5820k @4.4 GHz) .................................................... 65.9s ... crashtech
RX 480, GPU 1415 MHz, RAM 2025 MHz (CPU, i5 6600k, 4.6 GHz) .................. 72.1s ... TomTheMetalGod
HD 6950, stock (CPU Athlon2 X4 620 @2.6 GHz) ............................................. 101.2s ... waffleironhead
GTX 1080, GPU 2000 MHz (CPU, i7 6950X @4 GHz) ........................................ 116s ...... StefanR5R
GTX 980, GPU 1303 MHz (CPU, i7 5820k @3.3 GHz) ....................................... 184s ...... iwajabitw
RX 460, GPU 1244 MHz (CPU, i5 4460 @3.2 GHz) ............................................ 240.5s ... waffleironhead
Quadro K2100M, stock (CPU, i7 4900 MQ turbo @3.8 GHz) ............................ 1784s ...... StefanR5R

StefanR5R has posted a load of scores here. So if you're interested in scores for a Xeon E5-2690 v4, Phenom II X4 905e, Core 2 T7600, i7 6950X, i7 4960X, i7 4900MQ, GTX 1070, GTX 1080 (I put the highest clock score in the table above), & a Firepro W7000 then check out his very useful post! :)

Current CPU statistics ~ Average Run Time to Complete 1 MW v1.4x 133.66 credit WU :-

i7 5820k @3.3 GHz ......................................................................... 2723s no 'HT load' .... iwajabitw
i7 4930k @4.1 GHz (6 threads for CPU) ....................................... 2825s no 'HT load' .... Assimilator1
i7 4930k @4.1 GHz (10 threads for CPU, 2 for GPU).................... 4171s HT on .............. Assimilator1
i7 4930k @4.1 GHz (12 threads for CPU) ..................................... 4557s HT on .............. Assimilator1
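
The longer per WU times with HT on don't mean less work gets done overall. As a rough worked example using the i7 4930k rows above, this Python sketch converts each row into WUs & credits per day. It assumes every listed thread crunches MW CPU WUs non-stop at the table's average time, which real rigs won't quite manage; the names in the code are made up for the example.

```python
SECONDS_PER_DAY = 86_400
CREDIT_PER_WU = 133.66  # the benchmark WU for this table


def wus_per_day(threads, seconds_per_wu):
    """WUs completed per day if every thread crunches continuously."""
    return threads * SECONDS_PER_DAY / seconds_per_wu


# (threads crunching CPU WUs, average seconds per WU) from the table above
I7_4930K_RUNS = {
    "6 threads, no 'HT load'": (6, 2825),
    "10 threads, HT on": (10, 4171),
    "12 threads, HT on": (12, 4557),
}

if __name__ == "__main__":
    for label, (threads, seconds) in I7_4930K_RUNS.items():
        per_day = wus_per_day(threads, seconds)
        print(f"{label}: {per_day:.1f} WUs/day, "
              f"{per_day * CREDIT_PER_WU:.0f} credits/day")
```

On these numbers, 12 threads with HT on finish roughly 228 WUs/day against roughly 184 for 6 threads without HT load, so about 24% more output despite each WU taking about 61% longer.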

***********************************************************************************************
Info:-

My previous MW benchmark thread spring 2014 - summer 2016

Stock clocks for some of the commonly used graphics cards for MW (& cards with good double precision power), source Wiki (GPU/RAM MHz or MT/s if stated) :-

AMD .............................GPU/RAM ................................... DP GFLOPS
HD 4890 ...................... 850/975 ....................................... 272*
HD 5830 ...................... 800/1000 ..................................... 358
HD 5850 ...................... 725/1000 ..................................... 418
HD 5870 ...................... 850/1200 ..................................... 544
HD 5970 ...................... 725/1000 (dual GPU) .................. 928
HD 6930 ...................... 750/1200 ..................................... 480
HD 6950 ...................... 800/1250 ..................................... 563
HD 6970 ...................... 880/1375 ..................................... 675
HD 6990 ...................... 830/1250 (dual GPU) ................ 1277
HD 7870 XT ................. 925-975/1500 ............................. 710-749
HD 7950 ...................... 800/1250 ..................................... 717
HD 7950 Boost ........... 850-925/1250 .............................. 762-829
HD 7970 ...................... 925/1375 ..................................... 947
HD 7970 GE ............... 1000-1050/1500 ......................... 1024-1075
HD 7990 ..................... 950-1000/1500 (dual GPU) ........ 1894-2048
R9 280 ........................ 827-933/1250 .............................. 741-836
R9 280X ...................... 850-1000/1500 ............................ 870-1024
R9 290 ........................ >947/5000 MT/s .......................... 606
R9 290X ...................... >1000/5000 MT/s ....................... 704
R9 295X2 .................... 1018/5000 MT/s (dual GPU) .... 1433
R9 390 ........................ >1000/6000 MT/s ....................... 640
R9 390X ...................... >1050/6000 MT/s ....................... 739
R9 Fury ....................... 1000/1000 MT/s ......................... 448
R9 Nano ..................... 1000/1000 MT/s .......................... 512
R9 Fury X ................... 1050/1000 MT/s .......................... 538
R9 Pro Duo ................ 1000/1000 MT/s (dual GPU) ....... 900
RX 470 ........................ 926-1206/6600 MT/s .................. 237
RX 480 ...................... 1120-1266/7000-8000 MT/s ........ 323
RX Vega 56 .............. 1600 MT/s .................................... 518-659
RX Vega 64 ...............1890 MT/s .................................... 638-792
RX Vega 64 Liquid .... 1890 MT/s ................................... 720-859
RX 580 ...................... 8000 MT/s ................................... 362-386

Wow, just noticed how feeble the entire RX 400 line is at double precision! Even the top of the line (as of 12/16) RX 480 only manages 323 GFLOPS, which is a little less than the HD 5830's 358 from 2/2010 & only a bit more than the HD 4890's 272 from 4/2009! Although it is more than the R9 380X's 248 GFLOPS :p.
I see I should use memory bandwidth rather than clock rate, as clock rate is misleading for the Vegas, which actually have much higher bandwidth than the 480/580s. The RX 580 is 256 GB/s, the Vega 56 410 GB/s!
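
For reference, those bandwidth figures follow from the transfer rate & the memory bus width. A small Python sketch of the arithmetic is below; note that the bus widths (256-bit for the RX 580, 2048-bit for the Vega 56) are not in the table above, they are the published specs as I understand them, & the function name is made up for the example.

```python
def memory_bandwidth_gb_s(rate_mt_s, bus_width_bits):
    """Peak memory bandwidth in GB/s from transfer rate (MT/s) and bus width."""
    bytes_per_transfer = bus_width_bits / 8
    return rate_mt_s * bytes_per_transfer / 1000  # MB/s -> GB/s


if __name__ == "__main__":
    # Bus widths are assumptions taken from public specs, not from this thread
    print("RX 580 :", memory_bandwidth_gb_s(8000, 256), "GB/s")
    print("Vega 56:", memory_bandwidth_gb_s(1600, 2048), "GB/s")
```

So the Vega 56's much lower 1600 MT/s figure is more than made up for by its bus being 8 times wider.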

I can see it won't be long before we have ancient 5800s, 6900s & 7900s (& 7870 XTs) as a secondary card in our rigs solely for crunching MW & Einstein, & modern cards for gaming & SP DC! ..........maybe I'm behind the times & some of you guys are already doing that!? ;)

* The 4800s can't run MW atm, see here

NVidia ...............................GPU/RAM ....................... DP GFLOPS
GTX 980 ................ 1126-1216 MHz/7010 MT/s .............. 144
GTX 980 Ti ............ 1000-1076 MHz/7010 MT/s .............. 176
GTX 1060 6GB ...... 1506-1708 MHz/8000 or 9000 MT/s 120-137
GTX 1070 .............. 1506-1683 MHz/8000 MT/s .............. 181-202
GTX 1080 .............. 1607-1733 MHz/10,000 MT/s ........... 257-277
RTX 2080 .............. 1515-1710 MHz/14,000 MT/s ........... 279-315
RTX 2080 Ti .......... 1350-1545 MHz/14,000 MT/s ........... 367-421

Why so few NVidia cards? Well traditionally AMD's cards had (& still have?) far better DP performance, but I will add more if they crop up more in the benchmarks, or if requested.
 

Assimilator1

#2
Carried on from old MW bench thread......
I don't see how a mere 5.5% boost in the GPU clock can account for 36% more performance, so the 33% faster CPU almost has to be the reason, especially considering the CPU time: Eleven seconds less CPU time, 14.5 seconds less task time......hmmmmm.
Yea, looks like the CPU does have a large influence now, interesting! I will have to note that in the new bench requirements.
So yea, your faster time does seem to be largely down to the faster CPU, I take it the driver update made no real difference?
Is the FX8320E running at its peak stock of 4 GHz?
 
Aug 2, 2003
3,132
261
136
#3
.....
Is the FX8320E running at its peak stock of 4 GHz?
Uhm....no. Seems locked at 3467MHz. Time to hook up a monitor and keyboard and go to the BIOS I think...

Usage is all over the place, from just a few percent to 100%, just running the GPU apps, and it's running a mere 32C, so no reason for it not to be maxing the turbo, or at least fluctuating.
 
Aug 2, 2003
3,132
261
136
#4
Still locked at 3467. This particular board fried a FX8350, and only reads 8G RAM no matter how much more I have in it. It has a limited future in my farm. :mad:
 

Assimilator1

#5
Lol, sounds like it'll either die or you're going to kill it! ;).

waffleironhead mentioned that there were large pauses within WUs. I wonder if some parts of the WU can only be done by a CPU, so the app periodically hands the work over to the CPU? That would explain the CPU affecting WU times & the fluctuating CPU load you're seeing.

Talking of waffleironhead, he updated me via PM confirming his HD 6950 is at stock clocks, & it's run with an Athlon2 X4 620 @2.6 GHz.
 

Assimilator1

#6
Well I was about to post a time for my 7870 XT 3GB, but the times are all over the place :confused:, ranging from 119-141s, although the 8 validated so far range from 129-133s.
Looking at the GPU load whilst it's crunching, the load is very on/off, very weird! As well as the odd restarting of the WU as waffleironhead mentioned.

I've updated BOINC to .33 since doing the 1st 14 WUs on that rig.

[update]
Well originally I had SETI crunching on 3 cores & left 1 spare for MW GPU (that always used to be enough), turns out it wasn't quite enough! (for consistent times anyway).
After suspending SETI, leaving all 4 cores for MW & looking at the task manager, system idle process fluctuates from 48-75%, although most of the time it is at 72-74%. When it briefly drops to ~50%, the milkyway_1.43_win process is taking up a core for itself, for about a second roughly speaking.
I've crunched a few with only MW running now, going to switch SETI back on to 2 cores in a minute, shouldn't affect MW.....

Oh & re GPU load, rather than looking like a series of mountain ranges, now looks like a series of blocks in GPU-Z.

[update2]
And the times have plummeted! An average of 7 valids gives 56.8s!
 

Assimilator1

#7
Hmm, the mystery deepens, even with BOINC off 1 svchost process is taking 25% CPU time! Anyone know possibly why?
At 1st I thought it was BOINC related, but I just proved it isn't.......
No wonder I was having WU time variations with 3 cores to SETI!
 

waffleironhead

Diamond Member
Aug 10, 2005
6,486
3
101
#8

Assimilator1

#9
Ah ok, thanks mate :).

It seems even having 2 cores on SETI + the svchost hogging 1 was adding 2-3s/WU. I just dropped SETI from 50 to 25% usage & MW WU times have gone from ~58s to ~56s (without this svchost issue, running SETI @50% would have no effect on MW WU time, & only a minimal effect at 75%).
 

Assimilator1

#10
Looks like you may be affected by the windows 7 update bug. I too had a mysterious svchost eating up a whole core.
https://forums.anandtech.com/threads/fix-for-windows-7-update.2471653/
Hmm, tried it, didn't work, but interestingly when I switched the win updates off, the rogue svc didn't play up, but it came back when I switched updates back on.
For now I'm just going to let it run. I saw another link in that thread referring to update problems, & to win update sometimes taking up to 8 hrs to sort itself out!
 

Assimilator1

#12
Thanks for the screenshot, but it seems the times are varying by quite a lot, has the GPU got a free CPU core for it?
 

Assimilator1

#14
Err, I see now that you've got 12 WUs being crunched at once! I take it you haven't got 12 GPUs? ;)
If not, (then as per my benchmark requirements), you need to crunch just 1 WU/GPU or it'll mess up the times.
 

waffleironhead

#15
Err, I see now that you've got 12 WUs being crunched at once! I take it you haven't got 12 GPUs? ;)
If not, (then as per my benchmark requirements), you need to crunch just 1 WU/GPU or it'll mess up the times.
I think he has a 6 core I7 with hyperthread. ;) Those are the ~4000 second WU on his account methinks.
 

iwajabitw

Senior member
Aug 19, 2014
812
10
106
#16
The app_config setting grabbed one of the hyper threads as a core. Saw that after the last post. Adjusted the computing preferences to get a full physical core free per card. So it's been correct now for the last hour or so, with no difference in time.
 

iwajabitw

#17
Running a i7 5820-6 cores + hyper thread. Have 8 cpu tasks running now.
 

Assimilator1

#18
Cool & thanks :), great rig btw :D
From a visual average, I'd say your 980 is doing them in ~184s (feel free to calculate an average if you like).
What I don't understand is how some of the GPU WUs you did earlier were being done more quickly :confused:......

And am I right in saying your GTX 980 is clocked at 1.3/3 GHz?

Interesting thing happening with the CPU WU time too, I think that the ~4109s times are from cores that are also doing HT, whilst the 3529s ones are done from cores not currently HT, seeing as you have 12 possible threads but have 8 crunching, so 2 cores are not using HT.
Does that sound right? Does it work like that? lol
 

iwajabitw

#19
There was a thread over at the Milkyway forums back in Oct where they were discussing increasing the size of the gpu tasks so that there wouldn't be such a hit on the servers constantly. Maybe they got that done. When I get some time I'll see if I can't find it. And the lowest is in the 160's but 180's seem to be the avg.
 

iwajabitw

#20
Should have my 280x up and running in a few days. Missed the delivery Sat, and have to wait for the post office to open to pick it up.
 

Assimilator1

#21
Ahh that'll fly through MW WUs :D
Re longer WUs, yea makes sense.

What do you reckon about the CPU HT question?
 

iwajabitw

#22
Cool & thanks :), great rig btw :D
From a visual average, I'd say your 980 is doing them in ~184s (feel free to calculate an average if you like).
What I don't understand is how some of the GPU WUs you did earlier were being done more quickly :confused:......

And am I right in saying your GTX 980 is clocked at 1.3/3 GHz?

Interesting thing happening with the CPU WU time too, I think that the ~4109s times are from cores that are also doing HT, whilst the 3529s ones are done from cores not currently HT, seeing as you have 12 possible threads but have 8 crunching, so 2 cores are not using HT.
Does that sound right? Does it work like that? lol
Correct, I have 2 physical cores free now, so that's a loss of 4 more possible tasks that could be processed with HT on. So out of 6 cores, I have only 4 physically crunching, making 8 tasks with HT on. Since there are no shifts in time from doing this, I may allow a 5th core to crunch. That will be the same as the original screenshot, since the 1 free physical core has HT & gets used by the GPU as 2 cores with HT on.
The 980s usually run at about 1350, & as the heat comes up they will fluctuate based on my fan curve down to 1290. At least the top card does, it's usually 10-12C hotter than the lower card. Nothing is OC'd, they're just throttling based on temps.
 

iwajabitw

#23
Looking at Lateralus, the main rig... CPU tasks are v1.42 and take 4000 seconds, GPU tasks are v1.43
 

StefanR5R

Platinum Member
Dec 10, 2016
2,258
343
106
#24
Running a i7 5820-6 cores + hyper thread. Have 8 cpu tasks running now.
seeing as you have 12 possible threads but have 8 crunching, so 2 cores are not using HT.
Correct, I have 2 physical free cores now, so that's a loss of 4 more possible tasks to get processed with HT on. So out of 6 cores, I have only 4 physically crunching, making 8 with HT on.
So you have 6 cores, 12 threads. The Windows process scheduler puts all runnable tasks all over the map. Since the Windows scheduler is HT-aware (i.e. knows which virtual CPUs are merely hardware-threads on the same physical CPU = same physical core), it tries to put concurrent processes onto different physical CPUs as long as possible.

If you have 8 CPU DC workers and 1 GPU DC worker (with its supporting CPU thread), then the Windows process scheduler needs to spread 8 processes with full load and 1 process with spiky load across those 12 virtual CPUs.

I.e. most of the time, you have 8 runnable processes. The scheduler will employ 4 cores with 1 process each, and 2 cores with 2 processes. (Which of the cores get just one process and which ones have to serve 2 processes will certainly change over time, since the Windows scheduler tends to shift processes from core to core unless the user or software requests core affinity.)

During the blips in time when the helper process of the GPU DC worker needs processing, you have not 8 but 9 runnable processes. So then the scheduler employs 3 cores with 1 process each, and 3 cores with 2 processes. From what I have seen, these are all low-priority processes, so it is impossible to say whether the GPU supporter is among the lucky three which get a whole physical core for themselves, or is among the 6 which need to share a core on which two threads are running at that time.

Long story short, with so many CPU DC workers active, the GPU DC supporter process has to fight with the CPU DC processes for CPU time.

I think there are two ways to ensure that the GPU worker is not held back by the CPU workers:
  1. Either reduce the number of CPU DC workers to 5 (i.e. to number of physical cores minus one).
  2. Or increase the scheduler priority of the GPU supporter process from low to normal. In the latter case, you can have as many low-priority processes from CPU DC workers as you like, but they will always yield a core (presumably without HT penalty) to the normally prioritized thread as soon as that one becomes runnable.
The second option can still be a little bit detrimental because it will involve more cache pressure than the first option.
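
The core counts in the reasoning above can be reproduced with a very simplified model. The Python sketch below just spreads runnable processes as evenly as possible over the physical cores, which is the behaviour I described for an HT-aware scheduler. It is not a simulation of the real Windows scheduler (which migrates processes & honours priorities), and the function names are made up for the example.

```python
from collections import Counter


def spread_processes(runnable, physical_cores, threads_per_core=2):
    """Per-core process counts when load is spread as evenly as possible."""
    if runnable > physical_cores * threads_per_core:
        raise ValueError("more runnable processes than hardware threads")
    base, extra = divmod(runnable, physical_cores)
    # `extra` cores have to take one process more than the rest
    return [base + 1] * extra + [base] * (physical_cores - extra)


def cores_by_load(runnable, physical_cores):
    """Map 'processes on a core' -> 'number of cores with that load'."""
    return dict(Counter(spread_processes(runnable, physical_cores)))


if __name__ == "__main__":
    print("8 CPU workers           :", cores_by_load(8, 6))
    print("8 CPU workers + GPU task:", cores_by_load(9, 6))
    print("5 CPU workers + GPU task:", cores_by_load(6, 6))
```

With 8 workers on 6 cores this gives 4 cores running 1 process & 2 cores running 2; with the GPU helper also runnable it becomes 3 & 3. With option 1 (5 CPU workers) every process, including the GPU helper, gets a physical core to itself.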

That's at least my understanding of Windows process scheduling in general, and on hyperthreaded CPUs in particular. I am not at all a Windows expert though, am more of a Linux guy. (Need to cope with Windows at work, use Linux at home and occasionally at work.)
 

Assimilator1

#25
I was wondering if that might be the case, dang.

iwajabitw
As I mentioned though, some CPU tasks were done in 3529s, I suppose the only real way to know non-HT times is to either disable it, or reduce CPU crunching to 1 thread.
 

