MilkyWay@H - Benchmark thread Winter 2016 on (different WU) - GPU & CPU times wanted

Discussion in 'Distributed Computing' started by Assimilator1, Dec 29, 2016.

  1. Assimilator1

    Assimilator1 Elite Member

    So please share your new scores for old & new GPUs & CPUs alike!

    Note new requirements for the benchmark :-

    Please use validated 133.66 credit WU results only, they must be from the MilkyWay@Home v1.4x app
    (Currently for the GPUs, there is only 1 app)

    Average of at least 5 WU times (not cherry picked please! ;)).

    A dedicated physical CPU core for each GPU (for optimal MW WU times). If you're also using BOINC for CPU tasks & you have an HT-capable CPU, then the only way to be certain of this (bar disabling HT) is to set the BOINC computing preferences (in advanced mode > options) so that you have 1 less CPU thread running than you have physical cores. Don't panic too much about lost CPU ppd, it doesn't take long to run MW GPU WUs ;) (see table).
    Please state what speed & type of CPU you have, as it now seems to have a significant effect on GPU WU times!

    Please state GPU & RAM clock speeds if overclocked (including factory overclocks) or state 'stock'.

    Please only crunch 1 WU at a time per GPU, otherwise it will massively increase WU times! (Even if it does increase output, the WU times seem to fluctuate much more than singly crunched WUs, so you can't just halve the times either.)
    [Note: the following paragraph may no longer be relevant for v1.4x, time will tell.] I've decided to relent a bit on this, but only for the GTX Titan as it can't achieve anywhere near full load with just 1 WU. I will add a proviso stating this by each Titan's score, which will be derived from the total time crunched divided by the number of WUs being crunched simultaneously (8 WUs at once seems to be the choice so far).
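
    If you'd rather set this in config than juggle preferences, below is a minimal app_config.xml sketch (it goes in the MilkyWay project folder under BOINC's data directory). It budgets a full CPU core per GPU task & keeps it to 1 WU per GPU. The short app name 'milkyway' is an assumption - check client_state.xml or the project's applications page for the exact name on your install.

    <app_config>
      <app>
        <!-- App name is an assumption; verify it against client_state.xml -->
        <name>milkyway</name>
        <gpu_versions>
          <!-- 1.0 = one WU per GPU at a time (0.5 would run two at once) -->
          <gpu_usage>1.0</gpu_usage>
          <!-- Tell the client to budget a whole CPU core for each GPU task -->
          <cpu_usage>1.0</cpu_usage>
        </gpu_versions>
      </app>
    </app_config>

    Note that cpu_usage only makes the client count the GPU task as a whole core (so it runs one fewer CPU task); it doesn't pin threads, so the 'n-1 threads' preference trick above is still worth doing on HT CPUs.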

    For CPU times please state whether Hyper Threading (or equivalent) is enabled or not, times for both states welcomed :).

    It would also be useful if you could state your BOINC & driver versions, & OS, in case they make any difference.

    [This paragraph under review]
    If you find your WU times are fluctuating by more than a couple of %, then use GPU-Z or your graphics card driver tools to check that your GPU is able to hit near 100% load (although I'm not sure that NVidia cards can hit that for MW). Note that even when crunching normally, the GPU load will be on/off with the current MW app, so the GPU load graph should look like a series of blocks.
    Also check using Task Manager that your CPU does actually have the spare load to give to MW (& btw, GPU crunching won't show up in the TM).
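
    If you'd rather log the CPU side than watch Task Manager, here's a rough little Python sketch of the idea (it assumes the psutil module is installed - pip install psutil - & the 60-sample loop is an arbitrary choice). It prints per-core load once a second while MW is crunching; a core pinned near 100% by something else means the MW feeder thread will struggle:

    import psutil

    # Print per-core CPU load once a second for a minute.
    for _ in range(60):
        per_core = psutil.cpu_percent(interval=1.0, percpu=True)
        print(" ".join(f"{load:5.1f}%" for load in per_core))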

    ***************************************************************************************************************

    Current GPU statistics ~ Average Run Time to Complete 1 MW v1.4x 133.66 credit WU :-

    R9 280X, GPU 1080 MHz (CPU, Pentium G3220 @3 GHz) ................................. 40.1s ... Tennessee Tony
    HD 7970, GPU 1000 MHz (CPU, i7 4930k @4.1 GHz) ......................................... 42s ...... Assimilator1
    R9 280X, Stock (CPU, C2D E6550, stock) ............................................................. 54.3s ... iwajabitw
    R9 280X, GPU 1020 MHz (CPU, AMD FX8320E @3.47 GHz) .............................. 54.8s ... Tennessee Tony
    HD 7950, GPU 860 MHz (CPU, i7 3770k, stock) .................................................. 56.5s ... salvorhardin
    HD 7870 XT 3GB(DS), GPU 925 MHz (CPU, C2 Q9550 @3.58 GHz) .................. 56.8s ... Assimilator1
    R9 390, GPU 1015 MHz (CPU, i7 3770k, stock) ................................................... 60.7s ... salvorhardin
    RX 480, GPU 1415 MHz, RAM 2025 MHz (CPU, i5 6600k, 4.6 GHz) .................. 72.1s ... TomTheMetalGod
    HD 6950, stock (CPU Athlon2 X4 620 @2.6 GHz) ............................................. 101.2s ... waffleironhead
    GTX 980, GPU 1303 MHz (CPU, i7 5820k @3.3 GHz) ........................................ 184s ...... iwajabitw
    Quadro K2100M, stock (CPU, i7 4900 MQ turbo @3.8 GHz) ......................... 1784s ...... StefanR5R

    Current CPU statistics ~ Average Run Time to Complete 1 MW v1.4x 133.66 credit WU :-

    i7 5820k @3.3 GHz ......................................................................... 2723s no 'HT load' .... iwajabitw
    i7 4930k @4.1 GHz (6 threads for CPU) ....................................... 2825s no 'HT load' ... Assimilator1
    i7 4930k @4.1 GHz (10 threads for CPU, 2 for GPU)................... 4171s HT on .............. Assimilator1
    i7 4930k @4.1 GHz (12 threads for CPU) ..................................... 4557s HT on .............. Assimilator1

    ****************************************************************************************************************
    Info:-

    My previous MW benchmark thread spring 2014 - summer 2016

    Stock clocks for some of the commonly used graphics cards for MW (& cards with good double precision power), source Wiki (GPU/RAM MHz or MT/s if stated) :-
    AMD .............................GPU/RAM ................................... DP GFLOPS
    HD 4890 ...................... 850/975 ....................................... 272*
    HD 5830 ...................... 800/1000 ..................................... 358
    HD 5850 ...................... 725/1000 ..................................... 418
    HD 5870 ...................... 850/1200 ..................................... 544
    HD 5970 ...................... 725/1000 (dual GPU) .................. 928
    HD 6930 ...................... 750/1200 ..................................... 480
    HD 6950 ...................... 800/1250 ..................................... 563
    HD 6970 ...................... 880/1375 ..................................... 675
    HD 6990 ...................... 830/1250 (dual GPU) ................ 1277
    HD 7870 XT ................. 925-975/1500 ............................. 710-749
    HD 7950 ...................... 800/1250 ..................................... 717
    HD 7950 Boost ........... 850-925/1250 ............................. 762-829
    HD 7970 ...................... 925/1375 ..................................... 947
    HD 7970 GE ............... 1000-1050/1500 ......................... 1024-1075
    HD 7990 ..................... 950-1000/1500 (dual GPU) ....... 1894-2048
    R9 280 ........................ 827-933/1250 .............................. 741-836
    R9 280X ...................... 850-1000/1500 ............................ 870-1024
    R9 290 ........................ >947/5000 MT/s .......................... 606
    R9 290X ...................... >1000/5000 MT/s ....................... 704
    R9 295X2 .................... 1018/5000 MT/s (dual GPU) .... 1433
    R9 390 ........................ >1000/6000 MT/s ....................... 640
    R9 390X ...................... >1050/6000 MT/s ....................... 739
    R9 Fury ....................... 1000/1000 MT/s ......................... 448
    R9 Nano ..................... 1000/1000 MT/s .......................... 512
    R9 FuryX .................... 1050/1000 MT/s .......................... 538
    R9 Pro Duo ................ 1000/1000 MT/s (dual GPU) ...... 900
    RX 470 ........................ 926-1206/6600 MT/s .................. 237
    RX 480 ...................... 1120-1266/7000-8000 MT/s ........ 323

    Wow, just noticed how feeble the entire RX 400 series is at double precision! Even the top of the line (as of 12/16) RX 480 only manages 323 GFLOPS, which is a little less than the HD 5830's 358 from 2/10 & only a bit more than the HD 4890's from 4/09! Although it is more than the R9 380X's 248 GFLOPS :p.

    I can see it won't be long before we have ancient 5800s, 6900s & 7900s (& 7870 XTs) as secondary cards in our rigs solely for crunching MW & Einstein, & modern cards for gaming & SP DC! ..........maybe I'm behind the times & some of you guys are already doing that!? ;)

    * The 4800s can't run MW atm, see here

    NVidia ...............................GPU/RAM ....................... DP GFLOPS
    GTX 980 ................ 1126-1216 MHz/7010 MT/s .............. 144
    GTX 1070 .............. 1506-1683 MHz/8000 MT/s .............. 181-202
    GTX 1080 .............. 1607-1733 MHz/10,000 MT/s ........... 257-277
     
    #1 Assimilator1, Dec 29, 2016
    Last edited: Jan 23, 2017

  3. Assimilator1

    Assimilator1 Elite Member

    Carried on from old MW bench thread......
    Yea, looks like the CPU does have a large influence now, interesting! I will have to note that in the new bench requirements.
    So yea, your faster time does seem to be largely down to the faster CPU, I take it the driver update made no real difference?
    Is the FX8320E running at its peak stock of 4 GHz?
     
    #2 Assimilator1, Dec 29, 2016
    Last edited: Dec 29, 2016
  4. TennesseeTony

    TennesseeTony Elite Member

    Uhm....no. Seems locked at 3467MHz. Time to hook up a monitor and keyboard and go to the BIOS I think...

    Usage is all over the place, from just a few percent to 100%, just running the GPU apps, and it's running a mere 32C, so no reason for it not to be maxing the turbo, or at least fluctuating.
     
  5. TennesseeTony

    TennesseeTony Elite Member

    Still locked at 3467. This particular board fried a FX8350, and only reads 8G RAM no matter how much more I have in it. It has a limited future in my farm. :mad:
     
  6. Assimilator1

    Assimilator1 Elite Member

    Lol, sounds like it'll either die or you're going to kill it! ;).

    waffleironhead mentioned that there were large pauses within WUs; I wonder if some parts of the WU can only be done by the CPU, so the GPU periodically hands work back to it? That would explain the CPU affecting WU times & the fluctuating CPU load you're seeing.

    Talking of waffleironhead, he updated me via PM confirming his HD 6950 is at stock clocks, & it's run with an Athlon2 X4 620 @2.6 GHz.
     
  7. Assimilator1

    Assimilator1 Elite Member

    Well I was about to post a time for my 7870 XT 3GB, but the times are all over the place :confused:, ranging from 119-141s, although the 8 validated so far range from 129-133s.
    Looking at the GPU load whilst it's crunching, the load is very on/off, very weird! There's also the odd restarting of a WU, as waffleironhead mentioned.

    Since doing the 1st 14 WUs on that rig, I've updated BOINC to .33.

    [update]
    Well originally I had SETI crunching on 3 cores & left 1 spare for the MW GPU (that always used to be enough); turns out it wasn't quite enough, for consistent times anyway!
    After suspending SETI (leaving all 4 cores for MW) & looking at Task Manager, the System Idle Process fluctuates from 48-75%, although most of the time it is at 72-74%; when it briefly drops to ~50%, the milkyway_1.43_win process is taking up a core for itself for roughly a second.
    I've crunched a few with only MW running now; going to switch SETI back on to 2 cores in a minute, shouldn't affect MW.....

    Oh & re GPU load, rather than looking like a series of mountain ranges, it now looks like a series of blocks in GPU-Z.

    [update2]
    And the times have plummeted! An average of 7 valids gives 56.8s!
     
    #6 Assimilator1, Dec 31, 2016
    Last edited: Dec 31, 2016
  8. Assimilator1

    Assimilator1 Elite Member

    Hmm, the mystery deepens: even with BOINC off, 1 svchost process is taking 25% CPU time! Anyone know why?
    At 1st I thought it was BOINC related, but I just proved it isn't.......
    No wonder I was having WU time variations with 3 cores on SETI!
     
  9. waffleironhead

    waffleironhead Diamond Member

    Looks like you may be affected by the Windows 7 update bug. I too had a mysterious svchost eating up a whole core.
    https://forums.anandtech.com/threads/fix-for-windows-7-update.2471653/
     
  10. Assimilator1

    Assimilator1 Elite Member

    Ah ok, thanks mate :).

    It seems that having 2 cores on SETI + the svchost hog on 1 was adding 2-3s/WU; I just dropped SETI from 50% to 25% usage & MW WU times have gone from ~58s to ~56s (without this svchost issue, running SETI @50% would have no effect on MW WU times, & only a minimal effect at 75%).
     
  11. Assimilator1

    Assimilator1 Elite Member

    Hmm, tried it, didn't work. Interestingly though, when I switched Windows updates off the rogue svchost didn't play up, but it came back when I switched updates back on.
    For now I'm just going to let it run; I saw another link in that thread referring to update problems, & sometimes Windows update taking up to 8 hrs to sort itself out!
     
  12. iwajabitw

    iwajabitw Senior member

    First valids with 133.66

    [screenshot: validated 133.66 credit WU run times]
     
  13. Assimilator1

    Assimilator1 Elite Member

    Thanks for the screenshot, but it seems the times are varying by quite a lot, has the GPU got a free CPU core for it?
     
  14. iwajabitw

    iwajabitw Senior member

    Yeah, 1 for each card.
     
  15. Assimilator1

    Assimilator1 Elite Member

    Err, I see now that you've got 12 WUs being crunched at once! I take it you haven't got 12 GPUs? ;)
    If not, then (as per my benchmark requirements) you need to crunch just 1 WU/GPU or it'll mess up the times.
     
  16. waffleironhead

    waffleironhead Diamond Member

    I think he has a 6-core i7 with hyperthreading. ;) Those are the ~4000-second WUs on his account, methinks.
     
  17. iwajabitw

    iwajabitw Senior member

    The app_config setting grabbed one of the hyperthreads as a core. Saw that after the last post. Adjusted the computing preferences to get a full physical core free per card. So it's correct now for the last hour or so, with no difference in time.
     
  18. iwajabitw

    iwajabitw Senior member

    Running an i7 5820k - 6 cores + hyperthreading. Have 8 CPU tasks running now.
     
  19. Assimilator1

    Assimilator1 Elite Member

    Cool & thanks :), great rig btw :D
    From a visual average, I'd say your 980 is doing them in ~184s (feel free to calculate an average if you like).
    What I don't understand is how some of the GPU WUs you did earlier were being done more quickly :confused:......

    And am I right in saying your GTX 980 is clocked at 1.3/3 GHz?

    Interesting thing happening with the CPU WU times too: I think that the ~4109s times are from cores that are also running a second HT thread, whilst the 3529s ones are from cores not currently hyperthreading, seeing as you have 12 possible threads but only 8 crunching, so 2 cores are not using HT.
    Does that sound right? Does it work like that? lol
     
    #18 Assimilator1, Jan 1, 2017
    Last edited: Jan 1, 2017
  20. iwajabitw

    iwajabitw Senior member

    There was a thread over at the MilkyWay forums back in Oct where they were discussing increasing the size of the GPU tasks so that there wouldn't be such a constant hit on the servers. Maybe they got that done. When I get some time I'll see if I can't find it. And the lowest is in the 160s, but the 180s seem to be the avg.
     
  21. iwajabitw

    iwajabitw Senior member

    Should have my 280x up and running in a few days. Missed the delivery Sat, and have to wait for the post office to open to pick it up.
     
  22. Assimilator1

    Assimilator1 Elite Member

    Ahh that'll fly through MW WUs :D
    Re longer WUs, yea makes sense.

    What do you reckon about the CPU HT question?
     
  23. iwajabitw

    iwajabitw Senior member

    Correct, I have 2 free physical cores now, so that's a loss of 4 more possible tasks that could be processed with HT on. So out of 6 cores, I have only 4 physically crunching, making 8 tasks with HT on. Since there are no shifts in time from doing this, I may allow a 5th core to crunch; that will be the same as the original screenshot, since 1 free physical core is hyperthreaded and getting used by the GPU as 2 threads.
    The 980s usually run at about 1350; as the heat comes up they will fluctuate down to 1290 based on my fan curve - at least the top card does, it's usually 10-12C hotter than the lower card. Nothing is OC'd, just throttling based on temps.
     
  24. iwajabitw

    iwajabitw Senior member

    Looking at Lateralus, the main rig... CPU tasks are v1.42 and are 4000 seconds, GPUs are on v1.43.
     
  25. StefanR5R

    StefanR5R Senior member

    So you have 6 cores, 12 threads. The Windows process scheduler puts all runnable tasks all over the map. Since the Windows scheduler is HT-aware (i.e. knows which virtual CPUs are merely hardware-threads on the same physical CPU = same physical core), it tries to put concurrent processes onto different physical CPUs as long as possible.

    If you have 8 CPU DC workers and 1 GPU DC worker (with its supporting CPU thread), then the Windows process scheduler needs to spread 8 processes with full load and 1 process with spiky load across those 12 virtual CPUs.

    I.e. most of the time, you have 8 runnable processes. The scheduler will employ 4 cores with 1 process each, and 2 cores with 2 processes. (Which of the cores get just one process and which ones have to serve 2 processes will certainly change over time, since the Windows scheduler tends to shift processes from core to core unless the user or software requests core affinity.)

    During the blips in time when the helper process of the GPU DC worker needs processing, you have not 8 but 9 runnable processes. So then the scheduler employs 3 cores with 1 process each, and 3 cores with 2 processes. From what I have seen, these are all low-priority processes, so it is impossible to say whether the GPU supporter is among the lucky three which get a whole physical core for themselves, or is among the 6 which need to share a core on which two threads are running at that time.

    Long story short, with so many CPU DC workers active, the GPU DC supporter process has to fight with the CPU DC processes for CPU time.

    I think there are two ways to ensure that the GPU worker is not held back by the CPU workers:
    1. Either reduce the number of CPU DC workers to 5 (i.e. to number of physical cores minus one).
    2. Or increase the scheduler priority of the GPU supporter process from low to normal. In the latter case, you can have as many low-priority processes from CPU DC workers as you like, but they will always yield a core (presumably without HT penalty) to the normally prioritized thread as soon as that one becomes runnable.
    The second option can still be a little bit detrimental because it will involve more cache pressure than the first option.

    That's at least my understanding of Windows process scheduling in general, and on hyperthreaded CPUs in particular. I am not at all a Windows expert though, am more of a Linux guy. (Need to cope with Windows at work, use Linux at home and occasionally at work.)
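
    (For what it's worth, option 2 can be done by hand in Task Manager - set the priority of the MW process to Normal - or scripted. Below is a rough Python/psutil sketch of the idea; the 'milkyway' name match & the Windows-only priority constant are assumptions, so check the actual process name first.)

    import psutil

    # Bump any running MilkyWay GPU app from low to normal priority (Windows only).
    for proc in psutil.process_iter(["name"]):
        name = (proc.info["name"] or "").lower()
        if "milkyway" in name:
            try:
                proc.nice(psutil.NORMAL_PRIORITY_CLASS)  # Windows-only constant in psutil
                print(f"pid {proc.pid} ({proc.info['name']}) set to normal priority")
            except psutil.AccessDenied:
                print(f"pid {proc.pid}: access denied, try running as administrator")

    (The BOINC client starts a fresh process for each WU, so this would need re-running, or a loop, to stick.)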
     
  26. Assimilator1

    Assimilator1 Elite Member

    I was wondering if that might be the case, dang.

    iwajabitw
    As I mentioned though, some CPU tasks were done in 3529s, I suppose the only real way to know non-HT times is to either disable it, or reduce CPU crunching to 1 thread.