MilkyWay@H - Benchmark thread Winter 2016 on (different WU) - GPU & CPU times wanted

Discussion in 'Distributed Computing' started by Assimilator1, Dec 29, 2016.

  1. Assimilator1

    Assimilator1 Elite Member

    So please share your new scores for old & new GPUs & CPUs alike!

    Note new requirements for the benchmark :-

    Please use validated 133.66 credit WU results only, they must be from the MilkyWay@Home v1.4x app
    (Currently for the GPUs, there is only 1 app)

    Average of at least 5 WU times (not cherry picked please! ;)).

    A dedicated physical CPU core for each GPU (for optimal MW WU times). If you're also using BOINC for CPU tasks & you have an HT-capable CPU, then the only way to be certain of this (bar disabling HT) is to set the BOINC computing preferences (in advanced mode > options) so that you have 1 less CPU thread running than you have physical cores. Don't panic too much about lost CPU ppd, it doesn't take long to run MW GPU WUs ;) (see table).
    Please state what speed & type of CPU you have, as it now seems to have a significant effect on GPU WU times!

    Please state GPU & RAM clock speeds if overclocked (including factory overclocks) or state 'stock'.

    Please only crunch 1 WU at a time per GPU, otherwise it will massively increase WU times! (Even if it does increase output, the WU times seem to fluctuate much more than singly crunched WUs, so you can't just halve the times either.)
    [Note: the following paragraph may no longer be relevant for v1.4x, time will tell.] I've decided to relent a bit on this, but only for the GTX Titan as it can't achieve anywhere near full load with just 1 WU. I will add a proviso stating this by each Titan's score, which will be derived from the total time crunched divided by the number of WUs being crunched simultaneously (8 WUs at once seems to be the choice so far).
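
    If you'd rather set this in config than juggle preferences, below is a minimal app_config.xml sketch (it goes in the MilkyWay project folder under BOINC's data directory). It budgets a full CPU core per GPU task & keeps it to 1 WU per GPU. The short app name 'milkyway' is an assumption - check client_state.xml or the project's applications page for the exact name on your install.

    <app_config>
      <app>
        <!-- App name is an assumption; verify it against client_state.xml -->
        <name>milkyway</name>
        <gpu_versions>
          <!-- 1.0 = one WU per GPU at a time (0.5 would run two at once) -->
          <gpu_usage>1.0</gpu_usage>
          <!-- Tell the client to budget a whole CPU core for each GPU task -->
          <cpu_usage>1.0</cpu_usage>
        </gpu_versions>
      </app>
    </app_config>

    Note that cpu_usage only makes the client count the GPU task as a whole core (so it runs one fewer CPU task); it doesn't pin threads, so the 'n-1 threads' preference trick above is still worth doing on HT CPUs.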

    For CPU times please state whether Hyper Threading (or equivalent) is enabled or not, times for both states welcomed :).

    It would also be useful if you could state your BOINC & driver versions, & OS, in case they make any difference.

    [This paragraph under review]
    If you find your WU times are fluctuating by more than a couple of %, then use GPU-Z or your graphics card driver tools to check that your GPU is able to hit near 100% load (although I'm not sure that NVidia cards can hit that for MW). Note that even when crunching normally, the GPU load will be on/off with the current MW app, so the GPU load graph should look like a series of blocks.
    Also check using Task Manager that your CPU does actually have the spare load to give to MW (& btw, GPU crunching won't show up in the TM).
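
    If you'd rather log the CPU side than watch Task Manager, here's a rough little Python sketch of the idea (it assumes the psutil module is installed - pip install psutil - & the 60-sample loop is an arbitrary choice). It prints per-core load once a second while MW is crunching; a core pinned near 100% by something else means the MW feeder thread will struggle:

    import psutil

    # Print per-core CPU load once a second for a minute.
    for _ in range(60):
        per_core = psutil.cpu_percent(interval=1.0, percpu=True)
        print(" ".join(f"{load:5.1f}%" for load in per_core))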

    ***************************************************************************************************************

    Current GPU statistics ~ Average Run Time to Complete 1 MW v1.4x 133.66 credit WU :-

    R9 280X, GPU 1080 MHz (CPU, Pentium G3220 @3 GHz) ................................. 40.1s ... Tennessee Tony
    HD 7970, GPU 1000 MHz (CPU, i7 4930k @4.1 GHz) ......................................... 42s ...... Assimilator1
    R9 280X, Stock (CPU, C2D E6550, stock) ............................................................. 54.3s ... iwajabitw
    R9 280X, GPU 1020 MHz (CPU, AMD FX8320E @3.47 GHz) .............................. 54.8s ... Tennessee Tony
    HD 7950, GPU 860 MHz (CPU, i7 3770k, stock) .................................................. 56.5s ... salvorhardin
    HD 7870 XT 3GB(DS), GPU 925 MHz (CPU, C2 Q9550 @3.58 GHz) .................. 56.8s ... Assimilator1
    R9 390, GPU 1015 MHz (CPU, i7 3770k, stock) ................................................... 60.7s ... salvorhardin
    RX 480, GPU 1415 MHz, RAM 2025 MHz (CPU, i5 6600k, 4.6 GHz) .................. 72.1s ... TomTheMetalGod
    HD 6950, stock (CPU Athlon2 X4 620 @2.6 GHz) ............................................. 101.2s ... waffleironhead
    GTX 980, GPU 1303 MHz (CPU, i7 5820k @3.3 GHz) ........................................ 184s ...... iwajabitw
    Quadro K2100M, stock (CPU, i7 4900 MQ turbo @3.8 GHz) ......................... 1784s ...... StefanR5R

    Current CPU statistics ~ Average Run Time to Complete 1 MW v1.4x 133.66 credit WU :-

    i7 5820k @3.3 GHz ......................................................................... 2723s no 'HT load' .... iwajabitw
    i7 4930k @4.1 GHz (6 threads for CPU) ....................................... 2825s no 'HT load' ... Assimilator1
    i7 4930k @4.1 GHz (10 threads for CPU, 2 for GPU)................... 4171s HT on .............. Assimilator1
    i7 4930k @4.1 GHz (12 threads for CPU) ..................................... 4557s HT on .............. Assimilator1

    ****************************************************************************************************************
    Info:-

    My previous MW benchmark thread spring 2014 - summer 2016

    Stock clocks for some of the commonly used graphics cards for MW (& cards with good double precision power), source Wiki (GPU/RAM MHz or MT/s if stated) :-
    AMD .............................GPU/RAM ................................... DP GFLOPS
    HD 4890 ...................... 850/975 ....................................... 272*
    HD 5830 ...................... 800/1000 ..................................... 358
    HD 5850 ...................... 725/1000 ..................................... 418
    HD 5870 ...................... 850/1200 ..................................... 544
    HD 5970 ...................... 725/1000 (dual GPU) .................. 928
    HD 6930 ...................... 750/1200 ..................................... 480
    HD 6950 ...................... 800/1250 ..................................... 563
    HD 6970 ...................... 880/1375 ..................................... 675
    HD 6990 ...................... 830/1250 (dual GPU) ................ 1277
    HD 7870 XT ................. 925-975/1500 ............................. 710-749
    HD 7950 ...................... 800/1250 ..................................... 717
    HD 7950 Boost ........... 850-925/1250 ............................. 762-829
    HD 7970 ...................... 925/1375 ..................................... 947
    HD 7970 GE ............... 1000-1050/1500 ......................... 1024-1075
    HD 7990 ..................... 950-1000/1500 (dual GPU) ....... 1894-2048
    R9 280 ........................ 827-933/1250 .............................. 741-836
    R9 280X ...................... 850-1000/1500 ............................ 870-1024
    R9 290 ........................ >947/5000 MT/s .......................... 606
    R9 290X ...................... >1000/5000 MT/s ....................... 704
    R9 295X2 .................... 1018/5000 MT/s (dual GPU) .... 1433
    R9 390 ........................ >1000/6000 MT/s ....................... 640
    R9 390X ...................... >1050/6000 MT/s ....................... 739
    R9 Fury ....................... 1000/1000 MT/s ......................... 448
    R9 Nano ..................... 1000/1000 MT/s .......................... 512
    R9 FuryX .................... 1050/1000 MT/s .......................... 538
    R9 Pro Duo ................ 1000/1000 MT/s (dual GPU) ...... 900
    RX 470 ........................ 926-1206/6600 MT/s .................. 237
    RX 480 ...................... 1120-1266/7000-8000 MT/s ........ 323

    Wow, just noticed how feeble the entire RX 400 series is at double precision! Even the top of the line (as of 12/16) RX 480 only manages 323 GFLOPS, which is a little less than the HD 5830's 358 from 2/10 & only a bit more than the HD 4890's from 4/09! Although it is more than the R9 380X's 248 GFLOPS :p.

    I can see it won't be long before we have ancient 5800s, 6900s & 7900s (& 7870 XTs) as secondary cards in our rigs solely for crunching MW & Einstein, & modern cards for gaming & SP DC! ..........maybe I'm behind the times & some of you guys are already doing that!? ;)

    * The 4800s can't run MW atm, see here

    NVidia ...............................GPU/RAM ....................... DP GFLOPS
    GTX 980 ................ 1126-1216 MHz/7010 MT/s .............. 144
    GTX 1070 .............. 1506-1683 MHz/8000 MT/s .............. 181-202
    GTX 1080 .............. 1607-1733 MHz/10,000 MT/s ........... 257-277
     
    #1 Assimilator1, Dec 29, 2016
    Last edited: Jan 23, 2017

  3. Assimilator1

    Assimilator1 Elite Member

    Carried on from old MW bench thread......
    Yea, looks like the CPU does have a large influence now, interesting! I will have to note that in the new bench requirements.
    So yea, your faster time does seem to be largely down to the faster CPU, I take it the driver update made no real difference?
    Is the FX8320E running at its peak stock of 4 GHz?
     
    #2 Assimilator1, Dec 29, 2016
    Last edited: Dec 29, 2016
  4. TennesseeTony

    TennesseeTony Elite Member

    Uhm....no. Seems locked at 3467MHz. Time to hook up a monitor and keyboard and go to the BIOS I think...

    Usage is all over the place, from just a few percent to 100%, just running the GPU apps, and it's running a mere 32C, so no reason for it not to be maxing the turbo, or at least fluctuating.
     
  5. TennesseeTony

    TennesseeTony Elite Member

    Still locked at 3467. This particular board fried a FX8350, and only reads 8G RAM no matter how much more I have in it. It has a limited future in my farm. :mad:
     
  6. Assimilator1

    Assimilator1 Elite Member

    Lol, sounds like it'll either die or you're going to kill it! ;).

    waffleironhead mentioned that there were large pauses within WUs; I wonder if some parts of the WU can only be done by the CPU, so the GPU periodically hands work back to it? That would explain the CPU affecting WU times & the fluctuating CPU load you're seeing.

    Talking of waffleironhead, he updated me via PM confirming his HD 6950 is at stock clocks, & it's run with an Athlon2 X4 620 @2.6 GHz.
     
  7. Assimilator1

    Assimilator1 Elite Member

    Well I was about to post a time for my 7870 XT 3GB, but the times are all over the place :confused:, ranging from 119-141s, although the 8 validated so far range from 129-133s.
    Looking at the GPU load whilst it's crunching, the load is very on/off, very weird! There's also the odd restarting of a WU, as waffleironhead mentioned.

    Since doing the 1st 14 WUs on that rig, I've updated BOINC to .33.

    [update]
    Well originally I had SETI crunching on 3 cores & left 1 spare for the MW GPU (that always used to be enough); turns out it wasn't quite enough, for consistent times anyway!
    After suspending SETI (leaving all 4 cores for MW) & looking at Task Manager, the System Idle Process fluctuates from 48-75%, although most of the time it is at 72-74%; when it briefly drops to ~50%, the milkyway_1.43_win process is taking up a core for itself for roughly a second.
    I've crunched a few with only MW running now; going to switch SETI back on to 2 cores in a minute, shouldn't affect MW.....

    Oh & re GPU load, rather than looking like a series of mountain ranges, it now looks like a series of blocks in GPU-Z.

    [update2]
    And the times have plummeted! An average of 7 valids gives 56.8s!
     
    #6 Assimilator1, Dec 31, 2016
    Last edited: Dec 31, 2016
  8. Assimilator1

    Assimilator1 Elite Member

    Hmm, the mystery deepens: even with BOINC off, 1 svchost process is taking 25% CPU time! Anyone know why?
    At 1st I thought it was BOINC related, but I just proved it isn't.......
    No wonder I was having WU time variations with 3 cores on SETI!
     
  9. waffleironhead

    waffleironhead Diamond Member

    Looks like you may be affected by the Windows 7 update bug. I too had a mysterious svchost eating up a whole core.
    https://forums.anandtech.com/threads/fix-for-windows-7-update.2471653/
     
  10. Assimilator1

    Assimilator1 Elite Member

    Ah ok, thanks mate :).

    It seems that having 2 cores on SETI + the svchost hog on 1 was adding 2-3s/WU; I just dropped SETI from 50% to 25% usage & MW WU times have gone from ~58s to ~56s (without this svchost issue, running SETI @50% would have no effect on MW WU times, & only a minimal effect at 75%).
     
  11. Assimilator1

    Assimilator1 Elite Member

    Hmm, tried it, didn't work. Interestingly though, when I switched Windows updates off the rogue svchost didn't play up, but it came back when I switched updates back on.
    For now I'm just going to let it run; I saw another link in that thread referring to update problems, & sometimes Windows update taking up to 8 hrs to sort itself out!
     
  12. iwajabitw

    iwajabitw Senior member

    First valids with 133.66

    [screenshot: validated 133.66 credit WU run times]
     
  13. Assimilator1

    Assimilator1 Elite Member

    Thanks for the screenshot, but it seems the times are varying by quite a lot, has the GPU got a free CPU core for it?
     
  14. iwajabitw

    iwajabitw Senior member

    Yeah, 1 for each card.
     
  15. Assimilator1

    Assimilator1 Elite Member

    Err, I see now that you've got 12 WUs being crunched at once! I take it you haven't got 12 GPUs? ;)
    If not, then (as per my benchmark requirements) you need to crunch just 1 WU/GPU or it'll mess up the times.
     
  16. waffleironhead

    waffleironhead Diamond Member

    I think he has a 6-core i7 with hyperthreading. ;) Those are the ~4000-second WUs on his account, methinks.
     
  17. iwajabitw

    iwajabitw Senior member

    The app_config setting grabbed one of the hyperthreads as a core. Saw that after the last post. Adjusted the computing preferences to get a full physical core free per card. So it's correct now for the last hour or so, with no difference in time.
     
  18. iwajabitw

    iwajabitw Senior member

    Running an i7 5820k - 6 cores + hyperthreading. Have 8 CPU tasks running now.
     
  19. Assimilator1

    Assimilator1 Elite Member

    Cool & thanks :), great rig btw :D
    From a visual average, I'd say your 980 is doing them in ~184s (feel free to calculate an average if you like).
    What I don't understand is how some of the GPU WUs you did earlier were being done more quickly :confused:......

    And am I right in saying your GTX 980 is clocked at 1.3/3 GHz?

    Interesting thing happening with the CPU WU times too: I think that the ~4109s times are from cores that are also running a second HT thread, whilst the 3529s ones are from cores not currently hyperthreading, seeing as you have 12 possible threads but only 8 crunching, so 2 cores are not using HT.
    Does that sound right? Does it work like that? lol
     
    #18 Assimilator1, Jan 1, 2017
    Last edited: Jan 1, 2017
  20. iwajabitw

    iwajabitw Senior member

    There was a thread over at the MilkyWay forums back in Oct where they were discussing increasing the size of the GPU tasks so that there wouldn't be such a constant hit on the servers. Maybe they got that done. When I get some time I'll see if I can't find it. And the lowest is in the 160s, but the 180s seem to be the avg.
     
  21. iwajabitw

    iwajabitw Senior member

    Should have my 280x up and running in a few days. Missed the delivery Sat, and have to wait for the post office to open to pick it up.
     
  22. Assimilator1

    Assimilator1 Elite Member

    Ahh that'll fly through MW WUs :D
    Re longer WUs, yea makes sense.

    What do you reckon about the CPU HT question?
     
  23. iwajabitw

    iwajabitw Senior member

    Correct, I have 2 free physical cores now, so that's a loss of 4 more possible tasks that could be processed with HT on. So out of 6 cores, I have only 4 physically crunching, making 8 tasks with HT on. Since there are no shifts in time from doing this, I may allow a 5th core to crunch; that will be the same as the original screenshot, since 1 free physical core is hyperthreaded and getting used by the GPU as 2 threads.
    The 980s usually run at about 1350; as the heat comes up they will fluctuate down to 1290 based on my fan curve - at least the top card does, it's usually 10-12C hotter than the lower card. Nothing is OC'd, just throttling based on temps.
     
  24. iwajabitw

    iwajabitw Senior member

    Looking at Lateralus, the main rig... CPU tasks are v1.42 and are 4000 seconds, GPUs are on v1.43.
     
  25. StefanR5R

    StefanR5R Senior member

    So you have 6 cores, 12 threads. The Windows process scheduler puts all runnable tasks all over the map. Since the Windows scheduler is HT-aware (i.e. knows which virtual CPUs are merely hardware-threads on the same physical CPU = same physical core), it tries to put concurrent processes onto different physical CPUs as long as possible.

    If you have 8 CPU DC workers and 1 GPU DC worker (with its supporting CPU thread), then the Windows process scheduler needs to spread 8 processes with full load and 1 process with spiky load across those 12 virtual CPUs.

    I.e. most of the time, you have 8 runnable processes. The scheduler will employ 4 cores with 1 process each, and 2 cores with 2 processes. (Which of the cores get just one process and which ones have to serve 2 processes will certainly change over time, since the Windows scheduler tends to shift processes from core to core unless the user or software requests core affinity.)

    During the blips in time when the helper process of the GPU DC worker needs processing, you have not 8 but 9 runnable processes. So then the scheduler employs 3 cores with 1 process each, and 3 cores with 2 processes. From what I have seen, these are all low-priority processes, so it is impossible to say whether the GPU supporter is among the lucky three which get a whole physical core for themselves, or is among the 6 which need to share a core on which two threads are running at that time.

    Long story short, with so many CPU DC workers active, the GPU DC supporter process has to fight with the CPU DC processes for CPU time.

    I think there are two ways to ensure that the GPU worker is not held back by the CPU workers:
    1. Either reduce the number of CPU DC workers to 5 (i.e. to number of physical cores minus one).
    2. Or increase the scheduler priority of the GPU supporter process from low to normal. In the latter case, you can have as many low-priority processes from CPU DC workers as you like, but they will always yield a core (presumably without HT penalty) to the normally prioritized thread as soon as that one becomes runnable.
    The second option can still be a little bit detrimental because it will involve more cache pressure than the first option.

    That's at least my understanding of Windows process scheduling in general, and on hyperthreaded CPUs in particular. I am not at all a Windows expert though, am more of a Linux guy. (Need to cope with Windows at work, use Linux at home and occasionally at work.)
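
    (For what it's worth, option 2 can be done by hand in Task Manager - set the priority of the MW process to Normal - or scripted. Below is a rough Python/psutil sketch of the idea; the 'milkyway' name match & the Windows-only priority constant are assumptions, so check the actual process name first.)

    import psutil

    # Bump any running MilkyWay GPU app from low to normal priority (Windows only).
    for proc in psutil.process_iter(["name"]):
        name = (proc.info["name"] or "").lower()
        if "milkyway" in name:
            try:
                proc.nice(psutil.NORMAL_PRIORITY_CLASS)  # Windows-only constant in psutil
                print(f"pid {proc.pid} ({proc.info['name']}) set to normal priority")
            except psutil.AccessDenied:
                print(f"pid {proc.pid}: access denied, try running as administrator")

    (The BOINC client starts a fresh process for each WU, so this would need re-running, or a loop, to stick.)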
     
  26. Assimilator1

    Assimilator1 Elite Member

    I was wondering if that might be the case, dang.

    iwajabitw
    As I mentioned though, some CPU tasks were done in 3529s, I suppose the only real way to know non-HT times is to either disable it, or reduce CPU crunching to 1 thread.