MilkyWay@H - Benchmark thread Winter 2016 on (different WU) - GPU & CPU times wanted

Discussion in 'Distributed Computing' started by Assimilator1, Dec 29, 2016.

  1. StefanR5R

    StefanR5R Senior member

    Hmm, on second thought: A high- or normal-priority task X will preempt any number of low-priority tasks Y at the level of logical cores, but not necessarily at the level of physical cores. I.e. if you have some physical cores which serve just one thread, and other physical cores which serve two threads, then I don't know whether the Windows process scheduler is clever enough to put higher-priority tasks onto the less utilized physical cores.
     
  2. iwajabitw

    iwajabitw Senior member

    So basically it sounds like, if any CPU tasks are running, Windows will distribute them across all cores, regardless of how BOINC is set up or what your config specifies. By freeing cores up, what is actually happening is that the total load on the CPUs is reduced. GPU-only crunching is the only way to truly make sure there is a core free, but Windows will still share resources across all cores. Interesting.
     
  3. StefanR5R

    StefanR5R Senior member

    Well, I do wonder whether there aren't ways to optimize this via app_config.xml or something.
     
  4. StefanR5R

    StefanR5R Senior member

    Here are results from AMD FirePro W7000@stock, hosted by an i7-4960X@4.5 GHz:
    193 s GPU time, 19 s CPU time.

    Detailed info:

    The W7000 is a GCN 1.0 card with a Pitcairn XT GL GPU: 950 MHz core (no boost available), 1200 MHz (4800 MT/s) memory.
    For comparison, consumer cards with Pitcairn XT are:
    Radeon HD 7870 GHz Edition (1000 MHz core, no boost available, 1200 MHz (4800 MT/s) memory)
    Radeon R9 270 (900 MHz core, 925 MHz boost, 1400 MHz (5600 MT/s) memory)
    Radeon R9 270X (1000 MHz core, 1050 MHz boost, 1400 MHz (5600 MT/s) memory)
    Source: https://en.wikipedia.org/wiki/List_of_AMD_graphics_processing_units
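    Memory speed can be quoted either as the memory clock (MHz) or as the effective data rate (MT/s). GDDR5 moves four data words per memory clock, so the two differ by a factor of 4. The sketch below converts between them and estimates theoretical peak bandwidth, assuming a 256-bit memory bus for these Pitcairn cards (an assumption, not stated in the post).

```python
# GDDR5 is quad-pumped: 4 data transfers per memory clock cycle.
GDDR5_TRANSFERS_PER_CLOCK = 4


def gddr5_data_rate_mts(memory_clock_mhz: float) -> float:
    """Effective GDDR5 data rate in MT/s for a memory clock in MHz."""
    return memory_clock_mhz * GDDR5_TRANSFERS_PER_CLOCK


def peak_bandwidth_gbs(data_rate_mts: float, bus_width_bits: int = 256) -> float:
    """Theoretical peak bandwidth in GB/s.

    bus_width_bits=256 is an assumption for Pitcairn-based cards.
    """
    bytes_per_transfer = bus_width_bits / 8
    return data_rate_mts * bytes_per_transfer / 1000  # MB/s -> GB/s


for name, clock_mhz in [("FirePro W7000 / HD 7870 GHz Ed.", 1200),
                        ("R9 270 / R9 270X", 1400)]:
    rate = gddr5_data_rate_mts(clock_mhz)
    print(f"{name}: {clock_mhz} MHz -> {rate:.0f} MT/s, "
          f"~{peak_bandwidth_gbs(rate):.1f} GB/s peak")
```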

    I launched 73 MilkyWay@Home tasks last night; their status at this time is:
    In progress (0) · Validation pending (0) · Validation inconclusive (11) · Valid (60) · Invalid (1) · Error (1)

    I switched my tasks listing to show only the 60 valid ones, copy-pasted them as 3 tables with 20 rows into Excel, and let Excel calculate the average etc.:
    Code:
              latest 20           next 20             earliest 20
              GPU time  CPU time  GPU time  CPU time  GPU time  CPU time
              (s)       (s)       (s)       (s)       (s)       (s)
    --------------------------------------------------------------------
    average   193.1     18.6      183.5     18.8      182.9     19.2
    median    194.1     18.7      179.4     18.9      179.2     19.1
    max       194.2     19.3      200.0     20.7      259.5     21.7
    min       179.1     17.7      147.4     17.4      146.4     14.6
    stddev      3.3      0.5       11.2      0.7       19.8      1.3
    
    The "latest" series happened mostly after I had left the PC alone. During the "next" and "earliest", I was still doing some light interactive work on the PC. The 4960X had EIST off and HT on, and I had no long-running CPU task up while I did the GPU test, IOW the CPU was idle.
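    For anyone who would rather script the Excel step described above, here is a minimal Python sketch. It assumes the validated tasks have already been copied into a list of (GPU time, CPU time) pairs ordered newest first, as on the results page. It uses the sample standard deviation (Excel's STDEV.S); the post does not say whether STDEV.S or STDEV.P was used, so that choice is an assumption.

```python
import statistics


def summarize(values):
    """Average, median, max, min and sample stddev of a list of times (s)."""
    return {
        "average": statistics.mean(values),
        "median": statistics.median(values),
        "max": max(values),
        "min": min(values),
        # Sample standard deviation, like Excel's STDEV.S (assumption).
        "stddev": statistics.stdev(values),
    }


def summarize_in_groups(tasks, group_size=20):
    """Split (gpu_s, cpu_s) pairs, newest first, into consecutive groups
    and summarize GPU time and CPU time separately for each group."""
    groups = []
    for start in range(0, len(tasks), group_size):
        chunk = tasks[start:start + group_size]
        if len(chunk) < 2:  # stdev needs at least two values
            break
        gpu_times = [gpu for gpu, _ in chunk]
        cpu_times = [cpu for _, cpu in chunk]
        groups.append((summarize(gpu_times), summarize(cpu_times)))
    return groups


# Usage: the eight CPU-only run times averaged later in this thread.
cpu_only = [2755, 2709, 2732, 2724, 2721, 2712, 2721, 2713]
print(round(summarize(cpu_only)["average"]))  # 2723
```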

    Screenshots of the mw@h results pages: [3 images]
     
  5. StefanR5R

    StefanR5R Senior member

    PS:
    The 1 invalid result may have been from a hang due to me installing a GPU-Z update. The 1 error was me aborting an n-body CPU work unit right after I added the mw@h project on that PC.

    A few results at the start of the test were with a driver from early 2016; most others with "Radeon Pro Software Crimson ReLive Edition" 16.12.1, 12/12/2016.
     
  6. Assimilator1

    Assimilator1 Elite Member

    Further to Stefan's post: basically, we can't be sure that the ~3500s CPU scores come from WUs that were crunched solely on a physical core, which, thinking about it, is why in the past we disabled HT to test non-HT times.

    Regarding GPU times

    My HD 7970 has 1 'spare' CPU core for it, achieved by setting BOINC to use 85% of CPUs, which runs 10 CPU threads (on a 6C HT CPU, anything up to 91% should, on the face of it, leave 1 spare core (real+HT) for the GPU).
    Anyway, with that setup I get an average WU time of 44.86s over 20 WUs.
    Setting BOINC to 45% so that there are just 5 CPU threads (guaranteeing a physical CPU core for the GPU) gives an average time of 41.95s over 20 results (from 13:48 on); my ~42s estimate was close :).
    Well, it seems you're right Stefan: not only have GPU times dropped a bit, but I noticed that the CPU run time for the GPU WUs has dropped from 13-14s to 10-11s.
    Time to re-write some of the benchmark requirements!
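    The thread counts quoted above follow from rounding down a percentage of the logical CPUs. The helper below reproduces the 85%, 91% and 45% figures, assuming BOINC truncates logical_cpus * percent / 100 to a whole number with a minimum of 1. That rounding rule is an assumption that fits the numbers in this post, not something taken from BOINC's source.

```python
def boinc_cpu_threads(logical_cpus: int, percent: int) -> int:
    """CPU threads BOINC may use at a given '% of CPUs' setting.

    Assumes the result is truncated (rounded down), with a minimum of 1.
    """
    return max(1, logical_cpus * percent // 100)


def max_percent_for(logical_cpus: int, threads: int) -> int:
    """Highest whole-number percentage that still yields at most `threads`."""
    pct = 100
    while pct > 0 and boinc_cpu_threads(logical_cpus, pct) > threads:
        pct -= 1
    return pct


# 6-core/12-thread CPU such as the i7 4930K
for pct in (85, 91, 92, 45):
    print(f"{pct}% of 12 logical CPUs -> {boinc_cpu_threads(12, pct)} threads")
print("Highest % for 10 threads on 12 logical CPUs:", max_percent_for(12, 10))
```

    Under this assumption, 92% would already tip a 12-thread CPU over to 11 threads, which is why 91% is the ceiling for keeping one physical core free.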

    Talking of CPU times, my i7 4930k @4.1 GHz (HT on) is doing WUs in 4171s. [update] That was with 10 threads, leaving 2 spare for the GPU. Hmm... I wonder if that means some WUs were partly crunched on the spare core when the GPU didn't need it much, or at least that the HT load was spread more evenly across the cores. Possibly, I guess. Damn, I'll run some more later with 12 threads & no GPU crunching.

    If so, that would also mean iwajabitw's CPU 'HT on' score isn't valid either.
    Time will tell, just running 6 CPU threads atm for a non HT time.
     
    #31 Assimilator1, Jan 2, 2017
    Last edited: Jan 2, 2017
  7. Assimilator1

    Assimilator1 Elite Member

    StefanR5R
    Just seen your result screenshots, thanks :). And using Excel for calculating averages, very professional! :)

    Looking at your results, in the bottom table most WUs are done in ~179s (with 1 at 146s!), the next one up seems to be a mixture of mostly ~179s & ~194s results, and the top table is nearly all 193-194s results.
    Very odd. Did you change GPU clocks at all?
    You said the CPU was idle, so it can't be that, and I assume you didn't change the CPU clock either.

    If it's none of the above then maybe the new 133.66 credit MW WUs have different WUs within them giving different times! Which would obviously torpedo any attempts to use them as a benchmark! :(
    MW@H has done that before: in my 1st MW benchmark thread (Feb-May 2014) we used the mod fit 213.76 credit WUs, then out of the blue, after a few weeks of consistent results, the times suddenly halved! At least then the time difference was obvious, though I ended up starting a new benchmark thread because of it.
     
  8. Assimilator1

    Assimilator1 Elite Member

    Except if you have fewer CPU threads than physical cores, then there has to be a free physical core, right?
     
  9. iwajabitw

    iwajabitw Senior member

    I would have thought so, but I am playing with an app_config that grabs 3 CPU cores per card:
    <app_config>
      <app>
        <name>milkyway</name>
        <max_concurrent>0</max_concurrent>
        <gpu_versions>
          <gpu_usage>1</gpu_usage>
          <cpu_usage>3</cpu_usage>
        </gpu_versions>
      </app>
    </app_config>

    HWMonitor still shows pretty equal utilization of all cores, and Task Manager shows only 38% total CPU usage. So Windows is definitely spreading things out.
     
  10. Assimilator1

    Assimilator1 Elite Member

    Right, OK, then your CPU score earlier probably isn't right; I thought it was odd your CPU was so much quicker than mine! ;) (even accounting for yours being the next gen on).
     
  11. iwajabitw

    iwajabitw Senior member

    Task        Work unit   Sent (UTC)            Reported (UTC)         Status                   Run time (s)  CPU time (s)  Credit  Application
    1927545235  1408006894  2 Jan 2017, 4:39:18   2 Jan 2017, 15:38:34   Completed and validated  2,798.63      2,794.39      133.66  MilkyWay@Home v1.42
    1927543868  1407960476  2 Jan 2017, 4:37:41   2 Jan 2017, 15:12:48   Completed and validated  3,542.56      3,530.75      133.66  MilkyWay@Home v1.42
    1927544397  1408119645  2 Jan 2017, 4:37:41   2 Jan 2017, 15:11:11   Completed and validated  3,548.42      3,536.00      133.66  MilkyWay@Home v1.42
    1927542947  1403865764  2 Jan 2017, 4:36:05   2 Jan 2017, 14:48:37   Completed and validated  3,959.15      3,950.80      133.66  MilkyWay@Home v1.42

    It may be like you said earlier: things are just all over the place for benchmarking. Are there different work unit sizes or something? I don't know. Look at the difference in these CPU-only tasks; they are the last 4 that validated, and I had to go back 5 pages to find them. ~2800 up to ~4000 seconds?
     
  12. Assimilator1

    Assimilator1 Elite Member

    That's what I was talking about earlier, & I wonder if the faster times were due to some cores having little or no HT load.
    I'm running just 6/12 threads atm on my rig (so no HT load); all WUs submitted since ~14:30 (about when I switched to 6 threads) have had times in the 2800-2900s range.
    There's a bunch of WUs from up to about an hour before that (I haven't looked earlier) with times of roughly 3800-3900s. This was when MW ran out of WUs for the GPU while I had 10 threads running, so the 10 threads were spread over 12 'cores'; I believe that's why their times are lower.
    Before the GPU ran out of WUs, with 10 threads I was getting ~4150-4200s.

    Atm I'm not convinced the variation in WU times is down to variable WUs, but down to variable CPU load.
    I'm testing this atm, my next test will be with 12 threads & still no GPU crunching. I welcome all to test this! :)
    OK, with 6 threads I get an average (from 10 valids) of 2825s. At some point I will do a test actually disabling HT in the BIOS to see if that makes any difference.

    What times do you get if you do no GPU crunching with 6 threads (no real HT load), then 12 threads (with HT load)?
     
  13. StefanR5R

    StefanR5R Senior member

    Yes, it's odd, and I have no ready explanation myself.

    The "latest 20" have a lot lower standard deviation than the earlier two sets of 20. This at least coincides nicely with the fact that I interacted with the PC during the "earliest 20" and "next 20", but left the PC alone since about the beginning of the "latest 20".

    But the fact that the "earliest 20" and "next 20" both had a better average and better minimums is contrary to that. I do have a screensaver on that PC which must have been active during the "latest 20". So far I assumed that this screensaver (called Flurry) consumes negligible GPU resources. But I am not sure.

    AFAIK, the W7000 does not allow manipulation of the GPU clock; at least I never tried. Nor does a particular GPU workload influence the GPU clock, apart from the drastic step between active clock and idle clock. When used, the GPU strictly stays at its 950 MHz. There is neither a turbo nor thermal throttling.¹ When unused, it drops down to an idle clock whose precise value I don't remember right now. During MW@H runs (and before that, in F@H), I have never seen the clock deviate from 950 MHz. The same holds for memory clock and PCIe clock.

    I always ran only one WU at a time as per your OP instructions.

    The maximum outlier of 259.5 seconds during the "earliest 20" may have been due to an initial overlap with F@H finishing up while I added primegrid on that PC.

    CPU clock was constant the entire time too. I pegged it at the mentioned 4.5 GHz via BIOS settings. No EIST, no load-dependent turbo, just 4.5 GHz all the time, and always several physical cores idle.

    So the screensaver and WU differences appear to be the most obvious possible causes of the observed variance. If only I had remembered to switch off the screensaver before benchmarking... Maybe I'll give it another try sometime soon.

    --------
    ¹) I forgot to disclose that this W7000 is customized. It lost its single-slot cooler after only a few hours of use and received a triple-slot open-air cooler. The result was twofold: no more nasty fan noise, and always moderate temperatures, even under Furmark. I do not remember whether the original single-slot cooler was able to keep the card below its temperature limit under Furmark or other sustained load. I suspect that a high-airflow workstation case would at least keep an unmodified W7000 working at nominal frequency under compute loads like MW@H, but maybe not with a power virus like Furmark.
     
  14. Assimilator1

    Assimilator1 Elite Member

    Sounds like your W7000's new cooler is much better :).

    It does seem likely your screensaver could have taken some GPU, maybe enough to cause the variance. If you could retest, that would be handy, maybe 6 with it on & 6 off?
    Btw, when does your screensaver kick in?

    (note to self, started 12 MW threads @ 19:38. 10 threads from 00:14)
     
    #39 Assimilator1, Jan 2, 2017
    Last edited: Jan 2, 2017
  15. iwajabitw

    iwajabitw Senior member

    OK, I have let the app_config above crunch while I have been working. Earlier today I set the computing preferences to use 84% of CPUs. With HT on I am crunching 4 CPU 1.42 tasks in total along with the GPU 1.43 tasks. Times have dropped to 2700 seconds for CPU-only tasks. I don't know if this will be the norm, but that's a big boost, granted it's only 4 tasks at once vs the 8-10 yesterday.

    Task        Work unit   Sent (UTC)            Reported (UTC)         Status                   Run time (s)  CPU time (s)  Credit  Application
    1927704627  1408233917  2 Jan 2017, 9:12:47   2 Jan 2017, 23:29:55   Completed and validated  2,721.19      2,717.66      133.66  MilkyWay@Home v1.42
    1927704949  1408231744  2 Jan 2017, 9:12:47   2 Jan 2017, 23:29:55   Completed and validated  2,725.20      2,720.41      133.66  MilkyWay@Home v1.42
    1927702723  1408234218  2 Jan 2017, 9:09:33   2 Jan 2017, 23:12:15   Completed and validated  2,720.28      2,717.38      133.66  MilkyWay@Home v1.42
     
  16. StefanR5R

    StefanR5R Senior member

    I will check the W7000 again one of these days; not sure when.

    In the meantime, results from a laptop:
    NVIDIA Quadro K2100M, stock: 1784 s
    hosted by i7-4900MQ, stock

    GPU frequencies during the test: core 666 MHz :smilingimp:, mem 753 MHz
    CPU was at mere 800 MHz during most of the test
    Windows 7, NVIDIA driver 354.74

    Results from 19 validated tasks:
    Code:
              GPU time  CPU time
              (s)       (s)
    ----------------------------
    average   1784      15.0
    median    1783      14.5
    max       1786      17.9
    min       1782      13.4
    stddev     1.4       1.2
    
    (Edit:)
    Instead of copy&pasting the entire table of the tasks, here is the link to the computer:
    https://milkyway.cs.rpi.edu/milkyway/results.php?hostid=715729
     
    #41 StefanR5R, Jan 3, 2017
    Last edited: Jan 3, 2017
  17. iwajabitw

    iwajabitw Senior member

  18. Assimilator1

    Assimilator1 Elite Member

    That explains the progress bar reset then! Thanks :)

    Averaging these 8 results from your 4-thread CPU tasks (2755, 2709, 2732, 2724, 2721, 2712, 2721, 2713s, rounded to the nearest s) gives 2723s.
    (Interesting to see your GPU settling down to 181-182s btw).

    Amazing how much quicker than mine your CPU is, despite an 800 MHz disadvantage!! Lol, and it's only 1 generation on. Hmm, I didn't recall that much of an advantage in AnandTech's reviews, & indeed they only got a 13-18% improvement on FPU-heavy apps (which I assume MW is). I guess something else could be coming into play as well as better IPC: cache size perhaps, or maybe better memory latency, or likely both.
    Your 5820k is running at 3.3 GHz right? ;)

    OK, with 12 threads I get an average WU time of 4557s over 17 WUs on my i7 4930k @4.1 GHz. That's nearly 400s longer than with 10 threads + GPU crunching, quite a bit longer!
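    Longer per-WU times with HT do not by themselves mean lower output. A quick sketch converting the i7 4930K averages reported in this thread (6 threads: 2825s; 10 threads + GPU: 4171s; 12 threads: 4557s) into WUs per hour. It assumes every thread is busy with a CPU WU all the time, and the 10-thread run also had GPU crunching going on, so treat the numbers as rough estimates.

```python
# Average CPU WU times (s) reported in this thread for an i7 4930K @4.1 GHz.
# Assumes each thread crunches a CPU WU continuously, so steady-state
# throughput is simply threads / time_per_wu.
RUNS = {
    "6 threads (no HT load)": (6, 2825),
    "10 threads + GPU": (10, 4171),
    "12 threads (full HT load)": (12, 4557),
}


def wus_per_hour(threads: int, seconds_per_wu: float) -> float:
    """Steady-state CPU throughput in work units per hour."""
    return threads * 3600 / seconds_per_wu


baseline = wus_per_hour(*RUNS["6 threads (no HT load)"])
for label, (threads, seconds) in RUNS.items():
    rate = wus_per_hour(threads, seconds)
    print(f"{label:28s} {rate:5.2f} WU/h  ({rate / baseline - 1:+.0%} vs 6 threads)")
```

    On these numbers, 12 threads complete roughly 24% more WUs per hour than 6 threads, even though each WU takes about 60% longer.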

    What do you get with 12 CPU threads & no GPU crunching?

    Stefan
    Cheers for the laptop time :), will add that to the table.
    800 MHz seems very low for the CPU though, was that with the charger plugged in?
     
    #43 Assimilator1, Jan 3, 2017
    Last edited: Jan 3, 2017
  19. StefanR5R

    StefanR5R Senior member

    Yep, this was on mains power. Apparently the CPU helper thread had so little to do that the CPU remained at the idle frequency most of the time. The thread showed up with rather low utilization in task manager when I watched. But hwinfo64's history showed that there were spikes up to 3.8 GHz, which is the single core turbo.
     
  20. StefanR5R

    StefanR5R Senior member

    I could test a GTX 1070 and 1080 (which also have weak FP64), but they are still sitting together in a single PC, and I have not yet researched whether it is possible to restrict BOINC to a single card. And there are a few CPUs to test, but I'd rather postpone this until after January 13.
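    On restricting BOINC to one of two cards: the BOINC client's configuration file, cc_config.xml, has an `<ignore_nvidia_dev>N</ignore_nvidia_dev>` option that makes the client ignore NVIDIA GPU number N (device numbers as listed in the BOINC event log at startup). The Python sketch below only generates such a file; the device number is a hypothetical placeholder, an existing cc_config.xml would need to be merged rather than overwritten, and restarting the client afterwards is the safe assumption since GPUs are detected at startup. `ET.indent` requires Python 3.9+.

```python
import xml.etree.ElementTree as ET


def build_cc_config(ignored_nvidia_devices):
    """Return cc_config.xml text telling the BOINC client to ignore the
    given NVIDIA device numbers (as numbered in the BOINC event log)."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    for device in ignored_nvidia_devices:
        ET.SubElement(options, "ignore_nvidia_dev").text = str(device)
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


# Hypothetical example: keep device 0 crunching and ignore device 1.
print(build_cc_config([1]))
```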
     
  21. Assimilator1

    Assimilator1 Elite Member

    Re laptop CPU usage: curious, especially considering that CPU speed, or at least CPU type, seems to affect GPU result times so much.
    Maybe CPU speed spiking at 3.8 GHz is enough to feed MW without hampering times, although the only way to know would be to lock it at that speed.

    Would be interesting to add the latest NVidia 1000 series cards to the table, no hurry on when, just do it when you want to :).
    Btw, I'm afraid I don't know whether it is possible to lock BOINC to a single card. I would have thought so, but I don't know it for a fact.
     
  22. StefanR5R

    StefanR5R Senior member

    My guess is that this mobile GPU is so small and slow, that CPU speed does not have as much impact as with more reasonably sized GPUs.

    I would have liked to control the CPU frequency better, but this being a laptop, I am at a loss what the BIOS etc. is doing and how to influence it in a repeatable way.
     
  23. iwajabitw

    iwajabitw Senior member

    Joined:
    Aug 19, 2014
    Messages:
    614
    Likes Received:
    89
    Sorry, been out. The 280X came in, but the system I put it in is much slower than I thought, so I can only do one task at a time. Monitoring it to see how it goes.
     
  24. iwajabitw

    iwajabitw Senior member

    Joined:
    Aug 19, 2014
    Messages:
    614
    Likes Received:
    89
    This is a Core2Duo E6550 on an Asus MX1333 mobo, 32-bit install of Vista with 2 GB memory, MSI R9 280X GPU. Preferences are set to 50% of CPUs and app_config to 1 CPU per GPU task, so no CPU tasks are running.


    Task        Work unit   Sent (UTC)           Reported (UTC)       Status                   Run time (s)  CPU time (s)  Credit  Application
    1930098169  1409312601  5 Jan 2017, 2:48:14  5 Jan 2017, 4:04:00  Completed and validated  53.77         25.30         133.66  MilkyWay@Home v1.43 (opencl_ati_101)
    1930096392  1409316120  5 Jan 2017, 2:45:00  5 Jan 2017, 4:00:47  Completed and validated  55.62         25.27         133.66  MilkyWay@Home v1.43 (opencl_ati_101)
    1930096201  1409108710  5 Jan 2017, 2:45:00  5 Jan 2017, 4:00:47  Completed and validated  56.64         25.27         133.66  MilkyWay@Home v1.43 (opencl_ati_101)
    1930094208  1409158476  5 Jan 2017, 2:41:46  5 Jan 2017, 3:55:59  Completed and validated  55.68         26.58         133.66  MilkyWay@Home v1.43 (opencl_ati_101)
    1930092637  1409315922  5 Jan 2017, 2:40:14  5 Jan 2017, 3:55:59  Completed and validated  54.55         25.26         133.66  MilkyWay@Home v1.43 (opencl_ati_101)
    1930091251  1409257506  5 Jan 2017, 2:37:00  5 Jan 2017, 3:52:44  Completed and validated  54.55         25.71         133.66  MilkyWay@Home v1.43 (opencl_ati_101)
    1930088218  1409322521  5 Jan 2017, 2:33:40  5 Jan 2017, 3:49:30  Completed and validated  54.05         24.74         133.66  MilkyWay@Home v1.43 (opencl_ati_101)
    1930089251  1409311776  5 Jan 2017, 2:33:40  5 Jan 2017, 3:49:30  Completed and validated  54.54         25.71         133.66  MilkyWay@Home v1.43 (opencl_ati_101)
    1930088326  1409267115  5 Jan 2017, 2:32:03  5 Jan 2017, 3:47:53  Completed and validated  54.04         24.49         133.66  MilkyWay@Home v1.43 (opencl_ati_101)
    1930088156  1409273599  5 Jan 2017, 2:32:03  5 Jan 2017, 3:47:53  Completed and validated  54.54         24.80         133.66  MilkyWay@Home v1.43 (opencl_ati_101)
    1930087477  1408857327  5 Jan 2017, 2:30:26  5 Jan 2017, 3:46:16  Completed and validated  54.04         24.37         133.66  MilkyWay@Home v1.43 (opencl_ati_101)
    1930086377  1409311441  5 Jan 2017, 2:28:49  5 Jan 2017, 3:44:39  Completed and validated  54.58         24.91         133.66  MilkyWay@Home v1.43 (opencl_ati_101)
    1930084450  1409296092  5 Jan 2017, 2:25:36  5 Jan 2017, 3:41:25  Completed and validated  54.54         24.85         133.66  MilkyWay@Home v1.43 (opencl_ati_101)
    1930084238  1409286076  5 Jan 2017, 2:25:36  5 Jan 2017, 3:41:25  Completed and validated  54.04         24.15         133.66  MilkyWay@Home v1.43 (opencl_ati_101)
    1930083864  1409317211  5 Jan 2017, 2:23:58  5 Jan 2017, 3:39:48  Completed and validated  54.58         25.10         133.66  MilkyWay@Home v1.43 (opencl_ati_101)
    1930083173  1409311121  5 Jan 2017, 2:22:21  5 Jan 2017, 3:38:11  Completed and validated  54.56         25.18         133.66  MilkyWay@Home v1.43 (opencl_ati_101)
    1930082322  1409218596  5 Jan 2017, 2:20:45  5 Jan 2017, 3:36:34  Completed and validated  54.04         24.45         133.66  MilkyWay@Home v1.43 (opencl_ati_101)
    1930082384  1409303133  5 Jan 2017, 2:20:45  5 Jan 2017, 3:36:34  Completed and validated  54.54         24.84         133.66  MilkyWay@Home v1.43 (opencl_ati_101)
    1930081665  1409314567  5 Jan 2017, 2:19:09  5 Jan 2017, 3:34:57  Completed and validated  54.56         25.30         133.66  MilkyWay@Home v1.43 (opencl_ati_101)
    1930081529  1409285857  5 Jan 2017, 2:19:09  5 Jan 2017, 3:34:57  Completed and validated  54.04         24.21         133.66  MilkyWay@Home v1.43 (opencl_ati_101)
     
  25. iwajabitw

    iwajabitw Senior member

    I'm thinking something a little older and hopefully cheap, like an Asus P9X79-Deluxe with an Intel i7 3930k, would get this card really rolling.