MilkyWay@H - Benchmark thread Winter 2016 on (different WU) - GPU & CPU times wanted

Discussion in 'Distributed Computing' started by Assimilator1, Dec 29, 2016.

  1. StefanR5R

    StefanR5R Member

    Joined:
    Dec 10, 2016
    Messages:
    181
    Likes Received:
    79
    The X79/socket 2011 platform with Sandy Bridge-E is still on PCIe 2.0, though. PCIe 3.0 came to socket 2011 with the Ivy Bridge-E processors. Would it make a difference? From what I understand, there is at worst a negligibly small difference in games, but what about computing?
     
  2. iwajabitw

    iwajabitw Senior member

    Joined:
    Aug 19, 2014
    Messages:
    428
    Likes Received:
    62
    The 280X will never see a game; it will be for MW only. I'm just looking at the 6 cores of the 3930 to run multiple GPU tasks with a CPU core for each. I may check and see what AMD has to offer as well. I want to keep the cost down.
     
  3. TennesseeTony

    TennesseeTony Elite Member

    Joined:
    Aug 2, 2003
    Messages:
    1,609
    Likes Received:
    235
    In the old days (sometime last year), Milkyway required almost NO bandwidth; you could use 1x adapters and not lose any performance. I'm not sure about these new work units, though. It's something I could test out (eventually).
     
  4. salvorhardin

    salvorhardin Senior member

    Joined:
    Jan 30, 2003
    Messages:
    378
    Likes Received:
    8
    On a stock i7-3770K with hyperthreading disabled and all CPU cores free:
    Stock 7950 (860/1250): 56.48s with 11.45s CPU time (20 workunits averaged)
    Stock R9 390 (1015/1500): 60.72s with 13.14s CPU time (17 workunits averaged)
     
  5. salvorhardin

    salvorhardin Senior member

    Joined:
    Jan 30, 2003
    Messages:
    378
    Likes Received:
    8
    To ignore specific cards in BOINC you'll need to edit or create a cc_config.xml file in the ProgramData/Boinc folder. I attached what should work with NVIDIA cards, with N being the device number. Just exit BOINC, edit or create the file, and then start BOINC again. When you go into your event log in BOINC you'll see at the top that one of your cards is ignored while the other remains active. On my main computer my 390 is considered device 0 and my 7950 is device 1.

    Code:
    <cc_config>
      <options>
        <use_all_gpus>1</use_all_gpus>
        <!-- Replace N with the device number shown in the BOINC event log -->
        <ignore_nvidia_dev>N</ignore_nvidia_dev>
      </options>
    </cc_config>
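For anyone who would rather generate that file than type it, here is a minimal Python sketch. The function name `build_cc_config` and the device number in the example are made up for illustration. `<ignore_ati_dev>` is the AMD counterpart of `<ignore_nvidia_dev>` in BOINC's client configuration options. Saving the text into the BOINC data folder is left out on purpose, since that path differs between installs.

```python
def build_cc_config(ignored_devices, vendor="nvidia"):
    """Return cc_config.xml text that ignores the given GPU device numbers.

    vendor selects the tag: "nvidia" -> <ignore_nvidia_dev>,
    "ati" -> <ignore_ati_dev>.
    """
    if vendor not in ("nvidia", "ati"):
        raise ValueError("vendor must be 'nvidia' or 'ati'")
    lines = [
        "<cc_config>",
        "  <options>",
        "    <use_all_gpus>1</use_all_gpus>",
    ]
    # One ignore entry per device number.
    for dev in ignored_devices:
        lines.append(f"    <ignore_{vendor}_dev>{int(dev)}</ignore_{vendor}_dev>")
    lines += ["  </options>", "</cc_config>"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Device 1 is only an example; use the number from your own event log.
    print(build_cc_config([1], vendor="ati"))
```

As with the hand-written file, BOINC has to be restarted before the change shows up in the event log.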
     
  6. Assimilator1

    Assimilator1 Elite Member

    Joined:
    Nov 4, 1999
    Messages:
    22,834
    Likes Received:
    19
    Thanks for your times guys :), will add them shortly.

    My HD 7970 plays Elite Dangerous Horizons just fine :), although I only have a small screen atm (1680x1050).
     
  7. Assimilator1

    Assimilator1 Elite Member

    Joined:
    Nov 4, 1999
    Messages:
    22,834
    Likes Received:
    19
    Seeing as you didn't give me an average, I'm taking a visual average of 54.3s (I ignored the last 4 results, as they seem to be outliers).
    Thanks for your time :)

    Is the GPU & CPU at stock clocks?
     
  8. iwajabitw

    iwajabitw Senior member

    Joined:
    Aug 19, 2014
    Messages:
    428
    Likes Received:
    62
    Sorry man, been swamped. Everything is at stock and I haven't checked the times at all since the first day, so I'll go with whatever you want to rate it at.
     
  9. Assimilator1

    Assimilator1 Elite Member

    Joined:
    Nov 4, 1999
    Messages:
    22,834
    Likes Received:
    19
  10. crashtech

    crashtech Diamond Member

    Joined:
    Jan 4, 2013
    Messages:
    6,241
    Likes Received:
    152
    Apologies if this has already been covered. I'd like to submit a result for my 5820K + R9 Fury, but even though I have tried to follow the instructions regarding restriction of CPU use, the GPU task persists in displaying ".798 CPUs + 1 AMD/ATI GPU," which seems to indicate that a whole CPU core isn't actually being utilized. It's also running other CPU-only Milkyway@Home tasks concurrently; I don't know if this is a problem, or how to stop them while still allowing GPU tasks. I hope I haven't missed anything ridiculously simple. The specific setting I am using is in Options/Computing Preferences/Computing/Usage Limits: Use at most 83% of the CPUs, Use at most 100% of the CPU time. Further restriction down to 66% (which should leave 2 cores free) also has no effect; the ".798" stays the same.
     
  11. StefanR5R

    StefanR5R Member

    Joined:
    Dec 10, 2016
    Messages:
    181
    Likes Received:
    79
    I think this "xyz CPUs + 1 ABC GPU" label on a work unit is only an estimate of how much CPU utilization that GPU WU is going to take. Actual usage may be higher or lower.

    If you set the menu items "Activity"/ "Suspend" (third entry in this menu) and "Activity"/ "Use GPU always" (fourth entry) together, would this stop all CPU WUs but run the GPU WU? (Can't check myself at the moment because I don't have a suitable project selected right now.)
     
  12. iwajabitw

    iwajabitw Senior member

    Joined:
    Aug 19, 2014
    Messages:
    428
    Likes Received:
    62
    @crashtech the .798 is going to stay the same unless you change your app_config to something like this.

    <app_config>
      <app>
        <name>milkyway</name>
        <max_concurrent>0</max_concurrent>
        <gpu_versions>
          <gpu_usage>1</gpu_usage>
          <cpu_usage>1</cpu_usage>
        </gpu_versions>
      </app>
    </app_config>

    This reserves 1 CPU core for 1 GPU task, as the benchmark asks. This is a working app_config, so feel free to use it, and then adjust it later as needed to run more than one GPU task per card by setting <gpu_usage> to .5, .25, .75, etc.

    With the 6-core 5820 and hyperthreading on, with the setting at 100% of the CPUs you should see 11 CPU-only tasks and 1 GPU task running, because the config file grabbed a core for the GPU task.
    On my 5820 with 2 GTX 980s, I have it set to 50% CPU usage and <gpu_usage>.5</gpu_usage> in app_config, so each card does 2 tasks. On my task screen I see 4 GPU tasks and 2 CPU-only tasks running, which leaves 6 hyperthreaded cores free for doing normal things on the system. Hope that helps.
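The task counts in the two examples above can be reproduced with a rough Python sketch. This is a simplified model, not BOINC's actual scheduler. It assumes each card runs floor(1 / gpu_usage) tasks, every GPU task reserves cpu_usage CPUs, and whatever is left of the allowed CPUs goes to CPU-only tasks. The function name and the rounding choices are assumptions made for the example.

```python
import math


def expected_tasks(hw_threads, cpu_percent, n_gpus, gpu_usage, cpu_usage):
    """Estimate (gpu_tasks, cpu_only_tasks) under a simplified model.

    hw_threads  : hardware threads BOINC sees (12 for a 5820K with HT on)
    cpu_percent : the "Use at most N% of the CPUs" preference
    gpu_usage   : <gpu_usage> from app_config.xml
    cpu_usage   : <cpu_usage> from app_config.xml
    """
    # CPUs BOINC may use at all (rounding down is an assumption).
    allowed_cpus = math.floor(hw_threads * cpu_percent / 100)
    # Tasks that fit on one card; the epsilon guards against float rounding.
    tasks_per_gpu = math.floor(1 / gpu_usage + 1e-9)
    gpu_tasks = n_gpus * tasks_per_gpu
    # CPUs set aside for feeding the GPU tasks.
    reserved = math.ceil(gpu_tasks * cpu_usage)
    cpu_tasks = max(allowed_cpus - reserved, 0)
    return gpu_tasks, cpu_tasks


if __name__ == "__main__":
    # 5820K, HT on, 100% of CPUs, one card, gpu_usage 1, cpu_usage 1
    print(expected_tasks(12, 100, 1, 1, 1))    # (1, 11)
    # 5820K, HT on, 50% of CPUs, two cards, gpu_usage .5, cpu_usage 1
    print(expected_tasks(12, 50, 2, 0.5, 1))   # (4, 2)
```

Both results match what is described in the post, but the real client may round differently at odd percentages, so treat the numbers as a starting point.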
     
  13. crashtech

    crashtech Diamond Member

    Joined:
    Jan 4, 2013
    Messages:
    6,241
    Likes Received:
    152
    Yes, thank you, that's exactly what I needed! I see from actually reading a bit of the manual (fancy that) that this is to be an xml file placed in the specific project subdirectory that you wish to modify, but I don't see one. There are many xml files in the %\ProgramData\BOINC directory, though. That seems like where I would drop it. Is there a good reference resource for this aside from the manual?
     
  14. iwajabitw

    iwajabitw Senior member

    Joined:
    Aug 19, 2014
    Messages:
    428
    Likes Received:
    62
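    \ProgramData\Boinc\projects\milkyway is where you want to put it (the project folder's full name is normally derived from the project URL, e.g. milkyway.cs.rpi.edu_milkyway).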
     
  15. iwajabitw

    iwajabitw Senior member

    Joined:
    Aug 19, 2014
    Messages:
    428
    Likes Received:
    62
    I learned about app_config files a long time ago over at the SETI forums, which are pretty active, but I did have to ask at the Milkyway forums what settings were good to start with. After that it was mostly trial and error on my part, figuring out how to squeeze more WUs out of the cards without crashing the system.
     
  16. iwajabitw

    iwajabitw Senior member

    Joined:
    Aug 19, 2014
    Messages:
    428
    Likes Received:
    62
    If you don't know how to make the file, just open Notepad and copy and paste that one of mine in there. Then save it as app_config.xml in the milkyway directory from the post above. Restart BOINC, and it should say 1 CPU & 1 GPU now.
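Since one mistyped tag is enough to make the file unusable, it can be worth checking the pasted text before restarting BOINC. Below is a small Python sketch that parses the text and reports the usage values it finds. The function name `check_app_config` and its checks are made up for this thread's layout (one `<gpu_versions>` block per `<app>`); it is not an official BOINC validator.

```python
import xml.etree.ElementTree as ET


def check_app_config(xml_text):
    """Parse app_config text and return {app name: (gpu_usage, cpu_usage)}.

    Raises xml.etree.ElementTree.ParseError if the XML is not well-formed,
    for example when a tag is written as <gpu usage> with a space.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "app_config":
        raise ValueError("root element should be <app_config>")
    result = {}
    for app in root.findall("app"):
        name = app.findtext("name")
        gpu = app.findtext("gpu_versions/gpu_usage")
        cpu = app.findtext("gpu_versions/cpu_usage")
        # This sketch expects all three values to be present.
        if name is None or gpu is None or cpu is None:
            raise ValueError("expected <name>, <gpu_usage> and <cpu_usage>")
        result[name.strip()] = (float(gpu), float(cpu))
    return result


# Same content as the app_config posted earlier in the thread.
SAMPLE = """<app_config>
  <app>
    <name>milkyway</name>
    <max_concurrent>0</max_concurrent>
    <gpu_versions>
      <gpu_usage>1</gpu_usage>
      <cpu_usage>1</cpu_usage>
    </gpu_versions>
  </app>
</app_config>"""

if __name__ == "__main__":
    print(check_app_config(SAMPLE))  # {'milkyway': (1.0, 1.0)}
```

To check a saved file, read its contents into a string first and pass that in.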
     
  17. crashtech

    crashtech Diamond Member

    Joined:
    Jan 4, 2013
    Messages:
    6,241
    Likes Received:
    152
    @iwajabitw, thanks for helping me sort this out. Now I have a result for the OP that seems consistent, if not very exciting. From the data presented, it seems as if I might be better off with the R9 290 that is lying in my spare parts drawer? Wow.

    [IMG]

    I get 65.9 seconds out of that.

    5820K @ 4.4GHz, 16GB DDR4 @ 2666MHz, R9 Fury @ 1050MHz (stock).
     
  18. iwajabitw

    iwajabitw Senior member

    Joined:
    Aug 19, 2014
    Messages:
    428
    Likes Received:
    62
    Cool, glad you got it going. I recently got an R9 280, and it's in an old Vista system with a Core 2 Duo. I can only squeeze 4 tasks at a time out of it, and it still blows the 2 980s away on this project. With that 5820 you can probably crunch 3-5 times that at once.
     
  19. crashtech

    crashtech Diamond Member

    Joined:
    Jan 4, 2013
    Messages:
    6,241
    Likes Received:
    152
    With your last sentence you seem to be saying that I am leaving a lot on the table. I can devote 100% of this machine to M@H most of the time, do you have any suggestions? I could put my spare 290 in a vacant slot or swap it out for the Fury. Would they work in tandem? I don't game nearly as much these days, and what I do play worked on the 290 pretty well.

    OP, please let me know if these digressions are unwanted. I'm not sure if my config questions merit a new thread or not...
     
  20. iwajabitw

    iwajabitw Senior member

    Joined:
    Aug 19, 2014
    Messages:
    428
    Likes Received:
    62
    Yeah, I do believe they will both work in the same machine. Not in Crossfire/SLI like you would for games, since they're from different series (Fury/290), but if they use the same driver series, I believe so. Granted, I use Nvidia much more than AMD, so I hope I am telling you right. You could possibly put out 500,000 to a million points per day.

    Check out this thread at MW's forums. It will give you a lot of info.

    https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=3118&sort_style=6&start=0
     
  21. StefanR5R

    StefanR5R Member

    Joined:
    Dec 10, 2016
    Messages:
    181
    Likes Received:
    79
    Does the CPU application use AVX(-256/-128), or doesn't it?

    milkyway_1.40_x86_64-pc-linux-gnu running on all threads of 2x E5-2690 v4 (Broadwell-EP, 2x 14C/28T) on Supermicro X10DAX runs at 2.9 GHz, which is the all-core AVX turbo. Yet,

    milkyway_1.42_windows_x86_64.exe running on all threads of i7-6950X (Broadwell-E, 10C/20T) on Asus X99-A runs at the fixed non-AVX clock which I configured in the BIOS, not at the AVX clock for which I specified a negative clock offset. I just checked to be sure: Prime95 reliably pulls this CPU down to the AVX clock.

    Is this a Linux <> Windows thing? Or Xeon <> i7-EE?

    Edit:
    Core temperatures of the Xeons: 53...58 °C, of the i7: 60...69 °C. In contrast, WCG OpenZika caused about 10 degrees higher temperatures on the Xeons and on the i7 but ran the Xeons at their non-AVX all-core turbo.

    Edit 2:
    Another striking difference is the runtime of the tasks:
    E5-2690 v4, HT, 2.9 GHz: 3892 s
    (app: MilkyWay@Home v1.40, host: 715668)
    i7-6950X, HT, 4.0 GHz: 4099 s
    (app: MilkyWay@Home v1.42, host: 715667)

    (proper report together with other CPUs for inclusion in the OP to follow later)
     
    #71 StefanR5R, Jan 28, 2017
    Last edited: Jan 28, 2017
  22. crashtech

    crashtech Diamond Member

    Joined:
    Jan 4, 2013
    Messages:
    6,241
    Likes Received:
    152
    I've brought an R9 290 into production using an old low-end HP workstation. I haven't benchmarked it following the OP's rules, but it's churning out two WUs at a time in about 106 seconds, which works out to one WU completed every 53 seconds on average. If Assimilator1 is interested in the performance of this machine, I will bench it by the rules on request.
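The arithmetic behind that estimate, as a short Python sketch. The function names are invented for the example. Note that a time measured with two WUs running at once is not directly comparable to the benchmark's one-WU-at-a-time figures.

```python
def effective_seconds_per_wu(batch_seconds, concurrent_wus):
    """Average wall-clock seconds per finished WU when several run at once."""
    if concurrent_wus < 1:
        raise ValueError("concurrent_wus must be at least 1")
    return batch_seconds / concurrent_wus


def wus_per_day(batch_seconds, concurrent_wus):
    """WUs finished in 24 hours at a steady rate."""
    return 86400 / effective_seconds_per_wu(batch_seconds, concurrent_wus)


if __name__ == "__main__":
    # Two WUs at a time, about 106 seconds per pair (figures from the post).
    print(effective_seconds_per_wu(106, 2))  # 53.0
    print(round(wus_per_day(106, 2)))        # 1630
```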
     
  23. waffleironhead

    waffleironhead Diamond Member

    Joined:
    Aug 10, 2005
    Messages:
    6,484
    Likes Received:
    36
    Fired up the laptop for a few units.
    M275X at stock: average over five WUs = 389.9 seconds.
    EDIT: being fed by an i5 4510U. This CPU throttles down to 1.9 GHz from its supposed base of 2.1(?), due to the shared heatsink being overloaded, I think.
     
  24. StefanR5R

    StefanR5R Member

    Joined:
    Dec 10, 2016
    Messages:
    181
    Likes Received:
    79
    Here are some measurements of uninteresting hardware: CPUs, and GPUs with weak double precision. Well, at least this documents that gear like this should not be considered for MW@H.

    General notes:
    • All values are averages of 20 validated WUs.
    • All samples were taken in steady state, i.e. some time after ramp-up and some time before ramp-down of the work. Linux boxes stood at the login screen; Windows boxes at an empty idle desktop. No screensaver, no svchost.exe fuzzing around.
    • For GPU measurements, one WU at a time was run. No CPU work was run in parallel with GPU work.
    • For CPU measurements, as many WUs as available hardware threads were run. "HT off" means hyperthreading disabled in the BIOS.
    • As noted before, the Linux CPU application v1.40 appears to be much more optimized than the Windows CPU application v1.42: My Broadwell-EP/ Linux was faster per core than my higher-clocked Broadwell-E/ Windows. Unfortunately I have no dual-boot machine ready for direct testing of both applications on identical hardware.
    • COV = σ / µ
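For reference, the average and COV of a set of run times can be computed along these lines in Python. This is a sketch only: the post does not say whether σ is the sample or the population standard deviation, so the use of the sample version (`statistics.stdev`) is an assumption, and the run times in the example are invented, not measured.

```python
import statistics


def mean_and_cov(runtimes):
    """Return (mean, COV) for a list of run times, with COV = sigma / mu.

    Uses the sample standard deviation, so at least two values are needed.
    """
    mu = statistics.mean(runtimes)
    sigma = statistics.stdev(runtimes)
    return mu, sigma / mu


if __name__ == "__main__":
    # Hypothetical run times in seconds, for illustration only.
    example = [2295.0, 2301.0, 2310.0, 2288.0, 2306.0]
    mu, cov = mean_and_cov(example)
    print(f"average = {mu:.0f} s, COV = {cov:.3f}")
```

A small COV just means the individual WU times were tightly clustered around the average.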

    CPU times on Linux (MilkyWay@Home v1.40)

    CPU                          Threads used for CPU   Time     COV
    Xeon E5-2690 v4 @2.9 GHz     14 (HT off)            2300 s   0.005
    Xeon E5-2690 v4 @2.9 GHz     28 (HT on)             3892 s   0.015
    Phenom II X4 905e @2.5 GHz   4                      5559 s   0.001
    Core 2 T7600 @2.333 GHz      2                      5507 s   0.012

    Notes:
    • Xeon E5 was used in a 2P motherboard, both CPUs loaded with MW@H simultaneously. So, twice as many threads per box as noted above per CPU.
    • Xeon E5 ran at its all-core AVX turbo clock.
    • The T7600 ran a 32 bit OS and application, the others 64 bit OS and application.

    CPU times on Windows (MilkyWay@Home v1.42)

    CPU                   Threads used for CPU   Time     COV
    i7-6950X @4.0 GHz     10 (HT off)            2390 s   0.007
    i7-6950X @4.0 GHz     20 (HT on)             4099 s   0.007
    i7-4960X @4.5 GHz     6 (HT off)             2461 s   0.003
    i7-4960X @4.5 GHz     12 (HT on)             4130 s   0.003
    i7-4900MQ @3.36 GHz   4 (HT off)             3164 s   0.005
    i7-4900MQ @3.20 GHz   8 (HT on)              5780 s   0.005

    Notes:
    • i7-4900MQ clock was determined by the laptop BIOS, driving the CPU at its thermal limit.
    • OS: Windows 7 64 bit

    GPU times (MilkyWay@Home v1.43)

    GPU                      Host CPU            Time    COV
    GTX 1080 @2000 MHz       i7-6950X @4.0 GHz   116 s   0.005
    GTX 1080 @1607 MHz       i7-6950X @4.0 GHz   128 s   0.004
    GTX 1070 @2025 MHz       i7-6950X @4.0 GHz   137 s   0.003
    FirePro W7000 @950 MHz   i7-4960X @4.5 GHz   179 s   0.0001

    Notes:
    • all GPUs at stock memory clocks
    • FirePro at stock GPU clock
    • GTX 1070 factory-OC'd and with boost clock, GTX 1080 at stock and manually OC'd
    • cards were slightly under-utilized by one WU per GPU:
      • GTX 1080 @2000 MHz: 83 % typical, 72 % average GPU load
      • GTX 1080 @1607 MHz: 92 % typical, 82 % average GPU load
      • GTX 1070 @2025 MHz: 90 % typical, 81 % average GPU load
      • W7000: 99 % typical, 95 % average GPU load
    • OS: Windows 7 64 bit, drivers NVIDIA 373.06 and AMD 1.4.1848
     
  25. waffleironhead

    waffleironhead Diamond Member

    Joined:
    Aug 10, 2005
    Messages:
    6,484
    Likes Received:
    36
    I just put an RX 460 in my son's Minecraft system. The Intel GPU was mostly fine, but had some strange pop-in going on, so I picked up the fastest single-slot AMD card I could find. Anyway...
    EDIT: Being fed by an i5-4460 @ 3.2 GHz.
    It is a factory-OC RX 460 at 1244 MHz.
    Average over 5 WUs is 240.5 seconds.
    Not surprising, considering its 1/16 DP ratio gives it only 122 GFLOPS.

    https://www.amazon.com/gp/product/B01K1JVO86/ref=oh_aui_detailpage_o00_s00?ie=UTF8&psc=1
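The relation between the DP ratio and the double-precision figure quoted above is simple enough to put in a few lines of Python. This is only a sketch with made-up function names. The single-precision value it prints is merely what the quoted 122 GFLOPS and 1/16 ratio imply, not a number taken from a spec sheet.

```python
def dp_gflops(sp_gflops, dp_ratio):
    """Double-precision throughput from single-precision throughput and DP ratio."""
    return sp_gflops * dp_ratio


def implied_sp_gflops(dp_gflops_value, dp_ratio):
    """Single-precision throughput implied by a DP figure and its ratio."""
    return dp_gflops_value / dp_ratio


if __name__ == "__main__":
    ratio = 1 / 16
    sp = implied_sp_gflops(122, ratio)
    print(sp)                    # 1952.0
    print(dp_gflops(sp, ratio))  # 122.0
```

Since MW@H leans on double precision, it is the DP figure rather than the SP figure that matters for the run times in this thread.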
     
    #75 waffleironhead, Feb 1, 2017
    Last edited: Feb 1, 2017