PrimeGrid: CPU benchmarks

Discussion in 'Distributed Computing' started by StefanR5R, Jan 5, 2017.

  1. StefanR5R

    StefanR5R Senior member

    Here are some GCW Sieve stats from the CPU-only "Generalized Cullen/Woodall (Sieve) 1.00" application. Runtimes were taken from the latest 20 valid tasks per machine.

    Code:
                 CPU   2x E5-2690v4 2x E5-2690v4   i7-6950X     i7-4960X     E3-1245v3    i7-4900MQ  Phenom II X4 905e
               clock        3.2 GHz      3.2 GHz      4.0 GHz      4.5 GHz      3.6 GHz     3.15 GHz      2.5 GHz
    ------------------------------------------------------------------------------------------------------------------
     mean runtime/WU       7194 s       7242 s       5602 s       5479 s       6911 s       7123 s      10719 s
      min runtime/WU       6696 s       6772 s       5410 s       5249 s       6494 s       6762 s      10464 s
      max runtime/WU       7957 s       7984 s       5987 s       5835 s       7584 s       7373 s      11069 s
                 COV      0.059        0.055        0.026        0.024        0.048        0.022        0.021
    ------------------------------------------------------------------------------------------------------------------
    normalized runtime   22'900       23'200       22'400       24'700       24'900       22'400       26'800
    ------------------------------------------------------------------------------------------------------------------
    concurrent tasks         56           56           18           12            8            8            4
             WUs/day        673          668          278          189          100           97           32
         credits/day    367'000      365'000      152'000      103'000       55'000       53'000       18'000
    
    clock:
    actual average processor frequency during the GCW sieve multitask workload

    normalized runtime:
    mean runtime/WU (s), multiplied by clock (GHz), i.e. runtime scaled to 1.0 GHz; lower is better

    credits/day:
    WUs/day multiplied by the average credits/WU, which is slightly more than 546
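
    For reference, the derived rows can be reproduced with a few lines of Python. A minimal sketch (the values are the i7-6950X column; the runtimes list is only a placeholder for the 20 actual task runtimes):

    Code:
    # Sketch of the derived table rows above.
    from statistics import mean, stdev

    runtimes = [5602.0] * 20   # placeholder for the latest 20 valid runtimes (s)
    clock_ghz = 4.0            # average clock during the workload
    concurrent_tasks = 18
    credits_per_wu = 546       # "slightly more than 546"

    cov = stdev(runtimes) / mean(runtimes)      # coefficient of variation
    normalized = mean(runtimes) * clock_ghz     # runtime scaled to 1.0 GHz
    wus_per_day = concurrent_tasks * 86400 / mean(runtimes)

    print(f"COV {cov:.3f}, normalized runtime {normalized:.0f}, "
          f"{wus_per_day:.0f} WUs/day, {wus_per_day * credits_per_wu:,.0f} credits/day")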

    Remarks:
    • My actual production in the ongoing Isaac Newton's Birthday Challenge will be lower than the sum over these machines suggests; some of them have to take breaks from the race periodically.
    • The two i7-X processors are overclocked, but I checked that there are no invalid returns.
    • The 6950X has 20 logical cores, but only 18 are employed for PrimeGrid; two remain reserved for F@H.
    • The E3-1245v3 could be a few percent faster; there is a mixed workload on it.
    • Normalized to 1.0 GHz, the per-core speed of the Phenom II is not so far behind the other CPUs. But then again, it runs only 1 task per physical core (no HT), while all the others run 2 tasks per physical core.
     

  2. StefanR5R

    StefanR5R Senior member

    Here are SoB-LLR runtimes with the "Seventeen or Bust 7.06" CPU application. These are unusually long tasks; I'll list the times in days, not hours.

    As Ken g6 already hinted in the PrimeGrid Races 2017 thread,
    • AVX units, if present, are given a really good workout.
    • Hyper-threading scaling is negative on the CPUs that I tested for it.
    • SoB-LLR chews memory bandwidth like there is no tomorrow.
    I'll list runtimes in four sections:
    1. Haswell laptop with four memory configurations
    2. Light load (2 tasks) on various CPUs
    3. Heavy load (tasks on all cores) versus light load
    4. Heavy load with hyper-threading
    Hyper-threading was disabled in the BIOS, except in section 4.

    Machines used in these tests:
    Deneb
    AMD Phenom II X4 905e, 2.5 GHz, 4 cores, no SMT, no AVX
    DDR2-800 cl6 (15 ns), dual channel
    HSW
    Intel i7-4900MQ (Haswell), clock as listed in the tests, 4 cores
    DDR3-1600 cl11 (14 ns), single channel, unless noted otherwise in section 1
    IVB-E
    Intel i7-4960X (Ivy Bridge-E), 4.5 GHz, 6 cores
    DDR3-2400 cl10 (8 ns), quad channel
    BDW-E
    Intel i7-6950X (Broadwell-E), 3.5 GHz, 10 cores
    DDR4-3000 cl14 (9 ns), quad channel
    BDW-EP
    dual Intel Xeon E5-2690 v4 (Broadwell-EP), 2.9 GHz, 2x 14 cores
    DDR4-2400 cl17 (14 ns), 2x quad channel, unregistered ECC

    The listed CPU clocks are the average clocks during the tests. The Haswell PC is a laptop which does not offer any means to influence the CPU clock in the BIOS. Its clock was always controlled by the thermal limit and therefore varied from workload to workload.

    Operating systems: Linux on Deneb and BDW-EP, Windows 7 otherwise.


    1. Haswell laptop with four memory configurations

    Workload:
    4 simultaneous SoB-LLR tasks (100 % CPU load)

    Details of the RAM configurations:
    DDR3-1600, timings 11-11-11-28 (14 ns), command rate 1T, single or dual channel
    DDR3-2133, timings 11-11-11-31 (10 ns), command rate 2T, single or dual channel

    Min/Max SoB-LLR runtimes:
    HSW@2.9GHz, DDR3-1600, single channel...........13.7 - 13.9 days
    HSW@2.7GHz, DDR3-2133, single channel...........10.8 - 10.9 days
    HSW@2.4GHz, DDR3-1600, dual channel................8.0 - 8.2 days
    HSW@2.4GHz, DDR3-2133, dual channel................7.6 - 7.8 days

    Remarks:
    • Obviously the single channel configs starved the CPU so much that its lower utilization, hence lower heat output, allowed for higher clocks.
    • Switching from single channel to dual channel reduced the runtimes by roughly 41 % (13.8 → 8.1 days at DDR3-1600), even though the dual channel runs clocked lower.
    • Comparing the two dual channel configs, 33 % faster memory resulted in 5 % lower runtimes. I wonder how much of a role the command rate regression (1T → 2T) plays there. Both comparisons are sketched below.
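
    A quick sketch of that arithmetic, using the midpoints of the min/max ranges above:

    Code:
    # Memory-scaling arithmetic from the runtime ranges (days) above.
    single_1600 = (13.7 + 13.9) / 2
    dual_1600   = (8.0 + 8.2) / 2
    dual_2133   = (7.6 + 7.8) / 2

    print(f"single -> dual channel:  {1 - dual_1600 / single_1600:.0%} shorter runtimes")
    print(f"DDR3-1600 -> DDR3-2133:  {1 - dual_2133 / dual_1600:.0%} shorter runtimes")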

    2. Light load (2 tasks) on various CPUs

    Workload:
    2 simultaneous SoB-LLR tasks
    (on BDW-E: additionally 2 Folding@Home GPU feeder threads)

    Min/Max SoB-LLR runtimes:
    Deneb.............................24.0 - 24.0 days
    HSW@3.4GHz.................7.1 - 7.1 days
    IVB-E................................5.5 - 5.8 days
    BDW-E.............................5.3 - 5.6 days

    Remarks:
    • With more than 3 weeks needed per task, the Deneb won't finish anything within the upcoming two-week "Year of the Fire Rooster" challenge.
    • Despite being equipped with better AVX units, the Haswell is much slower than the Ivy Bridge-E. The latter compensates with a faster clock (4.5 vs. 3.4 GHz), much higher memory bandwidth (quad-channel DDR3-2400 vs. single-channel DDR3-1600), and lower memory latency (8 ns vs. 14 ns).
    • Yet Ivy Bridge-E is beaten by Broadwell-E despite the clock disparity (4.5 vs. 3.5 GHz). The Broadwell-E brings much wider AVX units and some more memory bandwidth to feed them.
    • As there is a water cooler mounted on my BDW-E, I could certainly try to clock it higher than 3.5 GHz for a race. But as soon as more simultaneous tasks are put on the CPU, memory bandwidth again becomes more of a factor than core clock.

    3. Heavy load (tasks on all cores)

    Workload:
    HSW: 4 simultaneous SoB-LLR tasks on 4 cores (100 % load)
    BDW-E: 8 SoB-LLR tasks + 2 F@H feeders on 10 cores (80 + 20 = 100 % load)
    BDW-EP: 26 SoB-LLR tasks on 2x 14 cores (93 % load)

    Min/Max SoB-LLR runtimes:
    HSW@2.9GHz...............13.7 - 13.9 days
    BDW-E.............................6.2 - 6.2 days
    BDW-EP..........................9.8 - 10.5 days

    Remarks (compare with section 2, light load):
    • Haswell's runtimes increase by 94 % when the workload is doubled on the single channel RAM config. I haven't done the same comparison with dual channel RAM.
    • Broadwell-E's runtimes increase by 14 % when going from 2 to 8 SoB-LLR tasks.
    • The fully loaded BDW-EPs show 64 % longer runtimes than the fully loaded BDW-E.
      Differences:
      Number of SoB-LLR tasks per processor differs by 63 %, core clocks differ by 21 %, RAM speed by 25 %, and RAM latency by 53 %.
      Other differences, which may or may not play a role, are dual-socket vs. single socket, MCC die with two ring buses and two home agents vs. LCC die with one ring bus and one home agent, and Linux vs. Windows.

    4. Heavy load with hyper-threading

    Workload:
    HSW: 8 SoB-LLR tasks on 4 cores / 8 hardware threads (100 % load)
    BDW-E: 20 SoB-LLR tasks on 10 cores / 20 hardware threads (100 % load)

    Min/Max SoB-LLR runtimes:
    HSW@2.8GHz...............28.0 - 30.1 days
    BDW-E...........................16.8 - 17.5 days

    Remarks (compare with section 3, HT off):
    • Haswell's runtimes increase by 110 %, hence total throughput decreases by 5 %.
    • BDW-E's runtimes increase by 177 %, hence total throughput decreases by about 10 % (20/17.15 vs. 8/6.2 tasks per day); see the sketch below.
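
    Both bullets are just ratios of tasks per day. A small sketch, using the mean runtimes from sections 3 and 4:

    Code:
    # Throughput change from HT: tasks/day with HT on vs. HT off.
    configs = {
        "HSW":   ((4, 13.8), (8, 29.05)),     # (tasks, mean days/task)
        "BDW-E": ((8, 6.2),  (20, 17.15)),    # HT-off pair, HT-on pair
    }
    for name, ((t0, d0), (t1, d1)) in configs.items():
        change = (t1 / d1) / (t0 / d0) - 1
        print(f"{name}: {change:+.0%} total throughput with HT on")
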
    --------
    Edit, April 18:
    corrected BDW-EP memory spec
     
    #2 StefanR5R, Feb 26, 2017
    Last edited: Apr 18, 2017
  3. TennesseeTony

    TennesseeTony Elite Member

    Wow! What a treasure trove of information! Well done!

    I sure hope this thread doesn't get lost; that's a lot of work and good info.
     
  4. StefanR5R

    StefanR5R Senior member

    Initially I just wanted to get a quick overview of how the different PCs fare with SoB-LLR. When I saw the Xeons' low results compared to the 6950X, I began the scaling tests. And I finally convinced myself that I needed to do something about the dismal out-of-the-box RAM in my laptop, despite currently rising RAM prices.
     
  5. StefanR5R

    StefanR5R Senior member

    PS:
    SoB-LLR is credited with about 64,500 points/WU. This works out to:

    HSW..............4 concurrent tasks, 13.8 days/task..........19,000 points/day
    BDW-E...........8 concurrent tasks, 6.2 days/task............83,000 points/day
    BDW-EP.......26 concurrent tasks, 10.1 days/task.......166,000 points/day

    Oops, that's about half of the GCW-Sieve points/day from post #1. And this is supposed to include "50 % long job credit bonus and a 10 % conjecture credit bonus".

    The SoB-LLR project was started in 2010. Runtimes on today's CPUs are hardly shorter than those reported back then, despite better vector units; surely the RAM bottleneck is to blame. If the good folks at PrimeGrid calibrated points/task in 2010, and perhaps even benchmarked with just one task per CPU, that may explain why PPD is still so low in this day and age.
     
  6. StefanR5R

    StefanR5R Senior member

    PSP-LLR runtimes, "Prime Sierpinski Problem (LLR) 8.00" application,
    Broadwell-EP (Xeon E5-2690v4) @ 2.9 GHz (AVX all-core turbo), dual processor board, Linux

    HT on, 56 single-threaded jobs: 3.95...4.2 days per task .......... 13.7 tasks per day
    HT off, 28 single-threaded jobs: 1.9...2.0 days per task ............ 14.4 tasks per day

    t.b.d.: PPD, and multi-threaded tasks.
     
  7. Orange Kid

    Orange Kid Elite Member

    Added a link to this in the Project List under PrimeGrid.
     
  8. StefanR5R

    StefanR5R Senior member

    PSP-LLR runtimes, "Prime Sierpinski Problem (LLR) 8.00" application,
    continued: multithreading study, and yield in terms of points per day

    Broadwell-EP (Xeon E5-2690v4) @ 2.9 GHz (AVX all-core turbo), dual processor board, Linux
    HT off
    Code:
                          session 1 (4 days)        session 2 (4 days)
    ----------------------------------------------------------------------
    threads/task          2            7            7            14
    simultaneous tasks    14           4            4            2
    ----------------------------------------------------------------------
    avg task run time     1d7h         5h27m        6h02m        3h07m
    tasks per day         11.0         17.6         15.9         15.4
    PPD                   155,000      250,000      245,000      235,000
    
    Edit, April 11:
    all prior results discarded, they were based on wrong run times and CPU times from primegrid.com's host task lists
    Edit, April 12:
    added results from 14-threads/task x 2 tasks
    Edit, April 14, 15:
    improved accuracy with longer session duration
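
    For anyone checking the arithmetic: the tasks/day and PPD rows follow from the run times as below. The ~14,200 credits/task is inferred from the table itself (e.g. 250,000 PPD / 17.6 tasks per day), not an official figure; small differences vs. the table come from the rounded run times.

    Code:
    import re

    CREDITS_PER_TASK = 14_200   # inferred from the table above, not official

    def to_hours(s):
        # parse "1d7h" / "5h27m" style durations into hours
        t = {u: int(v) for v, u in re.findall(r"(\d+)([dhm])", s)}
        return t.get("d", 0) * 24 + t.get("h", 0) + t.get("m", 0) / 60

    for tasks, runtime in [(14, "1d7h"), (4, "5h27m"), (4, "6h02m"), (2, "3h07m")]:
        per_day = tasks * 24 / to_hours(runtime)
        print(f"{tasks} x {runtime}: {per_day:.1f} tasks/day, "
              f"{per_day * CREDITS_PER_TASK:,.0f} PPD")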
     
    #8 StefanR5R, Apr 6, 2017
    Last edited: Apr 15, 2017
  9. StefanR5R

    StefanR5R Senior member

    SGS-LLR runtimes, "Sophie Germain (LLR) 8.00" application
    (name for app_config.xml: llrTPS)

    Broadwell-EP (Xeon E5-2690v4) @ 2.9 GHz (AVX all-core turbo), dual processor board, Linux
    Code:
    hyperthreading        on           off          off          off
    ---------------------------------------------------------------------
    threads/task          1            1            2            4
    simultaneous tasks    56           28           14           7
    ---------------------------------------------------------------------
    load average          60.4         29.7         27.4         26.0
    avg task run time     25m41s       12m47s       9m47s        6m06s
    ---------------------------------------------------------------------
    tasks per day         3140         3150         2110         1650
    PPD                   125,000      126,000      84,000       66,000
    
    Edit: added results with HT on, and 2-threaded
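
    For anyone who wants to reproduce the threads/task settings: this is done with an app_config.xml in the PrimeGrid project directory. A sketch for the 2-threads/14-tasks column; the llrTPS app name is from above, while cmdline/avg_ncpus/max_concurrent are the usual BOINC app_config mechanism, so double-check against PrimeGrid's own multithreading instructions:

    Code:
    <!-- sketch of projects/www.primegrid.com/app_config.xml -->
    <app_config>
      <app>
        <name>llrTPS</name>
        <max_concurrent>14</max_concurrent>
      </app>
      <app_version>
        <app_name>llrTPS</app_name>
        <cmdline>-t 2</cmdline>      <!-- threads per LLR task -->
        <avg_ncpus>2</avg_ncpus>     <!-- CPUs BOINC reserves per task -->
      </app_version>
    </app_config>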
     
    #9 StefanR5R, Jun 5, 2017
    Last edited: Jun 5, 2017
  10. Kiska

    Kiska Senior member

    So a ~1000 point difference between single thread and dual thread....
     
  11. StefanR5R

    StefanR5R Senior member

    More precisely, it's just a 400 point (0.3 %) difference (125,400 : 125,800 PPD) between HT on and HT off.

    All SGS-LLR tasks that I had gave 39.91 credits/task.
     
  12. Kiska

    Kiska Senior member

    I see...
    Also, the credits are a set thing, so you'll always see 39.91
     
  13. Ken g6

    Ken g6 Programming Moderator, Elite Member
    Moderator

    Well, technically credits aren't constant, but SGS WUs are all so close in size that they're practically constant.
     
  14. StefanR5R

    StefanR5R Senior member

    The little bit of performance testing for post #9 had a side effect.
     
  15. Kiska

    Kiska Senior member

    Wait what?!
     
  16. Ken g6

    Ken g6 Programming Moderator, Elite Member
    Moderator

    Congrats! Did you sign up for automatic reporting, or do you need to report it yourself?

    P.S. Mine's bigger! :p
     
  17. Kiska

    Kiska Senior member

  18. StefanR5R

    StefanR5R Senior member

    Hmm, I'm confused. The home.php page appears to say I found 3 primes...
    [screenshot of the home.php page]
    ...but the actual primelist shows just one prime (of which I was actually the double checker, not the initial finder). Was this e-mail which I quoted (after I found it buried in the spam folder) about one out of two as-yet unlisted primes?
     
  19. GLeeM

    GLeeM Elite Member

    If you want to be the finder more often than the checker, do this:
    Set "Store at least __ days of work" to 0 (I think 0 worked; if not, set it to the lowest possible value). Set the Manager to report tasks as soon as they are finished (might not need to do this?).
    The idea is to get a WU at the same time as your wingman and send the finished result back before he does. A faster computer helps, of course :)
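
    (In file terms, the "Store at least" setting is work_buf_min_days in BOINC's global_prefs_override.xml; zeroing the companion "additional days" buffer as well is in the same spirit, though that part is an assumption. A sketch:)

    Code:
    <!-- sketch: fragment of global_prefs_override.xml -->
    <global_preferences>
       <work_buf_min_days>0</work_buf_min_days>
       <work_buf_additional_days>0</work_buf_additional_days>
    </global_preferences>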
     
  20. StefanR5R

    StefanR5R Senior member

    Today I received a second e-mail. This and the earlier mail read as if I was an initial finder, not a double checker. So, @Ken g6, the prime number which you are seeing at primegrid.com, the one smaller than yours, was found by Randall Scalise and only double-checked by me. There are apparently two more prime numbers of which I was the initial finder. According to the two e-mails, primegrid.com now waits 19 days for me to change the setting about automatic reporting (which is off); if I don't change it, they will ask the double checker whether he wants to report, and then either he does or it is reported anonymously. Until that happens, we are apparently not shown the actual number(s).

    These presumably three top-5000 finds happened with two dual-socket hosts running SGS-LLR for a little less than two days.
     
  21. TennesseeTony

    TennesseeTony Elite Member

    So....change your setting, silly goose. ;) Cool stuff. You will be remembered for all eternity, or at least a week.
     
  22. StefanR5R

    StefanR5R Senior member

    Probably for no longer than 6 weeks.
     
  23. Kiska

    Kiska Senior member

    So that is the reason why we are having the race so soon :p
     
  24. TennesseeTony

    TennesseeTony Elite Member

    From the data above, with hyper-threading OFF and running all 28 CPUs instead of just 14, the PPD is 168,000?

    Oops, I'm an idiot, that is already reported in the second column. And I haven't even had any all-natural potato-based liquid muscle relaxer.