PrimeGrid: CPU benchmarks

StefanR5R

Elite Member
Dec 10, 2016
5,512
7,818
136
Table of contents:

#1 .......... GCW Sieve v1.00
#2 .......... SoB-LLR v7.06
#6 .......... PSP-LLR v8.00
#9 .......... SGS-LLR v8.00
#31 ........ GCW-LLR v8.00
#39 ........ 321-LLR v8.00
#45 ........ ESP-LLR v8.00
#53 ........ SR5-LLR v8.01
#56 ........ PPS-LLR v8.01
#70 ........ GFN21 v3.20 for CPUs
#71 ........ PPS-DIV v8.04
#88 ........ 321-LLR v8.04
#92 ........ PPS-DIV v9.01

A method for testing without the influence of WU variability is described in post #44. This method has been updated to LLR2 in post #91.

------------

Here are some GCW Sieve stats from the CPU-only "Generalized Cullen/Woodall (Sieve) 1.00" application. Runtimes were taken from the latest 20 valid tasks per machine.

Code:
             CPU   2x E5-2690v4 2x E5-2690v4   i7-6950X     i7-4960X     E3-1245v3    i7-4900MQ  Phenom II X4 905e
           clock        3.2 GHz      3.2 GHz      4.0 GHz      4.5 GHz      3.6 GHz     3.15 GHz      2.5 GHz
------------------------------------------------------------------------------------------------------------------
 mean runtime/WU       7194 s       7242 s       5602 s       5479 s       6911 s       7123 s      10719 s
  min runtime/WU       6696 s       6772 s       5410 s       5249 s       6494 s       6762 s      10464 s
  max runtime/WU       7957 s       7984 s       5987 s       5835 s       7584 s       7373 s      11069 s
             COV      0.059        0.055        0.026        0.024        0.048        0.022        0.021
------------------------------------------------------------------------------------------------------------------
normalized runtime   22'900       23'200       22'400       24'700       24'900       22'400       26'800
------------------------------------------------------------------------------------------------------------------
concurrent tasks         56           56           18           12            8            8            4
         WUs/day        673          668          278          189          100           97           32
     credits/day    367'000      365'000      152'000      103'344       55'000       53'000       18'000

clock:
actual average processor frequency during the GCW sieve multitask workload​

normalized runtime:
mean runtime/WU (s), multiplied by clock (GHz), i.e. runtime scaled to 1.0 GHz; lower is better

credits/day:
WUs/day multiplied by the average credits/WU, which is slightly more than 546​
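
For reference, the derived rows can be recomputed from the measured ones. This is only a minimal sketch: the function names are made up for illustration, and 546 credits/WU is the rounded average mentioned above, so the last digits may differ from the table.

```python
SECONDS_PER_DAY = 86_400


def normalized_runtime(mean_runtime_s, clock_ghz):
    """Mean runtime scaled to 1.0 GHz (lower is better)."""
    return mean_runtime_s * clock_ghz


def wus_per_day(concurrent_tasks, mean_runtime_s):
    """Work units finished per day when several tasks run side by side."""
    return concurrent_tasks * SECONDS_PER_DAY / mean_runtime_s


def credits_per_day(wus, credits_per_wu=546.0):
    """Daily credit; 546 is the rounded average credits/WU (assumption)."""
    return wus * credits_per_wu


if __name__ == "__main__":
    # i7-6950X column: 5602 s mean runtime, 4.0 GHz, 18 concurrent tasks
    wus = wus_per_day(18, 5602)
    print(round(normalized_runtime(5602, 4.0)))  # 22408
    print(round(wus))                            # 278
    print(round(credits_per_day(wus), -3))       # 152000.0
```

The same three functions reproduce the other columns when fed their mean runtime, clock, and task count.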

Remarks:
  • My actual production in the ongoing Isaac Newton's Birthday Challenge will be lower than the sum of these machines. Some of them have to take breaks from the race periodically.
  • The two i7-X processors are overclocked, but I checked that there are no invalid returns.
  • The 6950X has got 20 logical cores but only 18 are being employed for PrimeGrid. Two remain for F@H.
  • The E3-1245v3 could be a few percent faster; there is a mixed workload on it.
  • Normalized to 1.0 GHz, the per-core speed of the Phenom II is not far behind the other CPUs. But then again, it runs only 1 task per physical core (no HT), whereas all others run 2 tasks per physical core.
 

StefanR5R

Here are SoB-LLR runtimes with the "Seventeen or Bust 7.06" CPU application. These are unusually long tasks; I'll list the times in days, not hours.

As Ken g6 already hinted in the PrimeGrid Races 2017 thread,
  • AVX units, if present, are given a really good workout.
  • Hyper-threading scaling is negative on the CPUs that I tested for it.
  • SoB-LLR chews memory bandwidth like there is no tomorrow.
I'll list runtimes in four sections:
  1. Haswell laptop with four memory configurations
  2. Light load (2 tasks) on various CPUs
  3. Heavy load (tasks on all cores) versus light load
  4. Heavy load with hyper-threading
Hyper-threading was disabled in the BIOS, except in section 4.

Machines used in these tests:
Deneb
AMD Phenom II X4 905e, 2.5 GHz, 4 cores, no SMT, no AVX
DDR2-800 cl6 (15 ns), dual channel​
HSW
Intel i7-4900MQ (Haswell), clock as listed in the tests, 4 cores
DDR3-1600 cl11 (14 ns), single channel, unless noted otherwise in section 1​
IVB-E
Intel i7-4960X (Ivy Bridge-E), 4.5 GHz, 6 cores
DDR3-2400 cl10 (8 ns), quad channel​
BDW-E
Intel i7-6950X (Broadwell-E), 3.5 GHz, 10 cores
DDR4-3000 cl14 (9 ns), quad channel​
BDW-EP
dual Intel Xeon E5-2690 v4 (Broadwell-EP), 2.9 GHz, 2x 14 cores
DDR4-2400 cl17 (14 ns), 2x quad channel, unregistered ECC​
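
The latencies in parentheses follow from the CAS latency and the memory clock, which for DDR memory is half the transfer rate. The sketch below recomputes them and adds the theoretical peak bandwidth, assuming 64-bit (8-byte) channels. The function names are illustrative, and real-world bandwidth will be lower than these peak figures.

```python
def cas_latency_ns(transfer_rate_mts, cas_cycles):
    """CAS latency in ns; the DDR memory clock is half the transfer rate."""
    clock_mhz = transfer_rate_mts / 2
    return cas_cycles / clock_mhz * 1000


def peak_bandwidth_gbs(transfer_rate_mts, channels, bytes_per_transfer=8):
    """Theoretical peak bandwidth in GB/s, assuming 64-bit channels."""
    return transfer_rate_mts * bytes_per_transfer * channels / 1000


# name: (transfer rate in MT/s, CAS latency in cycles, channels)
CONFIGS = {
    "Deneb": (800, 6, 2),
    "HSW": (1600, 11, 1),
    "IVB-E": (2400, 10, 4),
    "BDW-E": (3000, 14, 4),
    "BDW-EP (per socket)": (2400, 17, 4),
}

if __name__ == "__main__":
    for name, (rate, cl, channels) in CONFIGS.items():
        latency = cas_latency_ns(rate, cl)
        bandwidth = peak_bandwidth_gbs(rate, channels)
        print(f"{name:20s} {latency:5.1f} ns  {bandwidth:6.1f} GB/s")
```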

The listed CPU clocks are the average clocks during the tests. The Haswell PC is a laptop which does not offer any means to influence CPU clock in the BIOS. Its clock was always controlled by the thermal limit and therefore varied from workload to workload.

Operating systems: Linux on Deneb and BDW-EP, Windows 7 otherwise.


1. Haswell laptop with four memory configurations

Workload:
4 simultaneous SoB-LLR tasks (100 % CPU load)​

Details of the RAM configurations:
DDR3-1600, timings 11-11-11-28 (14 ns), command rate 1T, single or dual channel
DDR3-2133, timings 11-11-11-31 (10 ns), command rate 2T, single or dual channel​

Min/Max SoB-LLR runtimes:
HSW@2.9GHz, DDR3-1600, single channel...........13.7 - 13.9 days
HSW@2.7GHz, DDR3-2133, single channel...........10.8 - 10.9 days
HSW@2.4GHz, DDR3-1600, dual channel................8.0 - 8.2 days
HSW@2.4GHz, DDR3-2133, dual channel................7.6 - 7.8 days

Remarks:
  • Obviously the single channel configs starved the CPU so much that its lower utilization, hence lower heat output, allowed for higher clocks.
  • Switching from single channel to dual channel reduced the runtimes by 41 % with DDR3-1600 and by 29 % with DDR3-2133, even though the CPU clocked lower in the dual channel configs.
  • Comparing the two dual channel configs, 33 % faster memory resulted in 5 % lower runtimes. I wonder how much of a role the command rate regression is playing there.
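
As a sanity check on these percentages, here is a small sketch that takes the midpoint of each min/max pair from the list above and computes the relative runtime reduction. Helper names are illustrative. Note that the CPU clock differed between configs, so this is not a pure memory comparison.

```python
# Min/max SoB-LLR runtimes in days, copied from the list above
RUNTIMES = {
    ("DDR3-1600", "single"): (13.7, 13.9),
    ("DDR3-2133", "single"): (10.8, 10.9),
    ("DDR3-1600", "dual"): (8.0, 8.2),
    ("DDR3-2133", "dual"): (7.6, 7.8),
}


def mid(ram, channels):
    """Midpoint of the min/max runtime for one RAM configuration."""
    lo, hi = RUNTIMES[(ram, channels)]
    return (lo + hi) / 2


def reduction_pct(before, after):
    """Relative runtime reduction in percent."""
    return (before - after) / before * 100


if __name__ == "__main__":
    for ram in ("DDR3-1600", "DDR3-2133"):
        pct = reduction_pct(mid(ram, "single"), mid(ram, "dual"))
        print(f"{ram}: single -> dual channel: {pct:.0f} % shorter runtime")
    pct = reduction_pct(mid("DDR3-1600", "dual"), mid("DDR3-2133", "dual"))
    print(f"dual channel: DDR3-1600 -> DDR3-2133: {pct:.0f} % shorter runtime")
```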

2. Light load (2 tasks) on various CPUs

Workload:
2 simultaneous SoB-LLR tasks
(on BDW-E: additionally 2 Folding@Home GPU feeder threads)

Min/Max SoB-LLR runtimes:
Deneb.............................24.0 - 24.0 days
HSW@3.4GHz.................7.1 - 7.1 days
IVB-E................................5.5 - 5.8 days
BDW-E.............................5.3 - 5.6 days

Remarks:
  • With >3 weeks for task completion, the Deneb won't make it in the upcoming two-week "Year of the Fire Rooster" challenge.
  • Despite being equipped with better AVX units, the Haswell is much slower than the Ivy Bridge-E. The latter compensates with a faster clock (4.5 GHz : 3.4 GHz), much higher memory bandwidth (4-channel DDR3-2400 : 1-channel DDR3-1600), and lower memory latency (8 ns : 14 ns).
  • Yet Ivy Bridge-E is beaten by Broadwell-E despite the latter's lower clock (3.5 GHz vs. 4.5 GHz). The Broadwell-E brings much wider AVX units, and some more memory bandwidth to feed them.
  • As there is a water cooler mounted on my BDW-E, I could certainly try to clock it higher than 3.5 GHz when put into a race. But once more simultaneous tasks are put on the CPU, memory bandwidth will again become more of a factor than core clock.

3. Heavy load (tasks on all cores)

Workload:
HSW: 4 simultaneous SoB-LLR tasks on 4 cores (100 % load)
BDW-E: 8 SoB-LLR tasks + 2 F@H feeders on 10 cores (80 + 20 = 100 % load)
BDW-EP: 26 SoB-LLR tasks on 2x 14 cores (93 % load)

Min/Max SoB-LLR runtimes:
HSW@2.9GHz...............13.7 - 13.9 days
BDW-E.............................6.2 - 6.2 days
BDW-EP..........................9.8 - 10.5 days

Remarks (compare with section 2, light load):
  • Haswell's runtimes increase by 94 % when workload is doubled on the single channel RAM config. I haven't done the same comparison on dual channel RAM.
  • Broadwell-E's runtimes increase by 14 % when going from 2 to 8 SoB-LLR tasks.
  • The fully loaded BDW-EPs show 64 % longer runtimes than the fully loaded BDW-E.
    Differences:
    Number of SoB-LLR tasks per processor differs by 63 %, core clocks differ by 21 %, RAM speed by 25 %, and RAM latency by 53 %.
    Other differences, which may or may not play a role, are dual-socket vs. single socket, MCC die with two ring buses and two home agents vs. LCC die with one ring bus and one home agent, and Linux vs. Windows.

4. Heavy load with hyper-threading

Workload:
HSW: 8 SoB-LLR tasks on 4 cores/ 8 hardware threads (100 % load)
BDW-E: 20 SoB-LLR tasks on 10 cores/ 20 hardware threads (100 % load)​

Min/Max SoB-LLR runtimes:
HSW@2.8GHz...............28.0 - 30.1 days
BDW-E...........................16.8 - 17.5 days

Remarks (compare with section 3, HT off):
  • Haswell's runtimes increase by 110 %, hence total throughput decreases by 5 %.
  • BDW-E's runtimes increase by 177 %, hence total throughput decreases by 10 %.
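
The throughput figures can be rechecked from the number of simultaneous tasks and the runtimes in sections 3 and 4. The following is a rough sketch with illustrative names, using the midpoint of each min/max pair. It ignores that two of the BDW-E cores were busy with F@H feeders in section 3.

```python
def midpoint(lo, hi):
    return (lo + hi) / 2


def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


def ht_effect(tasks_off, days_off, tasks_on, days_on):
    """Return (runtime change %, throughput change %) for HT off -> HT on."""
    runtime_off = midpoint(*days_off)
    runtime_on = midpoint(*days_on)
    throughput_off = tasks_off / runtime_off  # tasks finished per day
    throughput_on = tasks_on / runtime_on
    return (pct_change(runtime_off, runtime_on),
            pct_change(throughput_off, throughput_on))


if __name__ == "__main__":
    # host: (tasks HT off, (min, max) days, tasks HT on, (min, max) days)
    hosts = {
        "HSW": (4, (13.7, 13.9), 8, (28.0, 30.1)),
        "BDW-E": (8, (6.2, 6.2), 20, (16.8, 17.5)),
    }
    for name, args in hosts.items():
        runtime_pct, throughput_pct = ht_effect(*args)
        print(f"{name}: runtime {runtime_pct:+.0f} %, "
              f"throughput {throughput_pct:+.0f} %")
```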
--------
Edit, April 18:
corrected BDW-EP memory spec
 

StefanR5R

Initially I just wanted to get a quick overview of how the different PCs fare with SoB-LLR. When I saw the low results of the Xeons compared to the 6950X, I began the scaling tests. And I finally convinced myself that I needed to do something about that dismal out-of-the-box RAM in my laptop, despite RAM prices currently going up.
 

StefanR5R

PS:
SoB-LLR is credited with about 64,500 points/WU. This works out to:

HSW..............4 concurrent tasks, 13.8 days/task..........19,000 points/day
BDW-E...........8 concurrent tasks, 6.2 days/task............83,000 points/day
BDW-EP.......26 concurrent tasks, 10.1 days/task.......166,000 points/day​
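
The arithmetic behind these figures, as a minimal sketch (the function name is illustrative, and 64,500 points/WU is the approximate credit quoted above):

```python
POINTS_PER_WU = 64_500  # approximate SoB-LLR credit per work unit


def points_per_day(concurrent_tasks, days_per_task, points_per_wu=POINTS_PER_WU):
    """Daily points of a host that keeps `concurrent_tasks` tasks running."""
    return concurrent_tasks / days_per_task * points_per_wu


if __name__ == "__main__":
    for host, tasks, days in [("HSW", 4, 13.8), ("BDW-E", 8, 6.2), ("BDW-EP", 26, 10.1)]:
        print(f"{host:8s} {points_per_day(tasks, days):10,.0f} points/day")
```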

Oops, that's about half of the GCW-Sieve points/day from post #1. And this is supposed to include "50 % long job credit bonus and a 10 % conjecture credit bonus".

The SoB-LLR project was started in 2010. Runtimes on today's CPUs are hardly shorter than those reported in 2010, despite better vector units, surely because of the RAM bottleneck. If the good folks at PrimeGrid calibrated points/task back in 2010 and perhaps even benchmarked with just one task/CPU, then that may explain why PPD are so low still in this day and age.
 

StefanR5R

PSP-LLR runtimes, "Prime Sierpinski Problem (LLR) 8.00" application,
Broadwell-EP (Xeon E5-2690v4) @ 2.9 GHz (AVX all-core turbo), dual processor board, Linux

HT on, 56 single-threaded jobs: 3.95...4.2 days per task .......... 13.7 tasks per day
HT off, 28 single-threaded jobs: 1.9...2.0 days per task ............ 14.4 tasks per day

t.b.d.: PPD, and multi-threaded tasks.
 

StefanR5R

PSP-LLR runtimes, "Prime Sierpinski Problem (LLR) 8.00" application,
continued: multithreading study, and yield in terms of points per day

Broadwell-EP (Xeon E5-2690v4) @ 2.9 GHz (AVX all-core turbo), dual processor board, Linux
HT off
Code:
                      session 1 (4 days)        session 2 (4 days)
----------------------------------------------------------------------
threads/task          2            7            7            14
simultaneous tasks    14           4            4            2
----------------------------------------------------------------------
avg task run time     1d7h         5h27m        6h02m        3h07m
tasks per day         11.0         17.6         15.9         15.4
PPD                   155,000      250,000      245,000      235,000
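
The "tasks per day" row follows from the number of simultaneous tasks and the average run time. Here is a small sketch with illustrative helper names. The run times in the table are rounded (the 1d7h entry to the hour), so the recomputed values can differ slightly from the listed ones.

```python
import re

HOURS_PER_UNIT = {"d": 24.0, "h": 1.0, "m": 1 / 60, "s": 1 / 3600}


def parse_duration_hours(text):
    """Parse strings like '1d7h', '5h27m' or '25m41s' into hours."""
    parts = re.findall(r"(\d+)([dhms])", text)
    if not parts:
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(int(number) * HOURS_PER_UNIT[unit] for number, unit in parts)


def tasks_per_day(simultaneous_tasks, avg_run_time):
    """Completed tasks per day for a given number of simultaneous tasks."""
    return simultaneous_tasks * 24 / parse_duration_hours(avg_run_time)


if __name__ == "__main__":
    # (threads/task, simultaneous tasks, avg task run time) per table column
    columns = [(2, 14, "1d7h"), (7, 4, "5h27m"), (7, 4, "6h02m"), (14, 2, "3h07m")]
    for threads, tasks, run_time in columns:
        rate = tasks_per_day(tasks, run_time)
        print(f"{threads:2d} threads x {tasks:2d} tasks: {rate:.1f} tasks/day")
```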

Edit, April 11:
all prior results discarded, they were based on wrong run times and CPU times from primegrid.com's host task lists
Edit, April 12:
added results from 14-threads/task x 2 tasks
Edit, April 14, 15:
improved accuracy with longer session duration
 

StefanR5R

SGS-LLR runtimes, "Sophie Germain (LLR) 8.00" application
(name for app_config.xml: llrTPS)

Broadwell-EP (Xeon E5-2690v4) @ 2.9 GHz (AVX all-core turbo), dual processor board, Linux
Code:
hyperthreading        on           off          off          off
---------------------------------------------------------------------
threads/task          1            1            2            4
simultaneous tasks    56           28           14           7
---------------------------------------------------------------------
load average          60.4         29.7         27.4         26.0
avg task run time     25m41s       12m47s       9m47s        6m06s
---------------------------------------------------------------------
tasks per day         3140         3150         2110         1650
PPD                   125,000      126,000      84,000       66,000
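
Combinations of threads/task and simultaneous tasks like the ones above are usually set through BOINC's app_config.xml in the project directory. The sketch below generates such a file for the llrTPS app name mentioned earlier. It is only an illustration under assumptions: the helper name is made up, and the `-t N` command line option for the LLR thread count is assumed here rather than taken from this thread, so check it against the PrimeGrid documentation before use.

```python
import xml.etree.ElementTree as ET


def make_app_config(app_name, threads_per_task, max_concurrent):
    """Build the text of an app_config.xml for one multithreaded app."""
    root = ET.Element("app_config")

    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)

    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    # Assumption: the LLR application takes "-t N" to set its thread count.
    ET.SubElement(version, "cmdline").text = f"-t {threads_per_task}"
    # Tells the BOINC client how many CPUs each task occupies.
    ET.SubElement(version, "avg_ncpus").text = str(threads_per_task)

    ET.indent(root)  # pretty-print; needs Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Last table column: 4 threads/task, 7 simultaneous tasks
    print(make_app_config("llrTPS", threads_per_task=4, max_concurrent=7))
```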

Edit: added results with HT on, and 2-threaded
 

Kiska

Golden Member
Apr 4, 2012
1,013
290
136
So a ~1000 point difference between single thread and dual thread....
 

StefanR5R

More precisely it's just 400 points (0.3 %) difference (125,400 : 125,800 PPD) between HT on and HT off.

All SGS-LLR tasks that I had gave 39.91 credits/task.
 

Kiska

I see...
Also the credits is a set thing, so you'll always see 39.91
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,250
3,845
75
Kiska said:
I see...
Also the credits is a set thing, so you'll always see 39.91

Well, technically credits aren't constant, but SGS WUs are all so close in size that they're practically constant.
 

StefanR5R

The little bit of performance testing for post #9 had a side effect.
Prime Reporting said:
Congratulations! Our records indicate that a computer registered by you has found a unique prime number. This computer is running BOINC, is attached to the PrimeGrid project, and is assigned to the Sophie Germain Prime Search. What makes this prime unique is that it's large enough to enter the Top 5000 List in Chris Caldwell's The Largest Known Primes Database.
 

Ken g6

Congrats! Did you sign up for automatic reporting, or do you need to report it yourself?

P.S. Mine's bigger! :p
 

StefanR5R

Hmm, I'm confused. The home.php page appears to say I found 3 primes...

...but the actual primelist shows just one prime (of which I was actually the double checker, not the initial finder). Was this e-mail which I quoted (after I found it buried in the spam folder) about one out of two as-yet unlisted primes?
 

GLeeM

Elite Member
Apr 2, 2004
7,199
128
106
If you want to be the finder more often than the checker do this:
Set the "Store at least __ days of work" to "0" (I think 0 worked, if not set to lowest possible). Set Manager to report as soon as finished (might not need to do this?).
The idea is to get a WU at the same time as your wingman and send the finished results before he does. A faster computer helps, of course :)
 

StefanR5R

Today I received a second e-mail. This and the earlier mail read like I was an initial finder, not a double checker. So, @Ken g6, the prime number which you are seeing at primegrid.com and which is smaller than yours was found by Randall Scalise and only double-checked by me. There are apparently two more prime numbers of which I was initial finder. According to the two e-mails, primegrid.com now waits 19 days for me to change the setting about automatic reporting (which is off); and if I don't change it, they will ask the double-checker whether he wants to report, and then either he does or it is reported anonymously. Until that happens, we are apparently not shown the actual number(s).

These presumably three top-5000 finds happened with two dual-socket hosts running SGS-LLR for a little less than two days.
 

StefanR5R

You will be remembered for all eternity, or at least a week.
Probably for no longer than 6 weeks.
On June 9 2017 Michael Goetz said:
New SGS primes are entering the T5K list in the 4888th position. This means that when 113 more primes larger than 388342 digits (the size of an SGS prime) are discovered, SGS primes will no longer be eligible for T5K.

During the last year, 929 primes larger than SGS were found. Extrapolating that rate, you get 113*365/929 = 44 days until SGS is pushed off the T5K list.
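
The extrapolation in the quote can be written out as a short sketch (names are illustrative; the constant discovery rate is the quote's own assumption):

```python
def primes_until_pushed_off(entry_position, list_size=5000):
    """Larger primes needed until an entry at this rank drops off the list."""
    return list_size - entry_position + 1


def days_until_pushed_off(primes_needed, larger_primes_last_year, days_per_year=365):
    """Linear extrapolation from last year's discovery rate."""
    return primes_needed * days_per_year / larger_primes_last_year


if __name__ == "__main__":
    needed = primes_until_pushed_off(4888)
    print(needed)                                     # 113
    print(round(days_until_pushed_off(needed, 929)))  # 44
```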
 

TennesseeTony

Elite Member
Aug 2, 2003
4,209
3,634
136
From the data above, with hyper threading OFF and running all 28 CPUs instead of just 14, the PPD is 168,000?

Oops, I'm an idiot, that is already reported in the second column. And I haven't even had any all natural potato based liquid muscle relaxer.
 

StefanR5R

Today it occurred to me that I should test
  • whether to disable Hyperthreading in the BIOS and use 100 % of the CPUs
  • or to leave Hyperthreading enabled but use only 50 % of the logical CPUs
with GCW-LLR on Linux, like I already tested on Windows. (I will post the Windows results later.)

Unfortunately, the test went thoroughly wrong: I accidentally entered "Use at most 50 % of CPU time" on the host with HT on. (Morning caffeine infusion didn't kick in fast enough.) And additionally I did not receive enough tasks to fully saturate the hosts.

But so far it looks like HT off × 100 % CPUs gives about the same throughput as HT on × 50 % CPUs. This was with "powersave" CPU frequency governor, and less than 60 % use of processor cores, nothing running besides PrimeGrid GCW-LLR.

I will try to repeat the test properly if I can get tasks again.
 