PrimeGrid Races 2018

lane42 · Sep 17, 2018

Yes, I guess I'll leave it at 6 cores, 1 thread for now.

zzuupp · Sep 18, 2018

Good timing for me. I was just gonna say that I should drop two more shortly. They did.

I got the other box going as well. Too lazy to check.

crashtech · Sep 18, 2018

These WUs run long! I'm wondering how many threads can be assigned to each task before diminishing returns get too bad.

TennesseeTony · Sep 18, 2018

6 to 8 threads perform about the same...

StefanR5R · Sep 18, 2018

Diminishing returns or not, with the longer running LLR subprojects it is IMO still worthwhile to give a task all threads it can get, after it has been determined how many concurrent tasks the particular processor optimally supports.

Ken g6 · Sep 18, 2018

StefanR5R said:
Diminishing returns or not, with the longer running LLR subprojects it is IMO still worthwhile to give a task all threads it can get, after it has been determined how many concurrent tasks the particular processor optimally supports.

Seems like once one guy found leaving one hyper-thread free helped, but that may only happen on systems that are busy doing other stuff as well.

StefanR5R · Sep 18, 2018

Ken g6 said:
Seems like once one guy found leaving one hyper-thread free helped, but that may only happen on systems that are busy doing other stuff as well.

Do you mean this guy? Can he be trusted?

--------
If there is just light background load, it may fit well into the idle periods which are left when LLR runs with a higher thread count per process. (E.g., I am seeing a processor load of ~690 % with 7 threads, 1000..1040 % with 11 threads...)

StefanR5R said:
all threads it can get,

If there is a more demanding background load, then "all threads it can get" does not include the number of hardware threads which the operator wants to dedicate to that other load. :-)

Though if there is a demanding secondary load on the host, then there arises also the question whether the optimum number of concurrent LLR tasks is now lower than without secondary load. (1st world problem; not a problem on small CPUs which are already at their limit with a single LLR task, of course.)
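To put rough numbers on the idle periods mentioned above, here is a minimal Python sketch that turns the quoted load figures (~690 % with 7 threads, 1000..1040 % with 11 threads) into a per-thread busy fraction. The function name is made up for illustration; only the load figures come from the post.

```python
def per_thread_utilization(load_percent: float, threads: int) -> float:
    """Average busy fraction of each thread, given total process load in percent."""
    return load_percent / (100.0 * threads)


# (total load in %, thread count) as quoted in the post above
observations = [(690, 7), (1000, 11), (1040, 11)]

for load, threads in observations:
    busy = per_thread_utilization(load, threads)
    # Capacity left over, expressed in whole hardware threads
    idle_threads = threads - load / 100.0
    print(f"{threads} threads at {load} %: {busy:.1%} busy, "
          f"~{idle_threads:.1f} threads' worth of idle time")
```

So the 7-thread run keeps each thread about 99 % busy, while the 11-thread run sits at roughly 91 to 95 %, which is the slack a light background load could slip into.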

crashtech · Sep 18, 2018

Well for instance I was wondering if it might not be better on a 6C/12T to do 1x11 instead of 2x6.

StefanR5R · Sep 18, 2018

crashtech said:
Well for instance I was wondering if it might not be better on a 6C/12T to do 1x11 instead of 2x6.

It's unfortunate that I didn't find time to run my usual tests before this challenge.

Looking at my older tests, llrCUL's CPU times (95 h) are between llrPSP (153 h) and llrGCW (67 h) / llrESP (59 h) / llr321 (48 h). Longer CPU times imply a larger memory footprint of the working set, and therefore larger cache demands. (I am referring to PrimeGrid's global average CPU times which are listed at the "Edit PrimeGrid preferences" web page.)

On E5-2690 v4, which has got 35 MB shared inclusive L3 cache, I ran

the larger llrPSP in April 2017 with 2 tasks per socket (alas, didn't try 3 tasks per socket),
the smaller llrGCW in August 2017 with 3.5 tasks per socket,
the even smaller llrESP in April 2018 with 3 tasks per socket. Here I also tested 1, 2, 3.5, 4, and 5 tasks per socket which were all inferior to 3 tasks per socket.

(As an aside, early in 2017 I never used HT; only later in 2017 I learned that HT can be beneficial.)

Since I don't know what's best for llrCUL, I simply run 3 per socket now. Some of the tasks finish with longer run times than others, which makes me suspect that I am off the actual optimum. I do intend to find a time slot to test llrWOO before the next challenge. llrWOO is at 117 h average CPU time, i.e. sits between llrCUL and llrPSP.

Back to the 6C/12T which you are referring to: If it is a Coffee Lake i7, then it has 12 MB inclusive L3 cache, i.e. about one third of the E5-2690 v4. So if we knew the optimum number of concurrent tasks for the E5, and assumed that L3 cache size is the most influential factor for this number between architectures as similar as BDW-EP and CFL, we could make a guess for the optimum number on CFL.

But we do know that the presumably less cache demanding llrESP was best with 3/socket on the E5. This does at least indicate that 1/socket may be better than 2/socket with the probably more cache demanding llrCUL on a processor with 1/3rd the cache.
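The guess above can be written out as a small Python sketch. It assumes, as the post does, that the optimum number of concurrent tasks scales linearly with L3 cache size between similar architectures; that is a working assumption, not a measured result. The function name is hypothetical, and the reference point (3 tasks per socket on 35 MB) is the llrESP result, not an llrCUL measurement.

```python
def scale_tasks_by_cache(ref_tasks: float, ref_l3_mb: float, target_l3_mb: float) -> float:
    """Scale a known-good task count by the ratio of L3 cache sizes (linear assumption)."""
    return ref_tasks * target_l3_mb / ref_l3_mb


# Reference: llrESP ran best with 3 tasks per socket on the 35 MB E5-2690 v4.
# Target: a Coffee Lake i7 with 12 MB L3.
estimate = scale_tasks_by_cache(ref_tasks=3, ref_l3_mb=35, target_l3_mb=12)

# Never go below one task; round to the nearest whole task otherwise.
concurrent = max(1, round(estimate))

print(f"raw estimate: {estimate:.2f} tasks, suggested: {concurrent}")
```

Since llrCUL is presumably more cache hungry than llrESP, the raw estimate is better read as an upper bound, which again points to 1 task rather than 2 on the 12 MB part.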

crashtech · Sep 18, 2018

3 per socket on the E5-2690 v4, is that 9 threads per task then?

Ken g6 · Sep 18, 2018

Day 3 stats:

Rank___Credits____Username
10_____1132413____xii5ku
21_____692974_____crashtech
25_____567262_____Howdy2u2
62_____205264_____Ken_g6
66_____197115_____biodoc
76_____177916_____Orange Kid
84_____165373___10esseeTony
112____100102_____zzuupp
124____95062______Lane42

Rank__Credits____Team
2_____8969168____Aggie The Pew
3_____8963304____SETI.Germany
4_____4806869____Sicituradastra.
5_____3333486____TeAm AnandTech
6_____2411525____Crunching@EVGA
7_____2340295____AMD Users
8_____2262518____BOINC@MIXI

We're 5th!

Can we get to 4th? Maybe...

StefanR5R · Sep 18, 2018

crashtech said:
3 per socket on the E5-2690 v4, is that 9 threads per task then?

Yes, correct.
Also, I run 4 per socket on E5-2696 v4 with 55 MB L3 (with 11 threads per task), and it seems that task run times on them are less variable than on the 2690.
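For anyone following the arithmetic, a minimal Python sketch of how tasks per socket map to threads per task. The hardware thread counts are assumptions taken from Intel's published specs, not from this thread: 14C/28T for the E5-2690 v4 and 22C/44T for the E5-2696 v4. The helper name is made up for illustration.

```python
def threads_per_task(hw_threads_per_socket: int, tasks_per_socket: int) -> int:
    """Whole hardware threads each task gets when a socket is split evenly."""
    return hw_threads_per_socket // tasks_per_socket


# (label, hardware threads per socket, concurrent tasks per socket)
configs = [
    ("E5-2690 v4", 28, 3),  # assumed 14C/28T
    ("E5-2696 v4", 44, 4),  # assumed 22C/44T
    ("6C/12T, 2 tasks", 12, 2),
]

for label, hw_threads, tasks in configs:
    per_task = threads_per_task(hw_threads, tasks)
    spare = hw_threads - per_task * tasks  # threads left unassigned
    print(f"{label}: {tasks} x {per_task} threads, {spare} left over")
```

On the 2690 this gives 3 x 9 with one hardware thread left over, and on the 2696 it gives 4 x 11 with none to spare.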

SETI.Germany's pschoefer posts daily stats in SG's forum, and here is his commentary (after my attempt at translation):

Day 1:
With a good start as usual, we were able to keep up with the Czechs for quite a while, but the gap is now slowly widening. Meanwhile, Aggie The Pew are working their way upwards after their usual weak start. Sicituradastra. soon became unable to keep up. Behind, BOINC@MIXI got off the start very well but need to make an effort to keep the AMD and EVGA crunchers at bay. TeAm AnandTech is currently being missed in the leading group. But there is still a long way to go, and as DeleteNull wrote, some computers may not even have delivered their first result yet.

Day 2:
As expected, more and more computers delivered now, such that almost all teams were able to increase their output drastically. At the top, CNT could distance themselves further, even though we had a higher relative increase of output. Aggie The Pew already passed us once today, but lost ground again and even has a larger absolute distance than 24 hours ago. It seems to turn into a cat-and-mouse or rather tiger-and-rat game for second rank. While S* are moving within no man's land, BOINC@MIXI were unable to put more coal on the fire after their good start, and had to let the AMD Users pass. The EVGA crunchers are slowly working their way towards the Japanese team too, but will themselves have trouble holding off TAAT, who have now caught up.

Day 3:
All teams in the top group managed to increase their daily output once again, although with large differences. With a modest increase, the Czechs extended their lead. The rats weighed in almost as much in absolute numbers, and thus passed us. Therefore our clear lead from yesterday has turned into a small but, over the last hours, constant deficit. The star crunchers apparently found their cruising speed, but TAAT behind them is now faster on the road. With the current difference in daily output, S* would end up ahead by a nose, but is TAAT already at top speed? Picking up some pull with TAAT, the EVGA crunchers not only overtook BOINC@MIXI, but also the AMD Users.

TennesseeTony · Sep 18, 2018

Sorry, but I need an app config for a 56 thread dual socket Haswell....the ncpu thing still confuses me....I'd prefer to run it at 7 threads per task...

TennesseeTony · Sep 18, 2018

Nevermind, that was as simple as changing everything to 7 in the race thread at PG.com.
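For reference, "changing everything to 7" could look roughly like the Python sketch below, which generates an app_config.xml for 7 threads per task. Treat the details as assumptions: the app name `llrCUL` is taken from the subproject naming used earlier in this thread, and the `-t` thread switch together with a matching `avg_ncpus` follows the commonly shared PrimeGrid app_config pattern. The generator function itself is hypothetical. Check the race thread at PrimeGrid for the exact file before using it.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, threads_per_task: int) -> str:
    """Build a BOINC app_config.xml that runs one app with N threads per task."""
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    # Assumption: the LLR application takes its thread count via "-t N"
    ET.SubElement(version, "cmdline").text = f"-t {threads_per_task}"
    # Tell the BOINC scheduler how many CPUs each task occupies
    ET.SubElement(version, "avg_ncpus").text = str(threads_per_task)
    ET.indent(root)  # pretty-print; needs Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


threads = 7
total_hw_threads = 56  # the dual-socket host mentioned above

print(build_app_config("llrCUL", threads))
print(f"{total_hw_threads // threads} tasks would run concurrently")
```

With 56 hardware threads and 7 threads per task, that works out to 8 concurrent tasks, i.e. 4 per socket.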

zzuupp · Sep 18, 2018

TennesseeTony said:
Nevermind, that was as simple as changing everything to 7 in the race thread at PG.com.

Why not?

crashtech · Sep 18, 2018

I've turned a few up to 11 now!

Ken g6 · Sep 19, 2018

Day 4 stats:

Rank___Credits____Username
7______2151765____xii5ku
13_____1188474____crashtech
27_____796300_____Howdy2u2
50_____398688_____biodoc
69_____302947_____Ken_g6
77_____276355_____Orange Kid
84_____249308___10esseeTony
109____166874_____zzuupp
134____128397_____Lane42

Rank__Credits____Team
2_____13495781___SETI.Germany
3_____12783905___Aggie The Pew
4_____7290671____Sicituradastra.
5_____5659111____TeAm AnandTech
6_____3726231____Crunching@EVGA
7_____3558636____AMD Users
8_____3132813____BOINC@MIXI

Do you ever get the feeling people are just lining up to pass you?

As an individual; as a TeAm we're cruising along.

StefanR5R · Sep 19, 2018

Ken g6 said:
Do you ever get the feeling people are just lining up to pass you?

Hmm...... no, not so much lately.

:-)

StefanR5R · Sep 19, 2018

PS,
the secret is to start >1 day late.

TennesseeTony · Sep 19, 2018

Ken g6 said:
Do you ever get the feeling people are just lining up to pass you?

Breath Weapon needs a twin friend, you can call it Breath Mint.

Re: My output according to Free-DC....why am I always one of the ones who gets left out when there is a glitch in the database?

My last results were two days ago...

StefanR5R · Sep 19, 2018

It's not a glitch. Your latest validation happened on 17 September 14:47 UTC. All tasks which you reported more recently than this are still pending validation.

(PrimeGrid's challenge stats currently contain granted credit after validation + pending credit before validation. But they will wait weeks if not months for the "cleanup" after Saturday for all validations to complete, before the challenge results are declared final.)

PS,
if I am counting correctly, only 19 % of your completed tasks have been validated so far. I already have 50 % validated. Though the pace of validations doesn't influence the outcome of the challenge, of course.

Ken g6 · Sep 19, 2018

TennesseeTony said:
Breath Weapon needs a twin friend, you can call it Breath Mint.

To keep my naming convention I think I'd have to name it either Broken Wind or maybe Post Dated Check Loan.

So I don't think I want an exact twin.

Markfw · Sep 19, 2018

Well, I am up to 10 1080TI's, 2 1070TI's and 2 1080's, my electric system can't deal with any more video cards. I already can't turn on my induction cooker and microwave or it blows the kitchen circuit. I have to do one at a time. And if I turn on the dual 1080's, that might change. (not sure which circuit they are on, because I have some computers on the kitchen circuit)

BUT I bet I can remove my 4790k and drop in a 2950x! Then the box with the fewest cores and slowest will be my 1800x 16 thread.

crashtech · Sep 19, 2018

@Markfw, I wonder how many more PPD per kWh you could do if you undervolted instead of overclocked? I've mostly laid off the OCing since the electric bills have been almost too much to handle.

Ken g6 · Sep 19, 2018

crashtech said:
@Markfw , I wonder how many more PPD per KWh you could do if you undervolted instead of overclocked? I've mostly laid off the OCing since the electric bills have been almost too much to handle.

To be clear, with Nvidia cards, the thing to do is to turn down the maximum wattage. I haven't seen much benefit from underclocking or undervolting. I think changing the wattage generally does that for you.
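As a rough illustration of turning down the maximum wattage, the Python sketch below only builds the `nvidia-smi` command line for setting a power limit and does not run it. The 180 W figure and GPU index 0 are placeholder values, not recommendations from this thread, and the helper name is made up. Setting a limit normally needs administrator rights, and the allowed range depends on the card.

```python
import subprocess  # only needed if the commented-out call below is enabled


def power_limit_command(gpu_index: int, watts: int) -> list[str]:
    """Build the nvidia-smi call that caps one GPU's power draw."""
    # -i selects the GPU, -pl sets the power limit in watts
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]


cmd = power_limit_command(gpu_index=0, watts=180)  # placeholder values
print(" ".join(cmd))

# Uncomment to actually apply the limit (requires an Nvidia driver and admin rights):
# subprocess.run(cmd, check=True)
```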
