About Collatz Conjecture

StefanR5R · Jul 23, 2017

Orange Kid said:
Just to let everyone know. The optimized .config files will reduce the amount of time to process the WU's and you will notice less credit per WU, but overall it will increase the credit per day.
There is also a thread on the Collatz forum on the subject

I agree that it is very worthwhile to search this thread at the Collatz forum, or at other forums like OCN's, for configurations for GPUs that are similar to one's own. Even if you don't end up with the precise optimum set of parameters for your GPU, chances are good that you still get much better results than with the default parameters. Of course if you have the time, you can very easily test out how the parameters affect your particular card.

I'm distilling some information about the various parameters from the Collatz forum now. I'll post some actual results from my own GPUs later.

Configuration file locations (on Windows, replace the first path components on other OSs accordingly):

C:\ProgramData\BOINC\projects\boinc.thesonntags.com_collatz\<app_name>.config

During project initialization on your client, empty <app_name>.config files will be created for each of the application versions that match your GPUs. You can enter parameters into these files in order to deviate from default values, and they will be picked up as soon as a Collatz GPU task starts.

Configuration file format

Plain text file, one "parameter=value" pair per line, unrecognized parameter names are simply ignored (you can use this to comment out some parameters during testing), missing parameters fall back to their default values.

Example (suitable for a GTX 1080):

kernels_per_reduction=48
threads=9
lut_size=17
sieve_size=30
cache_sieve=1

Parameters

cache_sieve

default: 1 (?)
range: 0 or 1 (?)
definition: "any setting other than 1 will add several seconds to the run time as it will re-create the sieve for each WU run rather than re-using it"

kernels_per_reduction

default: 32
range: 1...64
definition: "the number of kernels that will be run before doing a reduction. Too high a number may cause a video driver crash or poor video response. Too low a number will slow down processing. Suggested values are between 8 and 48 depending upon the speed of the GPU."
comment: "affects GPU usage and video lag the most from what I [sosiris] tested."

lut_size

default: 10
range: 2...31
definition: "the size (in power of 2) of the lookup table. Chances are that any value over 20 will cause the GPU driver to crash and processing to hang. The default results in 2^10 or 1024 items. Each item uses 8 bytes. So 10 would result in 2^10 * 8 bytes or 8192 bytes. Larger is better so long as it will fit in the GPUs L1/L2 cache. Once it exceeds the cache size, it will actually take longer to complete a WU since it has to read from slower global memory rather than high speed cached memory."
comment: "I [sosiris] choose 16, 65536 items for the look up table because it would fit into the L2$ (512KB) in GCN devices. IMHO it could be 20 for NV GPUs, just like previous apps, because NV GPUs have better caching."

reduce_cpu

default: 0
range: 0 or 1
definition: "The default is 0 which will do the total steps summation and high steps comparison on the GPU. Setting to 1 will result in more CPU utilization but may make the video more responsive. I have yet to find a reason to do the reduction on the CPU other than for testing the output of new versions."
comment: "I [sosiris] choose to do the reduction on the CPU because AMD OpenCL apps will take up a CPU core no matter what you do (aka 'busy waiting') and because I want better video response."

sieve_size

default: ?
range: 15...32
definition: "controls both the size of the sieve used 2^15 thru 2^32 as well as the items per kernel are they are directly associated with the sieve size. A sieve size of 26 uses approx 1 million items per kernel. Each value higher roughly doubles the amount. Each value lower decreases the amount by about half. Too high a value will crash the video driver."

sleep

default: 1
range: ?
definition: "the number of milliseconds to sleep while waiting for a kernel to complete. A higher value may result in less CPU utilization and improve video response, but it also may lengthen the processing time."

threads

default: 6
range: 6...11
definition: "the 2^N size of the local size (a.k.a. work group size or threads). Too high a value results in more threads but that means more registers being used. If too many registers are used, it will use slower non-register memory. The goal is to use as many as possible, but not so many that processing slows down. AMD GPUs tend to work best with a value of 6 or 7 even though they can support values of up to 10 or 11. nVidia GPUs seem to work as well with higher values as lower values."
comment: "I [sosiris] didn't see lots of difference once items per work-group is more than wavefront size (64) of my HD7850 in the profiler."

verbose

default: 0
range: 0 or 1
definition: "1 will result in more detail in the output."

Definitions are taken from Slicker's post from June 2015, last modified in September 2015.
Comments are taken from sosiris' post from June 2015.
Edit April 28 2018, added definition of cache_sieve from a post from Slicker from April 2018

StefanR5R · Jul 24, 2017

Here are results from a few GPUs without and with custom parameters.

AMD W7000 (similar to Radeon HD7870)
host: i7-4960X@4.2 GHz, Windows 7 Pro

Code:

configuration         default (empty)           kernels_per_reduction=48
                                                threads=8
                                                lut_size=16
                                                sieve_size=29
------------------------------------------------------------------------
throughput            ~1 M PPD                  1.3...1.4 M PPD

NVIDIA GTX 1700
factory-OC'd card, 170 W power target per factory, x 114 % manual power target
host: i7-4960X@4.2 GHz, Windows 7 Pro

Code:

configuration         default (empty)           kernels_per_reduction=48  kernels_per_reduction=48
                                                threads=8                 threads=8
                                                lut_size=17               lut_size=17
                                                sieve_size=28             sieve_size=30
--------------------------------------------------------------------------------------------------
avg core utilization  98 %                      98.5 %
avg task duration     870 s                     500 s
avg task credits      37 k                      31 k
throughput            3.65 M PPD                5.3 M PPD                 5.7 M PPD
avg power draw        ~179 W                    179 W
energy efficiency     0.24 PPJ                  0.34 PPJ                  0.36 PPJ

NVIDIA GTX 1800
180 W board power, x 90 % power target
host: E5-2696v4@2.2 GHz, Windows 7 Pro

Code:

configuration         default (empty)           kernels_per_reduction=48  kernels_per_reduction=56
                                                threads=9                 threads=9
                                                lut_size=17               lut_size=18
                                                sieve_size=30             sieve_size=30
                                                cache_sieve=1             cache_sieve=1
--------------------------------------------------------------------------------------------------
avg core utilization  98 %                      98.5 %                    98 %
avg task duration     ~770 s                    360 s                     517 s
avg task credits      ~36 k                     28 k                      29 k
throughput            ~4.0 M PPD                6.3 M PPD                 4.9 M PPD
avg power draw        154 W                     158 W                     159 W
energy efficiency     0.30 PPJ                  0.46 PPJ                  0.36 PPJ

NVIDIA GTX 1800Ti
250 W board power, x 90 % power target
host: E5-2696v4@2.2 GHz, Windows 7 Pro

Code:

configuration         default (empty)           kernels_per_reduction=48  kernels_per_reduction=56
                                                threads=9                 threads=9
                                                lut_size=17               lut_size=18
                                                sieve_size=30             sieve_size=30
                                                cache_sieve=1             cache_sieve=1
--------------------------------------------------------------------------------------------------
avg core utilization  65.8                      95.4 %                    96 %
avg task duration     ~770 s                    277 s                     275 s
avg task credits      ~36 k                     28 k                      29 k
throughput            ~4.0 M PPD                8.7 M PPD                 9.1 M PPD
avg power draw        197 W                     219 W                     219 W
energy efficiency     0.24 PPJ                  0.46 PPJ                  0.48 PPJ

Conclusion: The default parameters are conservatively chosen, so that small GPUs run the application properly. The larger your GPU, the more performance uplift can be gained by custom parameters, combined with increased energy efficiency. Too large settings can have a detrimental effect though (rightmost column in the GTX 1080 table).

crashtech · Jul 24, 2017

Crunching@EVGA has been putting out some impressive Collatz numbers. I think they may overtake us at some point:

http://stats.free-dc.org/stats.php?page=team&proj=col&team=2443

StefanR5R · Jul 24, 2017

Our 28 days result is almost double of theirs, but their 7 days result is almost double of ours (and is a quarter of their 28 days result). In other words, Crunching@EVGA haven't finished the Collatz sprint yet.

crashtech · Jul 24, 2017

StefanR5R said:
Our 28 days result is almost double of theirs, but their 7 days result is almost double of ours (and is a quarter of their 28 days result). In other words, Crunching@EVGA haven't finished the Collatz sprint yet.

Does it really take that long to get work validated? I've put some GPUs towards Collatz just in case.

Kiska · Jul 24, 2017

crashtech said:
Does it really take that long to get work validated? I've put some GPUs towards Collatz just in case.

Work validation is usually instant for collatz that is

StefanR5R · Jul 24, 2017

@crashtech, I meant to say it appears as if they keep going in full-on sprint mode. Looking closer, they did 220 M points in the sprint (3...4 days), and "only" 144 M points in the last 7 days. IOW not all but a few of their members missed that the Collatz Conjecture sprint is over now.

StefanR5R · Oct 13, 2017

VirtualLarry said:
Looking forward to siccing my i3-8100 on it. Currently burning in with Einstein@H and NumberFields.

Btw, is there something going on with collatz? I tried joining that project on that box (already have account), and it basically silently failed, and said it couldn't. I know that they instituted a password, to prevent spammers. Is that my problem? Haven't crunched it in a while.

It's been a while that I last added a computer to Collatz, but it worked like intended: I started boincmgr on the new computer, and used "Tools"/ "Add project...". Entered the e-mail address and my own password of my existing Collatz account there.

Another possibility to add a new computer to an existing account is to open a command line, and there run

Code:

cd "C:\Program Files\BOINC"
boinccmd --project_attach "https://boinc.thesonntags.com/collatz/" abcd1234abcd1234abcd1234abcd1234

where you replace that hexadecimal string with the account key which you can see after you logged into your account at the project site and clicked "Account keys"/ "View".

VirtualLarry said:
I know that they instituted a password, to prevent spammers. Is that my problem? Haven't crunched it in a while.

No. Their password, or invitation code actually, is only used when you create a new account, not when you add another computer to an already existing account.

TennesseeTony · Nov 5, 2017

Sorry, just bumping, didn't realize it had been bumped up so recently already.

Chaotic42 · Nov 5, 2017

Dang it, Tony, I've got work to do and you're sending me down this rabbit hole of reading about this conjecture...

StefanR5R · Nov 5, 2017

The conjecture appears so much removed from practical applications. Yet Lothar Collatz was professor for Applied Mathematics, and much of his lifework has great practical implications indeed.

crashtech · Jan 24, 2018

I see you coming for me, @TennesseeTony , but I have stepped on the gas!

TennesseeTony · Jan 24, 2018

LOL. An accident, as I forgot I had enabled "Use AMD GPU" in my preferences.

I was baffled when checking that system today, wondering why my Milkyway tasks were just sitting there and my CPU usage was so low.

Actually, I don't know why it didn't pick up Collatz tasks before today.....that preference setting has been in place for quite some time.

crashtech · Jan 24, 2018

Aw, dang. There goes my fun. So, you are doing Collatz on CPU?

TennesseeTony · Jan 25, 2018

Intel integrated GPU on two systems, one slow one (Haswell i7) and one about 3 times faster (Xeon quad core, v5 [Skylake?]) The points are too good with Collatz NOT to spend the extra 15-20 watts (per CPU).

crashtech · Jan 25, 2018

Oh, yeah, that makes sense now. You clued me into that also, although I realized that at this point I have mostly Xeons without iGPU, only 4 systems with Intel graphics now.

StefanR5R · Apr 28, 2018

I updated a bit in post #26 about the config file contents.

After the recent reboot of Collatz Conjecture, a new thread on GPU optimization can be found here: https://boinc.thesonntags.com/collatz/forum_thread.php?id=8

Ken g6 · Apr 16, 2019

Seeing as how the probability of Collatz being a BOINC Pentathlon project is high, it seems like we should all make sure our computers are set up to run it. (Assuming you're willing to ever run it at all.) That means making sure you're registered with the invite code:

TennesseeTony said:
Current Invite code: spammerssuck

And that your .config file is set up for your current GPU(s).

crashtech · Apr 16, 2019

Good idea, I'm going to run a few WUs through every system just to check.

About Collatz Conjecture

Elite Member

Elite Member

Lifer

Elite Member

Lifer

Golden Member

Elite Member

Elite Member

Elite Member

Lifer

Elite Member

Lifer

Elite Member

Lifer

Elite Member

Lifer

Elite Member

Programming Moderator, Elite Member

Lifer