MilkyWay@H - Benchmark thread Winter 2016 on (updated 1-2021) - GPU & CPU times wanted for new WUs



Elite Member
Dec 10, 2016
As far as I recall, N-Body's default mode of operation is to run a single task at once, and configure this task to spawn as many program threads as there are hardware threads. And that's indeed what the application is reporting in stderr.txt on your computer: It ran with 12 program threads. Example task:
Using OpenMP 12 max threads on a system with 12 processors

Furthermore, OpenMP is AFAIK a programming interface in which the programmer marks program sections (typically loops) for parallel execution with compiler directives. The programmer then usually(?) relies on the compiler and the OpenMP runtime to sort out all the details of thread creation, of thread synchronization, and, importantly, of distributing the work onto all of the threads.

Note that it says "12 max threads" in your stderr. This means that the application will attempt to do its work in 12 threads, but it may have sections in which it cannot use all of them. In fact, there will definitely be single-threaded portions too. It's been a long time since I ran N-Body myself, and I don't recall how much of the total run-time is spent in such single-threaded portions; hopefully not much.
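The cost of such single-threaded portions can be estimated with Amdahl's law. This is a minimal sketch only: the serial fractions (0%, 5%, 10%) are made-up inputs for comparison, not measured values for N-Body.

```python
def amdahl_speedup(serial_fraction: float, threads: int) -> float:
    """Speedup over a single thread (Amdahl's law) when `serial_fraction`
    of the single-threaded run-time cannot be parallelized at all."""
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


# Hypothetical serial fractions; N-Body's real value is not known here.
for serial in (0.00, 0.05, 0.10):
    speedups = {n: amdahl_speedup(serial, n) for n in (4, 6, 12)}
    # Efficiency = speedup per thread; 100% means every thread is fully useful.
    summary = ", ".join(f"{n} threads: {s:.2f}x ({s / n:.0%} eff.)" for n, s in speedups.items())
    print(f"serial {serial:.0%} -> {summary}")
```

With a 10% serial portion, for example, 12 threads give only about 5.7x while 6 threads give 4x, so two 6-thread tasks side by side would outpace one 12-thread task, if the model holds and memory/cache contention is ignored.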

But even the parts of the workload which the compiler is able to parallelize well might not scale easily to a given thread count:
– The higher the thread count, the more time and energy is spent on synchronization overhead.
– The workload may not be easily divisible into an arbitrary thread count. Many parallel computation problems fit best into thread counts which are products of small factors, with a factor of 2 often being the best. Your thread count of 12 is 2*2*3, which may or may not be easy to deal with for N-Body in particular; I don't know.
– If all threads are doing the very same kind of work, e.g. vector arithmetic, Intel HyperThreading or AMD SMT may not improve throughput much, or may even be detrimental due to the aforementioned overhead.
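The divisibility point can be illustrated with a simple static-scheduling model: a step is split into equal chunks, each thread works through its share, and all threads wait at a barrier for the slowest one. The chunk count of 16 is a hypothetical example; how N-Body actually divides its work is not known here.

```python
import math


def barrier_efficiency(chunks: int, threads: int) -> float:
    """Fraction of thread-time spent on useful work when `chunks` equal pieces
    of work are spread over `threads` threads and all threads then wait at a
    barrier for the one that got the most chunks."""
    rounds = math.ceil(chunks / threads)  # chunks handled by the busiest thread
    return chunks / (rounds * threads)


# Hypothetical: one simulation step split into 16 equal chunks.
for threads in (4, 6, 8, 12):
    print(f"{threads:2d} threads: {barrier_efficiency(16, threads):.0%} busy")
```

In this toy case 12 threads leave a third of the thread-time idle, whereas 4 or 8 threads divide the work evenly.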

I found this app_config.xml on one of the computers which I currently have switched on:
        <cmdline>--nthreads 4</cmdline>
This is on a 22-core/44-thread Xeon E5-2696 v4. It configures the application to use 4 threads per task, instead of however many threads it would choose automatically on this CPU. I don't remember how many of these 4-threaded tasks I allowed BOINC to run simultaneously at that time.
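The <cmdline> line above is only the relevant excerpt; in BOINC's app_config.xml it sits inside an <app_version> element. The Python sketch below writes such a file for a chosen thread count. The app name milkyway_nbody and plan class mt are assumptions, so check them against the names your client actually reports (for example in client_state.xml) before using the output. Setting avg_ncpus to the same value tells BOINC that each task occupies that many CPUs, which in turn decides how many tasks it runs at once.

```python
import xml.etree.ElementTree as ET


def nbody_app_config(threads_per_task: int,
                     app_name: str = "milkyway_nbody",  # ASSUMED app name, verify locally
                     plan_class: str = "mt") -> str:    # ASSUMED plan class, verify locally
    """Build app_config.xml text that pins N-Body tasks to a fixed thread count."""
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    # CPUs BOINC budgets per task; this decides how many tasks run at once.
    ET.SubElement(version, "avg_ncpus").text = str(threads_per_task)
    # Threads the application itself is told to start.
    ET.SubElement(version, "cmdline").text = f"--nthreads {threads_per_task}"
    ET.indent(root)  # pretty-print (Python 3.9+)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Save the output as projects/milkyway.cs.rpi.edu_milkyway/app_config.xml
    print(nbody_app_config(4))
```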

You could experiment with 6 threads per task, as well as with 4 threads per task, and see if you get better utilization.¹ After you created (or modified) the projects/milkyway.cs.rpi.edu_milkyway/app_config.xml file, the easiest way to apply the customized settings is to restart the BOINC client.
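For a 12-thread CPU, the layouts that keep every hardware thread busy correspond to the divisors of 12: 6 threads per task means two tasks at once, 4 threads per task means three. The small helper below (a hypothetical name, just arithmetic) lists all of them.

```python
def task_layouts(hardware_threads: int) -> list[tuple[int, int]]:
    """(threads_per_task, concurrent_tasks) pairs that use every hardware thread exactly once."""
    return [(per_task, hardware_threads // per_task)
            for per_task in range(1, hardware_threads + 1)
            if hardware_threads % per_task == 0]


for per_task, tasks in task_layouts(12):
    print(f"--nthreads {per_task:2d} -> {tasks:2d} task(s) at a time")
```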

¹) Better utilization does not automatically mean better throughput. Finding the throughput sweet spot will require either more time-consuming testing with many tasks, including recording the credits for their results, or a benchmarking setup with a fixed workunit (like the ones I built for PrimeGrid LLR, LLR2, and Genefer, which would be good to have for more applications…).
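Once results are recorded, the comparison itself is simple arithmetic: host throughput is the credit rate of one task slot multiplied by the number of tasks running side by side. This sketch assumes tasks run back-to-back in every slot and that the sampled tasks are representative; the sample numbers are placeholders, not measurements.

```python
def host_credits_per_hour(results: list[tuple[float, float]], concurrent_tasks: int) -> float:
    """Estimate host throughput from (credit, run_time_seconds) pairs recorded
    for one thread-count setting, assuming `concurrent_tasks` such tasks
    always run side by side."""
    total_credit = sum(credit for credit, _ in results)
    total_seconds = sum(seconds for _, seconds in results)
    per_slot = total_credit * 3600.0 / total_seconds  # credit/hour of one task slot
    return per_slot * concurrent_tasks


# Placeholder numbers only: swap in your own recorded results.
six_thread_runs = [(100.0, 7200.0), (120.0, 9000.0)]
print(f"{host_credits_per_hour(six_thread_runs, concurrent_tasks=2):.1f} credits/hour")
```

Summing credit and time before dividing, rather than averaging per-task rates, weights each task by how long it occupied its slot, which is what throughput measures.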


Elite Member
Nov 4, 1999
I see, I wondered if something like that was going on, interesting.
Not sure if I'm going to run it for long atm, so I'll hold fire on the config modding, but thanks for the idea :)


Senior member
Feb 11, 2008
MW's Separation project ended this month, so only the CPU project is running now.

Anyone fancy posting benchmarks for that? (I'll post some too at some point).
@Endgame124 did you ever post benchmarks for the 5950 you had a couple of years ago? As per this post. (I can't see any).
If I did record benchmarks, I didn't post them and have long since lost them. Maybe I should look into it again.


Elite Member
Nov 4, 1999
That would be a shame if you have lost them.

Well, my Ryzen 3600 has been crunching MW for a good few days now. I was hoping to post a fairly narrow range of run time vs. credit for certain WUs, but the times are all over the place and don't tally well with the credit given.
For the roughly 40-90 credit tasks I was getting times of about 2,800-3,400 s, while the 322-463 credit ones took 32,000-39,000 s (although one 437-credit task took ~25,500 s, whereas most near that credit were taking about 36,000 s).
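Converting those figures into credit per hour makes the mismatch easier to see. Since only ranges are given, and which credit goes with which run time isn't stated, this sketch computes the lowest and highest possible rate for each group (lowest credit over longest time, highest credit over shortest time).

```python
def credit_rate_bounds(credit_lo: float, credit_hi: float,
                       secs_lo: float, secs_hi: float) -> tuple[float, float]:
    """Lowest and highest possible credit/hour when only ranges are known."""
    return credit_lo * 3600 / secs_hi, credit_hi * 3600 / secs_lo


# Figures from the post above (elapsed seconds per task).
groups = {
    "40-90 credit": (40, 90, 2_800, 3_400),
    "322-463 credit": (322, 463, 32_000, 39_000),
    "437 credit, fast outlier": (437, 437, 25_500, 25_500),
    "437 credit, typical": (437, 437, 36_000, 36_000),
}
for name, figures in groups.items():
    lo, hi = credit_rate_bounds(*figures)
    print(f"{name:>26}: {lo:6.1f} to {hi:6.1f} credits/hour")
```

On these numbers the long tasks earn roughly 30 to 52 credits per hour against roughly 42 to 116 for the short ones, and the fast 437-credit task (about 62 per hour) beats the typical one (about 44 per hour) by about 40%.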