
A personal paradigm shift... Or from "DC on a gaming rig" to "gaming on a DC rig"

voodoo5_6k

Senior member
Jan 14, 2021
Starting with last year's folding race against Tom's Hardware, my old gaming rig went from just sitting there to running Folding@home, and shortly afterwards Rosetta@home as well (plus, recently, World Community Grid).

My thoughts about finally upgrading it date way back, even before I first discussed this with @Assimilator1 ;) However, nothing ever materialized, and then the market went entirely crazy, effectively stopping me from even thinking about this, especially from a GPU perspective.

But...

Then I stumbled upon a Supermicro DP Xeon motherboard (always loved those...). The price was good, maybe even very good given the current market situation. A quick price check on the parts I'd need to get a system like that running didn't look too bad either. But what made me finally pull the trigger one week later was finding a store offering the RAM at half the regular price and claiming to have it readily available. Unlikely as it seemed, I got all parts within a single week, even those with unknown availability in the respective store.

I spent the better part of last week moving this system into production (hence the lower than usual output). There were a few hiccups with the Windows scheduler (obviously too many cores...), and either FahCore_22 or the 461.92 nVidia driver crashed overnight after running for more than a day (I'm back on 456.71 for the moment). After struggling with Process Lasso and the Windows scheduler, the current working configuration is:
  • ProBalance off
  • Rosetta processes with fixed CPU affinity (all of CPU 1)
  • WCG processes with fixed CPU affinity (all of CPU 2, except two threads)
  • F@H Core 22 processes with fixed CPU affinity (the remaining two threads of CPU 2)
For the two BOINC projects, the "% of processors" preference is (number of dedicated threads) / (total number of threads) × 100, so nothing fancy there.
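A minimal sketch of the "% of processors" arithmetic, assuming 80 logical processors (2 × 20 cores with Hyper-Threading) and single-threaded BOINC tasks. The thread split mirrors the configuration above. The helper name is made up for illustration.

```python
TOTAL_THREADS = 80  # 2x Xeon E5-2698 v4: 20 cores / 40 threads per CPU

def boinc_cpu_percent(dedicated_threads, total_threads=TOTAL_THREADS):
    """Value for BOINC's '% of processors' preference that leaves the client
    `dedicated_threads` threads (assumes single-threaded tasks)."""
    return dedicated_threads * 100 / total_threads

# Thread split from the configuration above: Rosetta gets all of CPU 1,
# WCG gets CPU 2 minus two threads, F@H Core 22 gets those last two threads.
split = {"Rosetta": 40, "WCG": 40 - 2, "F@H Core 22": 2}
assert sum(split.values()) == TOTAL_THREADS

for project in ("Rosetta", "WCG"):  # F@H is not a BOINC project
    print(f"{project}: {boinc_cpu_percent(split[project]):g} % of processors")
```

This gives 50 % for Rosetta and 47.5 % for WCG. The percentage only caps how many tasks run at once. Which cores they land on is still up to the affinity settings.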

Edit: The system is now on Debian (for DC), on a separate SSD (Thanks @Endgame124 & @Icecold for suggesting it and @StefanR5R for a lot of helpful information).

Brief system overview:

CPU: 2x Intel Xeon E5-2698 v4
Motherboard: Supermicro X10DRi (BIOS modded to add NVMe boot capabilities)
RAM: 8x 16GB Samsung DDR4 2400 RDIMM (ECC)
GPU: Evga GeForce GTX Titan X
Sound: Creative Labs Sound Blaster ZxR
Storage: Supermicro AOC-SLG3-2M2 with 1TB Samsung 970 Evo Plus (Windows) & 250GB Samsung 970 Evo Plus (Debian), 1TB Samsung 940 Evo
PSU: Seasonic Prime TX 1000W

The cooling system was "inherited" from the old gaming rig. The CPUs needed new coolers, obviously, but that's it. Currently, with all cores fully loaded at max. turbo and the GPU folding, both CPUs report 48-49°C and the GPU reports 42°C, at a room temperature of roughly 22°C. Not too bad.

[Attached photo: DSC01014.JPG]

Overall, I'm glad I did this, as I have really grown fond of both DC and the TeAm, and this way I can contribute a little more :)
 

Endgame124

Senior member
Feb 11, 2008
Looks really good! Now that you're gaming on the DC rig, have you ever considered dual-booting between Linux (DC) and Windows (gaming)?
 

Icecold

Senior member
Nov 15, 2004
Man, that's a sweet setup! That blows away any Xeons I have, both in core count and amount of RAM. Looks super clean with the water cooling and everything, too. I can only imagine how Task Manager looks with 80 CPU threads (if set to show each logical processor rather than overall utilization, of course, which I always set) :D I was wondering the same thing @Endgame124 posted: it might be worth looking at Linux unless you have a strong preference for Windows.
 

voodoo5_6k

Senior member
Jan 14, 2021
Thank you both :)

Oh yes, I have thought about Linux. I want to get this running properly on Windows because it would make life easier (i.e. just pause folding and part of BOINC to free up the GPU and some CPU, then play a game). If this doesn't work out in the long run, Linux is definitely an option. My DNS resolver and my Syslog server already run on Debian. All good things come in threes, I guess ;)

PS: Task Manager...

[Attached screenshot: task manager.png]
 

StefanR5R

Elite Member
Dec 10, 2016
There were a few hiccups with the Windows scheduler (obviously too many cores...)
It is not just about the number of cores; I heard that it works poorly on NUMA systems, which includes dual socket systems. (I have dual socket systems myself, but don't have Windows on them.) But you already implemented a workaround which I imagine is effective.

It is difficult to tell from the image: You have an external radiator, and the three front fans are pulling in cool air, not pre-heated air, right? These server boards are designed with the assumption that there is plentiful air flow over VRMs and RAM. Edit, OK, now I read your signature; it answers my question. :-)
 

voodoo5_6k

Senior member
Jan 14, 2021
It is not just about the number of cores; I heard that it works poorly on NUMA systems, which includes dual socket systems. (I have dual socket systems myself, but don't have Windows on them.) But you already implemented a workaround which I imagine is effective.
Yeah, the NUMA node issue is what I observed too. Initially, I wanted Rosetta to have 70 threads. No dice... There is basically a 64-thread limit up to which the scheduler works OK, but when NUMA nodes come into play as well, you really have a hard time. That's why I settled on the config outlined above, for the time being. If it doesn't work out, then I'll move to dual-boot with Debian, I guess.
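Here is a rough sketch of where the 64 comes from. It is a simplified model, not Windows' actual algorithm. Windows organizes logical processors into processor groups of at most 64. Where it can, it keeps each NUMA node inside a single group. On Windows 10, a process is confined to one group by default. With two NUMA nodes of 40 threads each, that plausibly means two groups of 40. A process that is not explicitly spread across groups then sees at most 40 threads, which fits the failed 70-thread attempt. The function name is hypothetical.

```python
MAX_GROUP_SIZE = 64  # a Windows processor group holds at most 64 logical processors

def pack_processor_groups(numa_node_sizes, max_size=MAX_GROUP_SIZE):
    """Simplified model: fill groups with whole NUMA nodes, never splitting a
    node, and start a new group whenever the next node would not fit."""
    groups = []
    current = 0
    for size in numa_node_sizes:
        if size > max_size:
            raise ValueError("model does not cover NUMA nodes larger than a group")
        if current + size > max_size:
            groups.append(current)
            current = 0
        current += size
    if current:
        groups.append(current)
    return groups

# Two sockets with 20 cores / 40 threads each, one NUMA node per socket:
print(pack_processor_groups([40, 40]))  # [40, 40]
# The old 16-thread single-socket system fits into one group:
print(pack_processor_groups([16]))      # [16]
```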

It is difficult to tell from the image: You have an external radiator, and the three front fans are pulling in cool air, not pre-heated air, right? These server boards are designed with the assumption that there is plentiful air flow over VRMs and RAM. Edit, OK, now I read your signature; it answers my question. :-)
Yeah, that's right. The three front fans are cold air intake, the 2 fans on the back are exhaust. The VRMs report ca. 70°C to 78°C. The radiator is external (4x 200mm fans), as are the pumps plus reservoir. That leaves the case free for good airflow. Here's an outside view.

[Attached photos: DSC00946.JPG, DSC00947.JPG, DSC00942.JPG]
 

voodoo5_6k

Senior member
Jan 14, 2021
Thanks! :)

The RAM... I couldn't believe the price. I must have checked the item number 20 times against the manufacturer's data sheets to make sure I'm actually buying what they claim to sell :D
 

StefanR5R

Elite Member
Dec 10, 2016
Initially, I wanted Rosetta to have 70 threads. No dice...
Does Process Lasso support process hierarchies? If so, then you could have two boinc client instances for Rosetta@home running in parallel, one and its child processes tied to CPU1, the other and its children to CPU2. In addition, one or two boinc client instances for WCG, and the single F@H client instance.
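The system eventually ended up on Linux, where the same idea works without Process Lasso: child processes inherit the CPU affinity of their parent. Below is a sketch that assumes one NUMA node per socket, as exposed in sysfs. The data directories and RPC ports are hypothetical placeholders. `--dir`, `--allow_multiple_clients` and `--gui_rpc_port` are BOINC client options, but check `boinc --help` for your version.

```python
import os
import subprocess

def parse_cpulist(text):
    """Parse a Linux cpulist string such as '0-19,40-59' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus

def node_cpus(node):
    """Logical CPUs belonging to one NUMA node, as reported by sysfs."""
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        return parse_cpulist(f.read())

def boinc_command(data_dir, rpc_port):
    """Command line for an additional BOINC client with its own data directory."""
    return ["boinc", "--dir", data_dir, "--allow_multiple_clients",
            "--gui_rpc_port", str(rpc_port)]

def launch_pinned(cmd, cpus):
    """Start cmd restricted to `cpus`; tasks it spawns inherit that affinity."""
    return subprocess.Popen(cmd, preexec_fn=lambda: os.sched_setaffinity(0, cpus))
```

Usage would be something like `launch_pinned(boinc_command("/var/lib/boinc-rosetta0", 31417), node_cpus(0))`, plus the same with node 1, another directory and another port. Each instance then has to be attached to Rosetta@home separately.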

Both Rosetta@home and WCG Microbiome Immunity Project are said to use a relatively large amount of processor cache and possibly RAM bandwidth, and what I observed when running them matches this assumption. (Maybe WCG Africa Rainfall Project too, I don't know.) It would therefore benefit overall throughput if these jobs were spread over both CPUs and combined with applications with lower cache and I/O demand.
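One way to express "spread the heavy jobs over both CPUs and mix in lighter ones" is a greedy balancing rule in the style of longest-processing-time scheduling. Take tasks heaviest first and put each one on the socket with the lowest accumulated demand. The task labels and weights below are hypothetical; only their relative order matters. The sketch also ignores the 40-thread limit per socket.

```python
# Hypothetical cache/memory "weights"; only their relative order matters here.
DEMAND = {"rosetta": 3, "wcg_mip": 3, "light": 1}

def spread_tasks(tasks, sockets=2):
    """Longest-processing-time-style greedy: place tasks heaviest first, each on
    the socket with the lowest accumulated demand so far."""
    load = [0] * sockets
    placement = [[] for _ in range(sockets)]
    for task in sorted(tasks, key=lambda t: DEMAND[t], reverse=True):
        target = min(range(sockets), key=lambda s: load[s])
        placement[target].append(task)
        load[target] += DEMAND[task]
    return placement, load

placement, load = spread_tasks(["rosetta"] * 4 + ["wcg_mip"] * 2 + ["light"] * 6)
for socket, tasks in enumerate(placement):
    print(socket, load[socket], sorted(tasks))
```

Each socket ends up with two Rosetta tasks, one MIP task and three light tasks. Neither CPU carries all of the cache-hungry work.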

But matters are complicated by Folding@home running in addition. This one wants a great deal of I/O (and perhaps processor cache) of its own as well. F@h on Windows is even more I/O-heavy than F@h on Linux, due to some convenient features in Windows' graphics driver stack (driver updates on a live system; driver crash dumps for post-mortem analysis) which come at the cost of more frequent memory copies.

The RAM... I couldn't believe the price. I must have checked the item number 20 times against the manufacturer's data sheets to make sure I'm actually buying what they claim to sell :D
Fortunately your Task Manager screenshot agrees that you received 8x16 GB, not 8x16 MB. :-)
 

Endgame124

Senior member
Feb 11, 2008
[...] Three front fans are cold air intake, 2 fans on the back are exhaust. [...] The radiator is external (4x 200mm fans), as are the pumps plus reservoir. [...]
Nice looking double pump unit. Is that a heatkiller unit too?

You have a lot of rad there too, which is really nice! I have a GTX Crossflow 420 and an old Thermochill 360 in my current setup. I suspect once I buy a video card and get it under water, it's going to get a little loud. Loud is subjective, of course: compared to the old PC I replaced, it's dramatically quieter (currently below ambient noise level), but if I add 350 W of 3090 to the system, the fans will probably need to spin up above 14%.
 

Assimilator1

Elite Member
Nov 4, 1999
Sweet upgrade Voodoo5 :cool:
That's a lotta threads on that CPU! :openmouth::D
How much power does that rig pull from the wall?
And great that you got a deal on it too!
 

StefanR5R

Elite Member
Dec 10, 2016
I have a system with Mo-Ra3 too, but it is the slightly smaller 360 version. It is a single-processor, triple-GPU system. (Default TDP is 250 W per GPU.) Some observations with that system:
  • The four waterblocks, two quick release connectors, large external radiator, plus a small internal radiator surely make for a lot of hydraulic resistance. And yet, a single D5 pump still manages. The pump needs to run at full bore when all GPUs are in use, but can very well be turned down when just the CPU and maybe one GPU is in use.
  • The additional internal radiator in this system, a slim 140 mm x 280 mm radiator, is probably superfluous. I haven't tried without it yet though.
  • Some time back during a contest (I think it was a SETI Wow! Event, which always took place in August), the power cord to the fans on the Mo-Ra accidentally got disconnected for a while until I noticed. During this fan outage, the system continued to run without errors. The GPUs ran hotter but still did not reach their thermal throttling point.
    (The radiator was standing rather freely in the room; it wasn't backed up to a wall.)
  • The system has got the mentioned single D5 pump with PWM control normally dialed to 100 %, the many but slow spinning radiator fans, and several case fans and small internal fans which ventilate RAM, VRMs, and other areas of the mainboard. And of course the GPUs with coils which are whining in some DC projects. Yet all of these together are much quieter than . . . . the fan of the PSU! This one is getting inconveniently loud when all three GPUs are in use (already at 3x 180 W cTDP, more so at 3x 250 W TDP).
While one of my goals with building a Mo-Ra based system was to have a quiet system, this failed completely due to the PSU noise, unfortunately.

Another system of mine, with dual GPUs, a thick 120 x 360 radiator + a medium 140 x 420 radiator, and again a D5 pump, is tolerably quiet. Still, it's not quiet enough that I would keep it in my bedroom.
 

Markfw

CPU Moderator, VC&G Moderator, Elite Member
Super Moderator
May 16, 2002
I had a 32-core Threadripper with a 360x25 rad and a 420x38 (I think that's it, thick) rad, with a solid copper Heatkiller water block only on the CPU. It was too big a case and something got noisy (but I have no idea, being basically deaf), so I took it out of service. The OC went from 3.8 to the stock 3.0, but it's WAY more efficient, and quieter with a single D15 HSF. I tried to sell all the parts and did not get a single reply, so it sits in my shed, water system still sealed.
 

voodoo5_6k

Senior member
Jan 14, 2021
Does Process Lasso support process hierarchies? If so, then you could have two boinc client instances for Rosetta@home running in parallel, one and its child processes tied to CPU1, the other and its children to CPU2. [...]
Thanks for the information & ideas! Looking at how the system runs and what it takes to achieve this in Windows, I have decided to at least try out Debian. It would have been nice to have my cake and eat it too, but it doesn't seem possible here. I'm not achieving the throughput I'd like to see, and with additional applications to "help" the Windows scheduler, the setup gets more fragile and complicated. As Rosetta has run out of tasks again, I have some extra time on my hands to get Debian installed and F@H plus two BOINC clients running...

Edit: Tasks are available again...

Nice looking double pump unit. Is that a heatkiller unit too?
Yes, you're right, it is this one: MO-RA3 420 D5-DUALTOP Modul, 143,57 € (watercool.de) (But with a custom mounting bracket). They have two versions of it in the "Industrial" section.

[...] if I add 350watts of 3090 to the system the fans will probably need to spin up above 14%.
Oh yeah, the 3090 will definitely add a lot of heat into any loop. The coolant temperature will increase significantly. Then comes the usual dilemma. Keep the noise level and the temperatures, or increase the noise (fan speed) and decrease the temperatures :D ...or add even more radiator area to mitigate the need for fan speed a little ;)
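Here is a back-of-the-envelope sketch of that dilemma. It assumes the radiators shed heat roughly in proportion to the coolant-to-air temperature difference, so the coolant settles where the shed heat equals the heat put in. More fan speed or more radiator area raises the watts shed per kelvin. All numbers are hypothetical, not measurements from this thread.

```python
def coolant_temp_c(ambient_c, heat_w, radiator_w_per_k):
    """Steady state: the radiator sheds about radiator_w_per_k watts per kelvin
    of coolant-to-air difference, so the coolant settles where that equals heat_w."""
    return ambient_c + heat_w / radiator_w_per_k

# Hypothetical numbers: 22 °C room, 400 W already in the loop,
# 50 W/K for the radiators at low fan speed, 80 W/K at higher fan speed.
print(coolant_temp_c(22, 400, 50))        # 30.0
print(coolant_temp_c(22, 400 + 350, 50))  # 37.0   (add a 350 W 3090, same fans)
print(coolant_temp_c(22, 400 + 350, 80))  # 31.375 (same load, faster fans)
```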

Sweet upgrade Voodoo5 :cool:
That's a lotta threads on that CPU! :openmouth::D
How much power does that rig pull from the wall?
And great that you got a deal on it too!
Thanks! :) I'm not sure about the power draw yet. I'll look into that once I have the system running at full capacity. I'd assume, however, that it won't draw a lot more than the old system. On one side there is more RAM, one additional CPU and a DP motherboard. On the other, there is less gaming etc. stuff on the motherboard, no out-of-the-box overclocking, overvolting etc., and the CPUs are rated roughly the same in TDP and run at stock settings. The UP Xeon was overclocked (all cores at 4GHz) and would therefore draw a lot more than its 130W TDP suggests.

[...] The four waterblocks, two quick release connectors, large external radiator, plus a small internal radiator surely make for a lot of hydraulic resistance. And yet, a single D5 pump still manages. [...] Yet all of these together are much quieter than . . . . the fan of the PSU! [...]
Interesting observations! Regarding hydraulic resistance: I have changed the loop from 1x CPU & 1x VRM waterblock to 2x CPU waterblocks. The rest of the loop is basically unchanged, and running both D5 pumps at full speed, I see the following flow: ~265L/h (old) vs. ~245L/h (new). For the old system in regular operation, I had the pumps fixed at "4", which is ~4000rpm for the Watercool D5. That gave me ~210L/h. I'll have to check how many rpm the new system needs to reach the same flow under load. However, it doesn't really matter, as both pumps at full speed generate less noise than the fans running below 500rpm (which is as low as I let them run). Overall, the system is pretty silent.

Both PSUs have good fans. The old unit (Be Quiet! Dark Power Pro (P10) 750W) ran for almost 10 years and was still inaudible under load. The new one (Seasonic Prime TX 1000W) also runs very silently under load (I have the active cooling setting enabled). So I was pretty lucky with my PSUs.

Interestingly, the coil whine that used to accompany the short F@H WUs (up to 1.5h overall, or <1min. TPF on my GPU), and was loud and annoying as hell, seems to have vanished now. I have one of these WUs running right now, and even if I pause it, there is no difference in system noise (it is still the same GPU, of course).
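For the "how many rpm for the same flow" question, a first guess comes from the pump affinity law. For an unchanged loop, flow scales roughly linearly with pump speed. The sketch assumes that the new/old flow ratio measured at full speed holds at every speed. That is true for an idealized loop with purely quadratic resistance, but only approximately for real pumps. It also treats the two pumps as one, and the input figures are the approximate values from the post.

```python
def rpm_for_flow(target_flow, ref_rpm, ref_flow):
    """Pump affinity law for an unchanged loop: flow scales linearly with speed."""
    return ref_rpm * target_flow / ref_flow

# Figures from the post (L/h): old loop ~265 at full speed and ~210 at ~4000 rpm,
# new loop ~245 at full speed.
OLD_FULL, OLD_AT_4000, NEW_FULL = 265, 210, 245

# Assumption: the new/old flow ratio measured at full speed holds at any speed.
new_at_4000 = OLD_AT_4000 * NEW_FULL / OLD_FULL
print(round(new_at_4000))                            # 194 (L/h at ~4000 rpm)
print(round(rpm_for_flow(210, 4000, new_at_4000)))   # 4327 (rpm for ~210 L/h)
```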
 

StefanR5R

Elite Member
Dec 10, 2016
@voodoo5_6k, I don't know what volume flow my loops have... Incidentally, my dual GPU computers have Be Quiet! Dark Power Pro P11 750W power supplies which are inaudible under full load, and the triple GPU computer has a Be Quiet! Dark Power Pro P11 1200W power supply which is, as mentioned, loud under full load.
 

voodoo5_6k

Senior member
Jan 14, 2021
@voodoo5_6k, I don't know what volume flow my loops have... Incidentally, my dual GPU computers have Be Quiet! Dark Power Pro P11 750W power supplies which are inaudible under full load, and the triple GPU computer has a Be Quiet! Dark Power Pro P11 1200W power supply which is, as mentioned, loud under full load.
I wouldn't have known the flow either, but when I added the second pump recently, I also added a flow sensor (Flow sensor high flow LT, G1/4 (aquacomputer.de)), because I was curious ;)

So, Windows kept annoying me. The throughput decreased daily and I had issues with F@H (fahcore_22 crashing several times and taking the system down once). Tuesday evening, I installed a second NVMe SSD (250GB 970 Evo Plus) on the AOC-SLG3-2M2 and installed Debian on it. PCIe bifurcation FTW :cool: Well, at least after finding out which slot relates to which port in IIO config :D

On Wednesday, I spent most of my free time setting everything up, and now it is running DC on Debian :cool: I'm using my "admin" workstation to control F@H and BOINC remotely (from Windows). Works great so far. Configuring everything on Debian was no joy compared to the Windows experience (but I knew that, as I already have several devices running Debian and FreeBSD). But now that it's done, it at least works (which I can't say for the Windows part on this system)! Now I'm curious to see the throughput over the next few days.

Oh, by the way: under Windows I used AIDA64 for hardware monitoring. What do you use under Linux? I tried psensor and hardinfo and was not satisfied, to say the least; both were removed right after testing. I can look at a selection of sensors via IPMI, but there's more I'd like to see, e.g. GPU temperature.
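For a quick look without a GUI tool, a small script can read the same data the GUI tools use. CPU and board sensors come from the kernel's hwmon interface in sysfs, which reports millidegrees Celsius. GPU temperature comes from `nvidia-smi` in query mode. This is a sketch: the sensor names depend on the loaded drivers (e.g. `coretemp` for Intel CPUs), and the function names are made up.

```python
import glob
import os
import subprocess

def hwmon_temperatures():
    """All temperature sensors exposed via Linux hwmon sysfs, in degrees C."""
    readings = {}
    for path in sorted(glob.glob("/sys/class/hwmon/hwmon*/temp*_input")):
        hwmon_dir = os.path.dirname(path)
        sensor = os.path.basename(path).rsplit("_", 1)[0]  # e.g. "temp1"
        try:
            with open(os.path.join(hwmon_dir, "name")) as f:
                chip = f.read().strip()  # e.g. "coretemp"
            with open(path) as f:
                millideg = int(f.read().strip())  # sysfs reports millidegrees
        except (OSError, ValueError):
            continue  # sensor vanished or returned garbage; skip it
        readings[f"{os.path.basename(hwmon_dir)}/{chip}/{sensor}"] = millideg / 1000
    return readings

def parse_gpu_temps(output):
    """Parse one integer per line, as printed by the nvidia-smi query below."""
    return [int(value) for value in output.split()]

def gpu_temperatures():
    """GPU core temperatures via nvidia-smi; empty list if it is unavailable."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=temperature.gpu",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True)
        return parse_gpu_temps(result.stdout)
    except (OSError, subprocess.CalledProcessError, ValueError):
        return []
```

Printing `hwmon_temperatures()` and `gpu_temperatures()` every few seconds gives a crude text-mode overview.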
 

Endgame124

Senior member
Feb 11, 2008
I use CPU-X to monitor CPU stats on my Fedora box, and the nvidia display console to monitor the video card and adjust fan speed. There may be (probably are) better tools, but I set it up and haven't touched it since the last time F@H had a problem and all the WUs crashed.
 

voodoo5_6k

Senior member
Jan 14, 2021
Thank you! I'll have a look at these two the next time I'm checking on the system.
 

voodoo5_6k

Senior member
Jan 14, 2021
The system has now been running without any apparent issues for four days. The output looks good (roughly what I expected from the upgrade). Definitely better than before. Way better. Looking at my stats, the throughput under Windows started mediocre and then declined so much within two days that it was already close to the old system's. Unbelievable.

F@H looks normal right now. I don't see any difference from Windows yet, as the GPU has been getting a lot of small WUs recently.

Overall, I'm very satisfied with running the system under Debian. It would've been great to get everything running under Windows, like before, on the old system, but well... I still haven't looked into the hardware monitoring tools, as the system just runs... For the moment, I enjoy occasionally looking at it remotely via FAHControl and BoincTasks :D
 

voodoo5_6k

Senior member
Jan 14, 2021
Rosetta
5     110,544   voodoo5_6k
10     48,139   [TA]Assimilator1

Voodoo5, wow, you're racing ahead now!
After my disappointment with Windows I wanted to make a "ballpark" check on the new system. I took the last few days from the free-dc stats because they show my old system, the few days of Windows on the new system and then Debian on the new system. Rejecting outliers (days on which my BOINC scheduler experiment with WCG took more threads), I took the average of daily production on the old system, and sort of normalized against threads and clock speed. Then I included the IPC increases from Ivy Bridge (old system) to Haswell, and from Haswell to Broadwell (new system), and then factored in threads and clock speed again. And guess what, the current production under Debian is almost spot-on, slightly better, and that out of the box. Windows on the new system gave me between 33% (that's way below the old system's output o_O) and 61% of that... So, a full week with 40 threads should give roughly 180,000 for Rosetta.
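The ballpark estimate described above fits in a small function. It is a sketch assuming throughput scales linearly with threads × clock × IPC and ignoring memory bandwidth; elsewhere in this thread Rosetta@home is reported to scale somewhat sub-linearly. The clock speeds and per-generation IPC factors are parameters because the post does not give the values used. The function names are hypothetical.

```python
from math import prod

def scaled_estimate(old_daily, old_threads, old_ghz,
                    new_threads, new_ghz, ipc_steps):
    """Normalize old output per thread and GHz, apply the per-generation IPC
    factors (e.g. Ivy Bridge->Haswell, Haswell->Broadwell), then scale back
    up by the new thread count and clock."""
    per_thread_ghz = old_daily / (old_threads * old_ghz)
    return per_thread_ghz * prod(ipc_steps) * new_threads * new_ghz

def share_of_estimate(observed_daily, estimate):
    """Fraction of the estimated output a given setup actually delivered."""
    return observed_daily / estimate
```

With the real inputs, `share_of_estimate` is where the 33% to 61% range for Windows on the new system comes from.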

Preliminary conclusion: Windows can't fully utilize systems with too many threads (supposedly more than 64 threads) and more than one processor group (or NUMA nodes). On the old system (16 threads) I had no issues. The BOINC production was fast and reliable.
 

StefanR5R

Elite Member
Dec 10, 2016
[...] I took the average of daily production on the old system, and sort of normalized against threads and clock speed. Then I included the IPC increases from Ivy Bridge (old system) to Haswell, and from Haswell to Broadwell (new system), and then factored in threads and clock speed again. And guess what, the current production under Debian is almost spot-on, slightly better, and that out of the box. [...]
I had/have 14- and 22-core siblings of your processors myself. If I recall correctly, Rosetta@home scaled somewhat sub-linearly when increasing cores × clock but keeping the same overall memory performance. (Or practically the same: my 2×14-core computers had the very same RAM equipment as my 2×22-core computers. Their memory controllers, cache layout, and inner topology are of course subtly different, but this doesn't make a big difference.) Rosetta@home has a sizable memory footprint after all; it's not at the extreme end of DC projects, but still considerable. But I don't remember whether I obtained hard figures of this scaling, and if so, whether those can be found somewhere here. However, obviously your new system also has higher memory performance than your old one.
 

voodoo5_6k

Senior member
Jan 14, 2021
[...]
However, obviously your new system also has higher memory performance than your old one.
Good point :D I went the very easy route in my estimate, as I just wanted to get a ballpark number to see how reasonable the current throughput is ;)

The old system had DDR3-1866 (the max. supported config for Ivy Bridge without overclocking), i.e. ca. 1,866MT/s and 14,933MB/s per channel, vs. the new system with DDR4-2400 (also the max. supported config), i.e. ca. 2,400MT/s and 19,200MB/s per channel.
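The per-channel figures are transfers per second times 8 bytes per 64-bit transfer. DDR3-1866 actually runs at 1866.67 MT/s, hence 14,933 rather than 14,928. In the sketch below, the channel counts are assumptions. The E5-2698 v4 has four memory channels per socket, so 8 DIMMs over two sockets can fill all eight channels if placed one per channel. The old system's channel count isn't given in the thread.

```python
def peak_bandwidth_mb_s(mt_per_s, bus_width_bits=64, channels=1):
    """Theoretical peak: transfers per second x bytes per transfer x channels."""
    return mt_per_s * (bus_width_bits // 8) * channels

ddr3_1866 = 5600 / 3  # DDR3-1866 actually runs at 1866.67 MT/s
print(round(peak_bandwidth_mb_s(ddr3_1866)))  # 14933 (one channel)
print(peak_bandwidth_mb_s(2400))              # 19200 (one channel)
# Assumption: one DIMM per channel on two quad-channel sockets.
print(peak_bandwidth_mb_s(2400, channels=8))  # 153600
```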
 

StefanR5R

Elite Member
Dec 10, 2016
Have you tried F@H in a CPU slot yet, spanning 80 threads? IME FahCore scales very well to large thread counts and across sockets. (Or try 78 threads if you don't want to stop GPU folding.)

Caveats: There may be an occasional lack of work for high thread counts, or worse, the client may download work and start it, but FahCore then exits immediately because it cannot "decompose the domain" (I suppose: divide its mesh) suitably for the given thread count. Stepping down to a thread count which can be factored into small primes may then get it going again. (E.g.: 72 = 2^3 * 3^2, or 64 = 2^6.) Running such a big CPU slot continuously requires a lot more attention than a run-of-the-mill GPU slot.
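Finding the next lower "small primes" thread count is easy to script. The sketch lists candidates that have no prime factor larger than a chosen bound. The bound of 3 is an assumption taken from the 72 and 64 examples. These are only counts worth trying; FahCore may still refuse some of them for a given work unit.

```python
def prime_factors(n):
    """Prime factorisation by trial division, e.g. 72 -> [2, 2, 2, 3, 3]."""
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors

def usable_thread_counts(max_threads, largest_prime=3):
    """Thread counts <= max_threads whose prime factors are all <= largest_prime,
    largest first."""
    return [n for n in range(max_threads, 0, -1)
            if all(f <= largest_prime for f in prime_factors(n))]

print(usable_thread_counts(78)[:3])                   # [72, 64, 54]
print(usable_thread_counts(80, largest_prime=5)[:2])  # [80, 75]
```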
 
