BOINC: Windows to Linux


crashtech

Lifer
Jan 4, 2013
10,695
2,294
146
And while it's working OK, how many processes do you have? I am a little confused about all of the above; I am just trying to get a baseline here.

So I have 2 FAHClient processes, one FAHCoreWrapper, and one FahCore_21 executable. That's for one video card (a 2060).
Code:
mark@mark-Ryzen1600-linux:~$ ps -ef | grep fah
fahclie+  1791     1  0 Oct05 ?        00:01:06 /usr/bin/FAHClient /etc/fahclient/config.xml --run-as fahclient --pid-file=/var/run/fahclient.pid --daemon
fahclie+  1793  1791  0 Oct05 ?        00:06:11 /usr/bin/FAHClient --child --lifeline 1791 /etc/fahclient/config.xml --run-as fahclient --pid-file=/var/run/fahclient.pid --daemon
fahclie+ 19051  1793  0 12:46 ?        00:00:01 /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 00 -suffix 01 -version 705 -lifeline 1793 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
fahclie+ 19055 19051 99 12:46 ?        01:28:43 /var/lib/fahclient/cores/cores.foldingathome.org/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 00 -suffix 01 -version 705 -lifeline 19051 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
mark     19194 19183  0 14:14 pts/0    00:00:00 grep --color=auto fah
mark@mark-Ryzen1600-linux:~$
Here's what it says:
Code:
ga7pxsl@ga7pxsl-GA-7PXSL:~$ ps -ef | grep fah
ga7pxsl   4083     1  0 Oct06 ?        00:00:17 /usr/bin/python3 /usr/bin/gdebi-gtk /home/ga7pxsl/Downloads/fahclient_7.5.1_amd64.deb
fahclie+  4356     1  0 Oct06 ?        00:00:29 /usr/bin/FAHClient /etc/fahclient/config.xml --run-as fahclient --pid-file=/var/run/fahclient.pid --daemon
fahclie+  4358  4356  0 Oct06 ?        00:02:34 /usr/bin/FAHClient --child --lifeline 4356 /etc/fahclient/config.xml --run-as fahclient --pid-file=/var/run/fahclient.pid --daemon
fahclie+  9270  4358  0 14:34 ?        00:00:04 /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 01 -suffix 01 -version 705 -lifeline 4358 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
fahclie+  9274  9270 99 14:34 ?        03:58:03 /var/lib/fahclient/cores/cores.foldingathome.org/Linux/AMD64/NVIDIA/Fermi/Core_21.fah/FahCore_21 -dir 01 -suffix 01 -version 705 -lifeline 9270 -checkpoint 15 -gpu-vendor nvidia -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu 0
ga7pxsl  10401  3624  0 18:30 pts/0    00:00:00 grep --color=auto fah
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,260
16,118
136
Is your download of the new client still running? Other than that, we are the same.
 

crashtech

Lifer
Jan 4, 2013
10,695
2,294
146
Okay, Linux box #2 is up and running, not without problems, but both BOINC and F@H are working. There are really only two more boxes to go, and that's all the headless machines.
 

crashtech

Lifer
Jan 4, 2013
10,695
2,294
146
So last night I installed Mint on a very old dual Xeon rig (2x E5440) and, amazingly, it seems to be finishing WUs even faster than my new CPUs. I have to wonder if there are distinctions within the Universe BHspin v2 0.19 WUs, because it just doesn't make sense.

I plan on adding a GPU to this rig. Is there a special way of going about it? I don't think there is anything other than a generic VGA driver in play right now; it's running 800x600 with the little onboard video chip.
 

StefanR5R

Elite Member
Dec 10, 2016
6,694
10,607
136
In 2018, Universe@home ran notably faster on Linux than on Windows. The currently active applications are from 2019, and I haven't been following what the relevant changes were. But according to your observation, there still appears to be a Linux advantage at U@h. Top hosts by RAC at U@h are a mix of Linux and Windows hosts.
 

crashtech

Lifer
Jan 4, 2013
10,695
2,294
146
The Core2 is keeping up with or beating a Haswell of similar clockspeed, both in Linux.

Wait a minute. The Core2 is not using HT. That's the difference. So a Core2 "real" core is just about as fast as one logical Haswell core.
 

crashtech

Lifer
Jan 4, 2013
10,695
2,294
146
Back to the FAH just stopping problem, it has happened again, and I paid more attention this time. When it happens, the client does not respond to restart or even the normal kill command. It does respond to "kill -9" which I think means that it is freezing up occasionally. After "kill -9" I can start it and it works for an indeterminate amount of time.
 

crashtech

Lifer
Jan 4, 2013
10,695
2,294
146
So when I set up and installed BOINC on the 2x E5440 box, I was apparently still in the pre-install environment. I found this out because I shut it down to install a GPU and realized the stick was still installed. I pulled it out while it was off, and upon booting, everything I had installed looked gone. Putting it back in and rebooting did not help. Poking around the file system, it was not obvious if any of the work was somewhere I could retrieve it, but there are a bunch of WUs lost right now, some completed.
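A live session without persistence keeps its changes in RAM, so it helps to check which environment is running before installing anything. The sketch below assumes the Mint/Ubuntu live image boots with `boot=casper` on the kernel command line; the function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers: detect a casper-based live session.
# Assumption: the live USB passes "boot=casper" to the kernel,
# while an installed system does not.

# is_live_cmdline "CMDLINE": succeed if CMDLINE contains the word boot=casper.
is_live_cmdline() {
    case " $1 " in
        *" boot=casper "*) return 0 ;;
        *) return 1 ;;
    esac
}

# is_live_session: apply the same check to the running kernel.
is_live_session() {
    is_live_cmdline "$(cat /proc/cmdline)"
}

# Example use:
#   if is_live_session; then echo "Still in the live environment"; fi
```

Looking at `cat /proc/cmdline` by hand gives the same answer; the function just makes it scriptable.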
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,260
16,118
136
Back to the FAH just stopping problem, it has happened again, and I paid more attention this time. When it happens, the client does not respond to restart or even the normal kill command. It does respond to "kill -9" which I think means that it is freezing up occasionally. After "kill -9" I can start it and it works for an indeterminate amount of time.
When you say restart, is that the "sudo service FAHClient stop" command, or a reboot? I have found the reboot is the only way.
 

crashtech

Lifer
Jan 4, 2013
10,695
2,294
146
I want to make sure I am right before saying anything more. Sometimes I go too fast to try to fix things, and I'm not carefully documenting what's done. But I think the process is becoming non-responsive, which a reboot would fix. Also using 'kill -9 <pid>' works the same, just like 'end task' in Windows Task Manager.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,260
16,118
136
Ah... Probably right. As a Linux noob myself, I use the reboot, as it magically fixes almost everything. It's been happening more the last few months, either hung or database locked or whatever, but the folding forum has been less than helpful, so my replies are "what seems to work". Even now with 14 machines folding, I only have to mess with one about every week on average. I use FAHControl on this main machine to track all the remote boxes. Works great!

Like this:
7vbymAZ.png
 

crashtech

Lifer
Jan 4, 2013
10,695
2,294
146
@Markfw , here is how I determined that the FAHClient is probably becoming non-responsive:
Code:
c7x99@c7x99-C7X99-OCE-F:~$ ps -A | grep FAH
 1743 ?        00:00:28 FAHClient
 8591 ?        00:00:00 FAHControl
c7x99@c7x99-C7X99-OCE-F:~$ sudo service FAHClient stop
[sudo] password for c7x99:         
c7x99@c7x99-C7X99-OCE-F:~$ ps -A | grep FAH
 1743 ?        00:00:28 FAHClient
 8591 ?        00:00:00 FAHControl
c7x99@c7x99-C7X99-OCE-F:~$ sudo kill  -9 1743
c7x99@c7x99-C7X99-OCE-F:~$ ps -A | grep FAH
 8591 ?        00:00:01 FAHControl
c7x99@c7x99-C7X99-OCE-F:~$ sudo service FAHClient start
c7x99@c7x99-C7X99-OCE-F:~$ ps -A | grep FAH
 8591 ?        00:00:02 FAHControl
 8691 ?        00:00:00 FAHClient
 8693 ?        00:00:00 FAHClient
c7x99@c7x99-C7X99-OCE-F:~$

After issuing the 'kill -9' command to the correct PID and then restarting FAHClient, it starts to work again.
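The manual sequence above (polite stop, check, then `kill -9`) can be wrapped in a small function. This is only a sketch: `stop_pid` and `is_running` are made-up names, the 10-second grace period is an arbitrary assumption, and it relies on Linux `/proc`. It sends SIGTERM first and falls back to SIGKILL only if the process is still there afterwards.

```shell
#!/bin/sh
# Hypothetical helpers, not part of FAHClient.

# is_running PID: true while the PID exists and is not a zombie (Linux /proc).
is_running() {
    state=$(sed -n 's/^State:[[:space:]]*\(.\).*/\1/p' "/proc/$1/status" 2>/dev/null)
    [ -n "$state" ] && [ "$state" != "Z" ]
}

# stop_pid PID [GRACE_SECONDS]
# Sends SIGTERM, waits up to GRACE_SECONDS (default 10), then sends SIGKILL.
# Prints TERM or KILL to show which signal was needed.
stop_pid() {
    pid=$1
    grace=${2:-10}
    kill -TERM "$pid" 2>/dev/null || return 1
    waited=0
    while [ "$waited" -lt "$grace" ]; do
        if ! is_running "$pid"; then
            echo TERM
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    if is_running "$pid"; then
        kill -KILL "$pid" 2>/dev/null
        echo KILL
    else
        echo TERM
    fi
}

# Example use (as root, with the PID taken from "ps -A | grep FAH"):
#   stop_pid 1743 15
```

If it prints KILL, the process ignored SIGTERM for the whole grace period, which is consistent with it having hung rather than being busy shutting down.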
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,260
16,118
136
Too much typing for me (lazy). I say restart, and come back a few minutes later to check on it.
 

crashtech

Lifer
Jan 4, 2013
10,695
2,294
146
Well, once you have typed it in once, you can just up arrow until you get to the right command. But restarting is easy and pretty much reflexive coming from a Windows universe.
 

biodoc

Diamond Member
Dec 29, 2005
6,338
2,243
136
So when I set up and installed BOINC on the 2x E5440 box, I was apparently still in the pre-install environment. I found this out because I shut it down to install a GPU and realized the stick was still installed. I pulled it out while it was off, and upon booting, everything I had installed looked gone. Putting it back in and rebooting did not help. Poking around the file system, it was not obvious if any of the work was somewhere I could retrieve it, but there are a bunch of WUs lost right now, some completed.

Did you try booting from the USB stick?
 

biodoc

Diamond Member
Dec 29, 2005
6,338
2,243
136
@Markfw , here is how I determined that the FAHClient is probably becoming non-responsive:
Code:
c7x99@c7x99-C7X99-OCE-F:~$ ps -A | grep FAH
1743 ?        00:00:28 FAHClient
8591 ?        00:00:00 FAHControl
c7x99@c7x99-C7X99-OCE-F:~$ sudo service FAHClient stop
[sudo] password for c7x99:        
c7x99@c7x99-C7X99-OCE-F:~$ ps -A | grep FAH
1743 ?        00:00:28 FAHClient
8591 ?        00:00:00 FAHControl
c7x99@c7x99-C7X99-OCE-F:~$ sudo kill  -9 1743
c7x99@c7x99-C7X99-OCE-F:~$ ps -A | grep FAH
8591 ?        00:00:01 FAHControl
c7x99@c7x99-C7X99-OCE-F:~$ sudo service FAHClient start
c7x99@c7x99-C7X99-OCE-F:~$ ps -A | grep FAH
8591 ?        00:00:02 FAHControl
8691 ?        00:00:00 FAHClient
8693 ?        00:00:00 FAHClient
c7x99@c7x99-C7X99-OCE-F:~$

After issuing the 'kill -9' command to the correct PID and then restarting FAHClient, it starts to work again.

Interesting. Next time this happens, before you kill the process, check/save the log file to see if there are any clues there. With the log file and the info above, you can post over at the folding forum.
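A minimal sketch for saving the log before killing the process. `save_log` is a made-up helper, and the log path in the usage comment is an assumption based on the `/var/lib/fahclient` data directory seen in the `ps` output; adjust it if your install logs elsewhere.

```shell
#!/bin/sh
# Hypothetical helper: keep a timestamped copy of a log file.

# save_log SRC DEST_DIR
# Copies SRC into DEST_DIR with a timestamp suffix and prints the new path.
save_log() {
    src=$1
    dest_dir=$2
    [ -r "$src" ] || { echo "cannot read $src" >&2; return 1; }
    mkdir -p "$dest_dir" || return 1
    out="$dest_dir/$(basename "$src").$(date +%Y%m%d-%H%M%S)"
    cp -p "$src" "$out" && echo "$out"
}

# Example use (log path is an assumption; may need sudo to read it):
#   save_log /var/lib/fahclient/log.txt "$HOME/fah-logs"
```

Copying rather than moving leaves the original in place, so the client keeps writing to the same file after the restart.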
 

crashtech

Lifer
Jan 4, 2013
10,695
2,294
146
There are still issues with the server(s) at Temple University. The problems are intermittent so they are having difficulty tracking down the cause(s).
In my case, the log file has never mentioned any problems up- or downloading, so not sure if my problem is related to the server issue. In the meantime, is there a better method to keep tabs on these GPUs and be warned when they fall idle?
 

biodoc

Diamond Member
Dec 29, 2005
6,338
2,243
136

Not that I know of offhand. Maybe there's a client option that might help. FAHClient --help will dump out a ton of info that may be worth going through carefully. I haven't seen this issue but I shut down folding before the start of the sprint. I'll be back up in a few days. I'll ask some annoying OS related questions.

I'm running Mint 19.2 with nvidia driver version 430.50

uname -r will give you kernel version. Mine is 4.15.0-65-generic

Have you done updates post installation?

sudo apt update <-----checks the repository for package updates.
sudo apt upgrade <-----installs the package updates (reboot to be safe; kernel upgrades will need a reboot)

There's a shield icon (blue center, white outline) in the right part of the screen/task bar that shows the state of system updates at a glance (exclamation point means updates are available; check mark means up-to-date).

Check for dependencies for FAHClient and FAHCoreWrapper. I don't think this is the problem but check anyway.

ldd /usr/bin/FAHClient

ldd /usr/bin/FAHCoreWrapper

Code:
mark@x20-linux:/usr/bin$ ldd FAHClient
    linux-vdso.so.1 (0x00007fff58fbb000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fa49cd6f000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fa49cb6b000)
    libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fa49c94e000)
    libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fa49c5c5000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fa49c227000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fa49c00f000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa49bc1e000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fa49d9e8000)
mark@x20-linux:/usr/bin$ ldd FAHCoreWrapper
    linux-vdso.so.1 (0x00007ffd40be4000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f06c6694000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f06c6490000)
    libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f06c6273000)
    libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f06c5eea000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f06c5b4c000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f06c5934000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f06c5543000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f06c6be5000)
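In the output above every library resolves to a path. A missing dependency would show up as `=> not found` instead, so the check can be reduced to a filter. `missing_libs` is a made-up name for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read ldd output on stdin and print the name of
# every library that ldd reported as "not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Example use:
#   ldd /usr/bin/FAHClient | missing_libs
#   ldd /usr/bin/FAHCoreWrapper | missing_libs
```

No output means every listed library was found.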
 

crashtech

Lifer
Jan 4, 2013
10,695
2,294
146
@biodoc, checking the dependencies did not reveal any anomalies. I'm updating them now, we'll see if that does anything. Thanks!
 

StefanR5R

Elite Member
Dec 10, 2016
6,694
10,607
136
is there a better method to keep tabs on these GPUs and be warned when they fall idle?
A room thermometer? :-)

If FAHControl doesn't show that the FahCore is stuck, then you could run something like nvidia-smi dmon -d 30 instead.
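For an actual warning rather than a display to watch, one option is to poll utilization with `nvidia-smi --query-gpu` and flag low values. A rough sketch: `idle_gpus` is a made-up helper, and the 10% threshold and 5-minute interval in the usage comment are arbitrary assumptions.

```shell
#!/bin/sh
# Hypothetical helper: read "index, utilization" lines on stdin (the CSV
# printed by nvidia-smi --query-gpu=index,utilization.gpu) and print the
# index of every GPU whose utilization is below THRESHOLD percent.
# Lines without a numeric utilization are skipped.
idle_gpus() {
    awk -F', *' -v limit="$1" '$2 ~ /^[0-9]+$/ && $2 + 0 < limit + 0 { print $1 }'
}

# Example use, checking every 5 minutes:
#   while sleep 300; do
#       idle=$(nvidia-smi --query-gpu=index,utilization.gpu \
#                  --format=csv,noheader,nounits | idle_gpus 10)
#       [ -n "$idle" ] && echo "$(date): idle GPU(s): $idle"
#   done
```

A single low reading can just be a GPU between work units, so it is probably worth requiring a few low samples in a row before treating it as a stall.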