BOINC: Windows to Linux


biodoc

Diamond Member
Dec 29, 2005
6,326
2,241
136
Link to Kernel Dump:

I'm not sure if it's a driver issue or a hardware issue. Hopefully @StefanR5R can help diagnose.

NVRM: GPU at 0000:02:00.0 has fallen off the bus.
Then issues with the other GPU.
Then:
NVRM: A GPU crash dump has been created. If possible, please run
NVRM: nvidia-bug-report.sh as root to collect this data before
NVRM: the NVIDIA kernel module is unloaded

Code:
[  932.512994] NVRM: Xid (PCI:0000:02:00): 79, GPU has fallen off the bus.
[  932.512995] NVRM: GPU at 0000:02:00.0 has fallen off the bus.
[  932.512996] NVRM: GPU is on Board .
[  932.512998] pcieport 0000:00:03.0:   device [8086:2f08] error status/mask=00004020/00000000
[  932.513001] pcieport 0000:00:03.0:    [ 5] Surprise Down Error   
[  932.513003] pcieport 0000:00:03.0:    [14] Completion Timeout     (First)
[  932.513005] pcieport 0000:00:03.0: broadcast error_detected message
[  932.513007] pcieport 0000:00:03.0: AER: Device recovery failed
[  932.513008] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error received: id=0018
[  932.513012] pcieport 0000:00:03.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0018(Requester ID)
[  932.513015] pcieport 0000:00:03.0:   device [8086:2f08] error status/mask=00004020/00000000
[  932.513017] pcieport 0000:00:03.0:    [ 5] Surprise Down Error   
[  932.513019] pcieport 0000:00:03.0:    [14] Completion Timeout     (First)
[  932.513023] pcieport 0000:00:03.0: broadcast error_detected message
[  932.513024] pcieport 0000:00:03.0: AER: Device recovery failed
[  932.513025] pcieport 0000:00:03.0: AER: Multiple Uncorrected (Fatal) error received: id=0018
[  932.513030] pcieport 0000:00:03.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id=0018(Requester ID)
[  932.513032] pcieport 0000:00:03.0:   device [8086:2f08] error status/mask=00004020/00000000
[  932.513034] pcieport 0000:00:03.0:    [ 5] Surprise Down Error   
[  932.513036] pcieport 0000:00:03.0:    [14] Completion Timeout     (First)
[  932.513038] pcieport 0000:00:03.0: broadcast error_detected message
[  932.513040] nvidia 0000:02:00.0: device has no AER-aware driver
[  932.513041] snd_hda_intel 0000:02:00.1: device has no AER-aware driver
[  932.513206] NVRM: A GPU crash dump has been created. If possible, please run
               NVRM: nvidia-bug-report.sh as root to collect this data before
               NVRM: the NVIDIA kernel module is unloaded.
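
For anyone reading along: the `error status/mask=00004020/00000000` field in the log above is a bitmask, and the bracketed numbers on the following lines (`[ 5]`, `[14]`) are the bits that are set in it. Below is a minimal Python sketch that decodes such a line. The function names are made up for illustration, and the bit-name table only contains the two names that appear in this log; any other bit is reported as "unknown" rather than guessed.

```python
import re

# Bit names taken from the log in this post only; this is NOT a complete table.
KNOWN_BITS = {
    5: "Surprise Down Error",
    14: "Completion Timeout",
}

# Matches the "error status/mask=XXXXXXXX/XXXXXXXX" field printed by pcieport.
STATUS_RE = re.compile(r"error status/mask=([0-9a-fA-F]+)/([0-9a-fA-F]+)")


def decode_status(status_hex, names=KNOWN_BITS):
    """Return (bit, name) pairs for every bit set in a hex status word."""
    status = int(status_hex, 16)
    return [
        (bit, names.get(bit, "unknown"))
        for bit in range(32)
        if status & (1 << bit)
    ]


def decode_log_line(line):
    """Extract the status word from one dmesg line and decode it.

    Returns None when the line carries no status/mask field.
    """
    match = STATUS_RE.search(line)
    if match is None:
        return None
    return decode_status(match.group(1))


line = ("[  932.512998] pcieport 0000:00:03.0:   device [8086:2f08] "
        "error status/mask=00004020/00000000")
for bit, name in decode_log_line(line):
    print(f"[{bit:2d}] {name}")
```

0x00004020 is 0x4000 + 0x20, i.e. bits 14 and 5, which agrees with the two error lines the kernel printed. The mask of 00000000 means none of these errors were masked.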
 

StefanR5R

Elite Member
Dec 10, 2016
6,550
10,288
136
Apropos of Linux science applications, for owners of Windows hosts:

There is the "Windows Subsystem for Linux" (WSL), a facility of the Windows operating system which provides Linux syscalls, such that a Linux userspace environment can be operated on top of the Windows operating system.¹

First of all, this subsystem could be used to install the Linux version of boinc-client on a Windows computer. Has anybody ever tried this?

Second, recent versions of boinc-client can detect which Linux distributions are installed on top of WSL, and they report this to the BOINC server. A BOINC server should then be able to assign Linux work to a Windows host. However, I don't know whether BOINC project admins need to define extra platforms to make use of this. If they do, my guess is that nobody has done it yet.
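
To check by hand which distributions are installed on a Windows host, `wsl.exe --list --quiet` prints one name per line. Here is a rough Python sketch that wraps it. This is only an illustration and not necessarily how boinc-client does its detection; the function names are made up. One quirk it assumes: `wsl.exe` usually writes UTF-16LE, so the raw bytes are decoded accordingly.

```python
import subprocess


def parse_wsl_list(raw):
    """Turn the raw bytes printed by `wsl.exe --list --quiet` into a list of names."""
    # Assumption: output is UTF-16LE when NUL bytes are present, UTF-8 otherwise.
    text = raw.decode("utf-16-le") if b"\x00" in raw else raw.decode("utf-8")
    names = []
    for line in text.splitlines():
        # Drop surrounding whitespace and a possible byte-order mark.
        name = line.strip().lstrip("\ufeff")
        if name:
            names.append(name)
    return names


def installed_distributions():
    """Ask Windows for the installed WSL distributions (works on Windows only)."""
    result = subprocess.run(
        ["wsl.exe", "--list", "--quiet"],
        capture_output=True,
        check=True,
    )
    return parse_wsl_list(result.stdout)
```

On a Windows host with WSL set up, `installed_distributions()` should return something like the names shown by `wsl -l`; on any other system the call simply fails because `wsl.exe` does not exist.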

________
¹) The counterpart is Wine which provides Windows system calls and low-level runtime libraries on Unix-like systems, such that many Windows userspace applications can run on Unix-like OSs.
 

StefanR5R

Elite Member
Dec 10, 2016
6,550
10,288
136
Right, GPGPU applications can't be run this way, but all the CPU-only applications should work.

Further, in WSL 2 they changed the implementation to a full Linux kernel running on Hyper-V. This will probably exclude vboxwrapper-based science applications, but running those through WSL would be a weird idea anyway. WSL 2 apparently enables Linux GUIs, but certainly still not any GPGPU applications.
 

crashtech

Lifer
Jan 4, 2013
10,681
2,277
146
So, maybe you can use the GPU with WSL 2?:

 

StefanR5R

Elite Member
Dec 10, 2016
6,550
10,288
136
Oh, I stand corrected. But this solution does not appear to bring the Linux GPGPU performance advantage. According to an infographic on this page…
https://developer.nvidia.com/cuda/wsl
…the layers are Nvidia CUDA driver --> Linux kernel --> GPU paravirtualization --> Nvidia Windows GPU driver --> Windows kernel --> hardware. That is, the overhead which comes with Windows' video driver architecture is still there, and then comes the paravirtualization overhead on top.

(This is in contrast to GPU pass-through, which is sort of possible with Hyper-V, but apparently with a tremendous number of caveats.)
 

crashtech

Lifer
Jan 4, 2013
10,681
2,277
146
Oh, that's not helpful, then... But CPU apps might still benefit? I have some PCs that retain Windows only because of their old AMD GPUs; I can't figure out how to make those work in Linux. So theoretically those could run CPU projects on Linux when needed, using WSL.

Edit: I'm installing WSL now on my main desktop. Will advise.
 

crashtech

Lifer
Jan 4, 2013
10,681
2,277
146
I have LLR-DIV running on Windows and WSL (Ubuntu within Windows). Each BOINC instance (one Windows, one Linux) is running 6 tasks of 1 thread each. Process Lasso is being used in Windows to "disable" SMT for both the llr executable and vmmem, though I'm not sure atm if that's desirable or not. So far, it runs. It might be worthwhile to do some offline testing after the challenge.
 

StefanR5R

Elite Member
Dec 10, 2016
6,550
10,288
136
WSL would be useful to run projects which don't have Windows work, or ones whose applications have much better performing binaries on Linux (Universe, Smash Childhood Cancer).

PrimeGrid's multithreaded LLR may perhaps have a small advantage on Linux over Windows, due to a better process scheduler in Linux and generally less system bloat. If such an advantage exists, it may not be present with WSL, which still has the Windows kernel underneath and the Windows userland alongside.
 

crashtech

Lifer
Jan 4, 2013
10,681
2,277
146
I imagined that Hyper-V would be able to bypass a lot of Windows overhead. Perhaps not. I'll have to wait for some long WUs to validate before deciding whether or not to continue running LLR-DIV in WSL.
 

crashtech

Lifer
Jan 4, 2013
10,681
2,277
146
Taking an average of 10 completed long WUs from each instance shows a difference of less than 1% in favor of native Windows. It's a tiny difference but it suggests that this technique should be limited to projects which show a clear performance advantage in Linux.
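
For anyone who wants to repeat this kind of comparison, here is a minimal Python sketch of the arithmetic: average the run times from each instance and express the difference as a percentage of the native average. The function name and the sample run times are hypothetical placeholders, not the actual measurements from this test.

```python
from statistics import mean


def percent_slower(baseline_runtimes, candidate_runtimes):
    """Percent by which the candidate's mean run time exceeds the baseline's.

    A positive result means the candidate is slower; negative means faster.
    """
    base = mean(baseline_runtimes)
    cand = mean(candidate_runtimes)
    return 100.0 * (cand - base) / base


# HYPOTHETICAL run times in seconds, for illustration only.
windows_native = [41200, 41350, 41100, 41280, 41190]
wsl_ubuntu = [41500, 41620, 41390, 41480, 41550]

diff = percent_slower(windows_native, wsl_ubuntu)
print(f"WSL vs. native Windows: {diff:+.2f}%")
```

With only ten tasks per side, a sub-1% gap can easily be within run-to-run noise, so it is worth comparing WUs of the same type and length.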