After months, I just tried Folding@Home, and it's desolate.

StefanR5R

Elite Member
Dec 10, 2016
5,552
7,915
136
I decided to switch on a dual GPU computer to get one of my rooms warmer, and I thought I would use Folding@Home as the heat source.

Well, the first four workunits have finished, and their upload transfers always fail after a random percentage (anywhere between 3 % and 30 %, mostly at circa 10%).

Each workunit alternates between two collection servers:
WU00, 38.22MiB: 129.32.209.200 (vav17.fah.temple.edu) and 131.239.113.97 (fahwork01.psivant.com)
WU01, 38.24MiB: 129.32.209.200 (vav17.fah.temple.edu) and 129.32.209.206 (vav23.fah.temple.edu)
WU02, 34.30MiB: 131.239.113.97 (fahwork01.psivant.com) and 128.174.73.74 (ds01.scs.illinois.edu)
WU03, 34.27MiB: 131.239.113.97 (fahwork01.psivant.com) and 128.174.73.78 (ds03.scs.illinois.edu)

I suppose it's too hard to ship more than 10 MBytes in a row over the Atlantic.

A quick look at the foldingforum shows me several inconclusive older reports of upload failures from various points in 2023 and 2022.

I am switching to a different project.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,606
14,587
136
WOW. Totally different picture here. Here is a small excerpt from one log file (~20 seconds total upload time):

14:08:51:WU01:FS00:Upload 12.11%
14:08:51:WU00:FS00:0x22:Attempting to create CUDA context:
14:08:51:WU00:FS00:0x22: Configuring platform CUDA
14:08:52:WU00:FS00:0x22: Using CUDA and gpu 0
14:08:52:WU00:FS00:0x22:Completed 0 out of 5000000 steps (0%)
14:08:52:WU00:FS00:0x22:Checkpoint completed at step 0
14:08:57:WU01:FS00:Upload 38.96%
14:09:03:WU01:FS00:Upload 65.64%
14:09:09:WU01:FS00:Upload 88.06%
14:09:12:WU01:FS00:Upload complete
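
For anyone who wants to compare their own logs, here is a minimal sketch (not part of the F@H client) that pulls the upload progress lines for one work unit out of a log and turns them into a timeline. The regular expression is only an assumption based on the line format in the excerpt above, and the function name `upload_timeline` is made up for illustration. It also assumes the upload does not cross midnight.

```python
import re
from datetime import datetime

# Assumed line format, taken from the excerpt above, e.g.
# "14:08:57:WU01:FS00:Upload 38.96%" or "14:09:12:WU01:FS00:Upload complete"
UPLOAD_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}):(WU\d+):FS\d+:Upload (\d+(?:\.\d+)?%|complete)$"
)

def upload_timeline(lines, wu):
    """Return [(seconds since first progress line, percent)] for one work unit."""
    points = []
    start = None
    for line in lines:
        m = UPLOAD_RE.match(line.strip())
        if not m or m.group(2) != wu:
            continue  # not an upload progress line for this work unit
        t = datetime.strptime(m.group(1), "%H:%M:%S")
        if start is None:
            start = t
        status = m.group(3)
        pct = 100.0 if status == "complete" else float(status.rstrip("%"))
        points.append(((t - start).total_seconds(), pct))
    return points

# Lines copied from the log excerpt above
LOG = """\
14:08:51:WU01:FS00:Upload 12.11%
14:08:52:WU00:FS00:0x22:Completed 0 out of 5000000 steps (0%)
14:08:57:WU01:FS00:Upload 38.96%
14:09:03:WU01:FS00:Upload 65.64%
14:09:09:WU01:FS00:Upload 88.06%
14:09:12:WU01:FS00:Upload complete
""".splitlines()

timeline = upload_timeline(LOG, "WU01")
for seconds, pct in timeline:
    print(f"{seconds:5.0f} s  {pct:6.2f} %")
```

For the excerpt above, the last entry lands 21 seconds after the first progress line, which matches the "~20 seconds" estimate.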
 

BurnedOut2

Junior Member
Jul 13, 2023
5
5
41
I've been having intermittent slow uploads and retries over the last few days. Historically, the servers tend to get less babysitting over academic semester breaks.
 

StefanR5R

Markfw said:
Totally different picture here.
I suppose you need a short route and/or good end-to-end speed in the first place, so that these servers are unlikely to drop the connection like a hot potato at some point.

My four finished WUs are still uploading… one attempt made it to 68 %. I'll give them another day or two. (They expire two days after they were assigned to the client.)
 

Markfw

I used to have 400 down/10 up. I now have 300/300 (direct fiber). So, yes, this could be why I don't see any issues.
 

StefanR5R

One of my four results managed to get uploaded overnight.

Edit: Just now, another result *almost* made it.
07:25:35:WU03:FS00:Upload 94.47%
07:26:06:WARNING:WU03:FS00:Exception: Failed to send results to work server: 10002: Received short response, expected 512 bytes, got 0

Apart from this, it's still the same: Either an upload attempt cannot make a connection in the first place, or the connection is dropped at some point along the transfer.
 

StefanR5R

Two more results went through by now. Only one result left to upload (of all four which I dared to compute yesterday).

Hip hip hurray for Germany's 3rd world internet!

Corresponding threads at the foldingforum suggested the use of VPNs, but I am not inclined to go that route.
 

StefanR5R

The last result went through last night. Maybe the problem is discriminating routing. Or maybe it's the high latency of my connection (>400 ms average and >1000 ms max during uploads).
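
A rough back-of-the-envelope sketch of why latency could matter: a single TCP connection can send at most one window of data per round trip, so its throughput is bounded by window size divided by round-trip time. The 64 KiB window below is purely an assumption (the classic maximum without window scaling), not something measured on this connection or on the F@H servers, and the ~35 MiB size is only a stand-in for the work units discussed in this thread.

```python
def max_throughput(window_bytes: int, rtt_s: float) -> float:
    """Upper bound in bytes/s for one TCP connection: one window per round trip."""
    return window_bytes / rtt_s

def min_upload_time(size_bytes: int, window_bytes: int, rtt_s: float) -> float:
    """Shortest possible transfer time in seconds under that bound."""
    return size_bytes / max_throughput(window_bytes, rtt_s)

WINDOW = 64 * 1024        # assumption: 64 KiB window, no window scaling
SIZE = 35 * 1024 * 1024   # assumption: ~35 MiB result file

# Latencies quoted above: >400 ms average, >1000 ms max during uploads
for rtt_ms in (400, 1000):
    secs = min_upload_time(SIZE, WINDOW, rtt_ms / 1000)
    print(f"RTT {rtt_ms:4d} ms: at least {secs:.0f} s per upload")
```

Under these assumptions a 35 MiB upload needs at least roughly 224 s at 400 ms and 560 s at 1000 ms. That alone would not explain dropped connections, but the longer a transfer runs, the more time there is for something along the route to give up on it.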

Here are results of my home connection with https://speed.cloudflare.com/ towards a server in Berlin; I get basically the same with a server in Munich.
Screenshot_20231201_074108.png
Screenshot_20231201_074142.png
Screenshot_20231201_074232.png
Screenshot_20231201_074407.png
(Another try came up with 0% packet loss, 1000/1000 received.)

Last time I checked, other locally available providers offered slower upload for more money. Need to check again.
 

StefanR5R

At my workplace, latency during upload is ~80 ms avg/ ~100 ms max, which includes a VPN tunnel from the branch office to the main office across 1/4 of Germany. Not quite great, but an order of magnitude better than at home.

I looked at current offerings of other ISPs who reach my home either via cable or via DSL. Tariffs at my current price level are rare and come with considerably less bandwidth. Tariffs with comparable bandwidth (same or half of the download bandwidth, same or >double the upload bandwidth) start in a price bracket which is ~1.5 times my current one. I don't like that. Still, I might try to find out about the other ISPs' quality of service, such as reliability and latencies.

Next year or so, fibre will supposedly be laid in a main street near my home; I don't know if our house on a parallel minor street will be connected too.
 

q52

Member
Jan 18, 2023
68
36
51
I had a similar experience. I built a dual GPU system earlier this year with the intention of using it for BOINC, but by about August the GPU jobs started drying up and only came back infrequently. Not sure what the exact situation is with F@H but it's not reliable enough to keep my workstation running 24/7 just for this.
 

q52

BurnedOut2 said:
I've been having intermittent slow uploads and retries over the last few days. Historically, the servers tend to get less babysitting over academic semester breaks.
Yeah, this is important to keep in mind. These usually aren't commercial servers and services; sometimes they're just a tower in the corner of a professor's office. There have been months-long outages in the past for various projects due to things as simple as a single HDD failure.
 

StefanR5R

StefanR5R said:
[34…38 MB sized] upload transfers always fail after a random percentage (anywhere between 3 % and 30 %, mostly at circa 10%).
StefanR5R said:
maybe it's the high latency of my connection (>400 ms average and >1000 ms max during uploads).
The ISP seems to have fixed something in the meantime. Latency during uploads is now at 80 ms avg, 100 ms max. Transfers to PrimeGrid with 30 MB payload go through without issue. I haven't retried Folding@Home yet.
 

StefanR5R

It's no use.

I made two results today but can't get them sent. The client repeatedly reports the transfer as failed, after a random upload percentage.

Yet my connection quality is still as reported in #15 when measured with https://speed.cloudflare.com/.

In addition, I ran traceroutes to two work servers and one collection server of F@H while no transfer was going on. The traceroutes showed ~16…20 ms when they got to Frankfurt, and jumped to ~105…110 ms on the very next hop when they got to either Boston or Philadelphia. From there, latency practically didn't get any higher up to the last responding hosts in the routes (hosts at lightower.net in the route to one of the work servers: 131.239.113.97, and at upenn.edu in the routes to the other work server: 158.130.118.24 and the collection server: 158.130.118.26).
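
To make the "where does the latency come from" reading of a traceroute explicit, here is a small sketch that finds the largest hop-to-hop increase in round-trip time. The hop list is illustrative only: the Frankfurt and Boston values sit inside the ranges quoted above, while the other hops, their labels, and the helper name `largest_jump` are made up for the example.

```python
def largest_jump(hops):
    """hops: list of (label, rtt_ms) in route order.

    Returns (from_label, to_label, delta_ms) for the largest RTT increase
    between two consecutive hops, or None if there are fewer than two hops.
    """
    best = None
    for (a, rtt_a), (b, rtt_b) in zip(hops, hops[1:]):
        delta = rtt_b - rtt_a
        if best is None or delta > best[2]:
            best = (a, b, delta)
    return best

# Illustrative values only; Frankfurt/Boston are within the ranges reported above
hops = [
    ("home router", 2.0),            # hypothetical
    ("ISP gateway", 12.0),           # hypothetical
    ("Frankfurt", 18.0),
    ("Boston", 107.0),
    ("last responding host", 109.0), # hypothetical
]

src, dst, delta = largest_jump(hops)
print(f"largest jump: {src} -> {dst}: +{delta:.0f} ms")
```

With numbers like these, nearly all of the added latency is the transatlantic hop itself, which is what one would expect and not a sign of a problem on either side of it.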

Going by the cloudflare measurements, I don't see that my situation would improve if I booked a VDSL link instead of my current DOCSIS link.

So, uploading 30 MB to PrimeGrid is fine. Repeatedly. And sometimes several of those in parallel. But getting 24 and 35 MB uploaded to F@H, even just one at a time, is near impossible.
 

Markfw

I really wish I knew something about the technical side of networks so I could help you, but my hardware knowledge is how to assemble PCs, and that's about it.
 

Markfw

PS, I've got fahclient_7.6.21_amd64.deb, i.e. the latest according to https://foldingathome.org/start-folding/, on Linux Mint 21.1.
This is from ps -ef on Linux Mint 20.3. Not sure how to relate that to your client version.

root 214762 1369 0 13:35 ? 00:00:00 /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/openmm-core-23/centos-7.9.2009-64bit/release/0x23-8.0.3/Core_23.
root 214766 214762 99 13:35 ? 00:29:40 /var/lib/fahclient/cores/cores.foldingathome.org/openmm-core-23/centos-7.9.2009-64bit/release/0x23-8.0.3/Core_23.fah/FahCore_23 -dir 01 -
 

StefanR5R

It's FAHClient which is doing the data transfers. FAHCore only performs the computation. FAHControl's "System Info" tab shows the client version, for instance.
 

biodoc

Diamond Member
Dec 29, 2005
6,263
2,238
136
There is a beta FAHClient. This post has links to download and the changelog. Just a word of warning: FAHControl does not work with the beta client.
 

Markfw

biodoc said:
There is a beta FAHClient. This post has links to download and the changelog. Just a word of warning: FAHControl does not work with the beta client.
Most likely it does not support Python 3, though a Python 3 version exists. I have it, but have no idea how to send it to @StefanR5R.

python3-fahcontrol_7.7.0-1_all.deb is the name.
 

mmonnin03

Senior member
Nov 7, 2006
217
220
116
Have you tried a VPN? GPUGrid for example used to suck for us across the pond. I'd get timeouts and slow speeds but switching to an EU server would seem fine.
 

StefanR5R

I have never used a VPN yet, apart from the corporate VPN at work, which is transparent to me. I might check it out eventually. (IIRC this has been suggested at the foldingforum for such cases too, but I don't recall whether or not there were success reports.) Though my motivation is sub-zero for the time being. As you know, I sometimes enjoy figuring out stuff. But not now, and not this.

If their IT discriminates between origins of inbound traffic, they probably have their reasons. High utilization of their upload servers could be one. If they have high utilization, then that's good for them actually. But for myself as potential contributor, that'd be a signal to turn to projects at which the sole bottleneck is client-side computing throughput.