F@H checkpoints

Athlex

Golden Member
Jun 17, 2000
1,258
2
81
The F@H client allows partial progress on a WU to be saved and the default interval is 15 minutes. Is there any reason to change this? Is saving this progress disk or CPU intensive?
 

BlackMountainCow

Diamond Member
May 28, 2003
5,759
0
0
Nope, not really. It just writes a couple of files to your HDD and that's it. And if you shut down F@H, it'll save the current progress anyway. :)
 

Athlex

Originally posted by: BlackMountainCow
Nope, not really. It just writes a couple of files to your HDD and that's it. And if you shut down F@H, it'll save the current progress anyway. :)


Good to know. I started using the GPU client and it seems to drop out a lot, so I'll just set it to write to the disk more often.
 

GLeeM

Elite Member
Apr 2, 2004
7,199
128
106
Originally posted by: BlackMountainCow
And if you shut down F@H, it'll save the current progress anyway.

Not true BMC.

For most cores, progress is saved after every frame (percent) and after a checkpoint.

If the client is set to checkpoint every 15 minutes and you shut down 14.9 minutes after the last save, you will lose 14.9 minutes of crunching.

The checkpoint is not very CPU or disk intensive, but I wouldn't set it to every few seconds, or even every few minutes for that matter.

The disk will get fragmented if it is set to checkpoint too often.
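
To put numbers on the 14.9-minute example, here is a minimal Python sketch. It assumes the core saves only at frame boundaries and at checkpoints, and that the checkpoint timer is counted from the last frame save. The function name `minutes_lost` is made up for illustration and is not part of the F@H client.

```python
def minutes_lost(minutes_into_frame: float, checkpoint_interval: float) -> float:
    """Crunching time lost if the client is stopped right now.

    Assumption (not confirmed for every core): the only saves are the
    frame boundary (minute 0 here) and checkpoints every
    `checkpoint_interval` minutes counted from that boundary.
    """
    if checkpoint_interval <= 0:
        raise ValueError("checkpoint_interval must be positive")
    # Everything since the most recent save is redone after a restart.
    return minutes_into_frame % checkpoint_interval


# Stopped 14.9 minutes into a frame with 15-minute checkpoints.
print(round(minutes_lost(14.9, 15), 1))
# Stopped 31 minutes in: the 30-minute checkpoint already saved most of it.
print(round(minutes_lost(31.0, 15), 1))
```

Under that assumption the worst case is always just under one checkpoint interval, which is the real trade-off when picking the setting.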
 

Athlex

Originally posted by: GLeeM
For most cores, progress is saved after every frame (percent) and after a checkpoint.

Good to know, I thought it only wrote at the checkpoint time and when a WU completes.
 

BlackMountainCow

@GLeeM: You sure? I didn't know that. It's because if I shut down F@H CLI by hand and restarted it later on, it often resumed with something like "completed 23248 of 30000 steps". Because of the odd number I thought it saves the current work when you exit F@H. But you do more F@H than I do, so I'd rather say you know better :)

@Athlex: Sorry for the wrong info then :eek:
 

Athlex

Originally posted by: BlackMountainCow
@Athlex: Sorry for the wrong info then :eek:

Heh, no worries. I've been using the client for years and I'm still figuring out how everything works. :)
 

GLeeM

Originally posted by: BlackMountainCow
@GLeeM: You sure? I didn't know that. It's because if I shut down F@H CLI by hand and restarted it later on, it often resumed with something like "completed 23248 of 30000 steps". Because of the odd number I thought it saves the current work when you exit F@H. But you do more F@H than I do, so I'd rather say you know better :)

@Athlex: Sorry for the wrong info then :eek:

LOL, I had the exact same thinking about "completed 23248 of 30000 steps"

Until I checked the time of the next frame.
 

xfalconx2

Member
Oct 2, 2006
53
0
0
Originally posted by: GLeeM
Originally posted by: BlackMountainCow
@GLeeM: You sure? I didn't know that. It's because if I shut down F@H CLI by hand and restarted it later on, it often resumed with something like "completed 23248 of 30000 steps". Because of the odd number I thought it saves the current work when you exit F@H. But you do more F@H than I do, so I'd rather say you know better :)

@Athlex: Sorry for the wrong info then :eek:

LOL, I had the exact same thinking about "completed 23248 of 30000 steps"

Until I checked the time of the next frame.

But, how would I get this then? I exited last night and opened it up this morning. Yeah, they both say 64%, but that is just the program rounding up.

[16:08:19] Completed 31921356 out of 50000000 steps (64)
[16:08:19] Extra SSE boost OK.
[16:16:03] Writing local files
[16:16:03] Completed 32000000 out of 50000000 steps (64)

 

GLeeM

Originally posted by: xfalconx2
But, how would I get this then? I exited last night and opened it up this morning. Yeah, they both say 64%, but that is just the program rounding up.

[16:08:19] Completed 31921356 out of 50000000 steps (64)
[16:08:19] Extra SSE boost OK.
[16:16:03] Writing local files
[16:16:03] Completed 32000000 out of 50000000 steps (64)

Yup, this is normal for a partially completed frame.

If you check how long the other frames took and then how long frame 64 took, you will see that #64 took longer than the others by the amount of time that it had crunched past the previous checkpoint or frame before shutting it down. (That is, if it is a Gromacs core WU; I'm not sure about others. I think Amber or one of the others may be different.) :confused:

Hey, I tried arguing with Bruce over at the Folding forum that the client saved after a shutdown (because of "Completed 31921356 out of 50000000 steps") and then had to eat crow after he asked me to check the times :eek:
 

xfalconx2

That's odd. My times all look normal. Could it possibly be something new with newer versions?

[16:08:19] Completed 31921356 out of 50000000 steps (64)
[16:08:19] Extra SSE boost OK.
[16:16:03] Writing local files
[16:16:03] Completed 32000000 out of 50000000 steps (64)
[17:04:25] Writing local files
[17:04:25] Completed 32500000 out of 50000000 steps (65)
[17:52:19] Writing local files
[17:52:19] Completed 33000000 out of 50000000 steps (66)
[18:39:38] Writing local files
[18:39:38] Completed 33500000 out of 50000000 steps (67)
[19:25:29] Writing local files
[19:25:29] Completed 34000000 out of 50000000 steps (68)
[20:12:14] Writing local files
[20:12:14] Completed 34500000 out of 50000000 steps (69)
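
For anyone who wants to check frame times without doing the subtraction by hand, here is a rough Python sketch that reads "Completed ... steps" lines like the ones above and prints the time between consecutive ones. The regular expression and the `frame_times` helper are my own assumptions based on this log excerpt, not anything shipped with the client.

```python
import re
from datetime import datetime, timedelta

LOG = """\
[16:08:19] Completed 31921356 out of 50000000 steps (64)
[16:08:19] Extra SSE boost OK.
[16:16:03] Writing local files
[16:16:03] Completed 32000000 out of 50000000 steps (64)
[17:04:25] Writing local files
[17:04:25] Completed 32500000 out of 50000000 steps (65)
[17:52:19] Writing local files
[17:52:19] Completed 33000000 out of 50000000 steps (66)
[18:39:38] Writing local files
[18:39:38] Completed 33500000 out of 50000000 steps (67)
[19:25:29] Writing local files
[19:25:29] Completed 34000000 out of 50000000 steps (68)
[20:12:14] Writing local files
[20:12:14] Completed 34500000 out of 50000000 steps (69)
"""

# Matches only the progress lines; other log lines are ignored.
LINE = re.compile(r"\[(\d{2}:\d{2}:\d{2})\] Completed (\d+) out of (\d+) steps")


def frame_times(log: str):
    """Return (steps_completed, time since the previous progress line) pairs."""
    out = []
    prev = None
    for m in LINE.finditer(log):
        t = datetime.strptime(m.group(1), "%H:%M:%S")
        if prev is not None:
            delta = t - prev
            if delta < timedelta(0):  # the log rolled past midnight
                delta += timedelta(days=1)
            out.append((int(m.group(2)), delta))
        prev = t
    return out


for steps, delta in frame_times(LOG):
    print(steps, delta)
```

The first pair is only the stretch from the restart to the end of frame 64 (7 minutes 44 seconds here), not a full frame. The remaining pairs are full frames and come out between roughly 46 and 48 minutes.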
 

GLeeM

Originally posted by: xfalconx2
That's odd. My times all look normal. Could it possibly be something new with newer versions?
[16:08:19] Completed 31921356 out of 50000000 steps (64)
[16:08:19] Extra SSE boost OK.
[16:16:03] Writing local files
[16:16:03] Completed 32000000 out of 50000000 steps (64)
[17:04:25] Writing local files
[17:04:25] Completed 32500000 out of 50000000 steps (65)
[17:52:19] Writing local files
[17:52:19] Completed 33000000 out of 50000000 steps (66)
[18:39:38] Writing local files
[18:39:38] Completed 33500000 out of 50000000 steps (67)
[19:25:29] Writing local files
[19:25:29] Completed 34000000 out of 50000000 steps (68)
[20:12:14] Writing local files
[20:12:14] Completed 34500000 out of 50000000 steps (69)

What is the project number of this WU?

And could you show a couple frames from before you shut down?

Did you change from the default 15 minute checkpoint?
 

xfalconx2

I have the checkpoint time set to 20 minutes. This is the console client that I have running as a service.

Project: 2126 (Run 93, Clone 32, Gen 5)
[05:00:32] Completed 30000000 out of 50000000 steps (60)
[05:48:06] Writing local files
[05:48:06] Completed 30500000 out of 50000000 steps (61)
[06:35:33] Writing local files
[06:35:33] Completed 31000000 out of 50000000 steps (62)
[07:21:55] Writing local files
[07:21:55] Completed 31500000 out of 50000000 steps (63)
[08:06:21]
[08:06:21] Folding@home Core Shutdown: INTERRUPTED
[08:06:24] CoreStatus = 66 (102)
[08:06:24] + Shutdown requested by user. Exiting.
Folding@Home Client Shutdown.

It looks like the shutdown caused it to take just a little longer than my normal 48ish minutes.
 

GLeeM

Originally posted by: xfalconx2
I have the checkpoint time set to 20 minutes. This is the console client that I have running as a service.

Project: 2126 (Run 93, Clone 32, Gen 5)
[05:00:32] Completed 30000000 out of 50000000 steps (60)
[05:48:06] Writing local files
[05:48:06] Completed 30500000 out of 50000000 steps (61)
[06:35:33] Writing local files
[06:35:33] Completed 31000000 out of 50000000 steps (62)
[07:21:55] Writing local files
[07:21:55] Completed 31500000 out of 50000000 steps (63)
[08:06:21]
[08:06:21] Folding@home Core Shutdown: INTERRUPTED
[08:06:24] CoreStatus = 66 (102)
[08:06:24] + Shutdown requested by user. Exiting.
Folding@Home Client Shutdown.
It looks like the shutdown caused it to take just a little longer than my normal 48ish minutes.

You shut it down four minutes after the second checkpoint (44 minutes into the frame).

It took 8 more minutes to finish after the restart: 44 + 8 = 52 minutes, against your normal 48. If it had saved on shutdown, it should only have taken four more minutes.

You lost those four minutes because it did not save at shutdown.
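
The same arithmetic can be redone from the log timestamps, with a cross-check from the step count the client reported on restart. This is only a sketch: it assumes 20-minute checkpoints counted from the frame 63 save, and the variable names are invented for illustration.

```python
from datetime import datetime

FMT = "%H:%M:%S"


def minutes_between(start: str, end: str) -> float:
    """Minutes between two same-day log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds() / 60


checkpoint_interval = 20.0  # minutes, as stated by the poster

# Frame 63 saved at 07:21:55, shutdown at 08:06:21.
into_frame = minutes_between("07:21:55", "08:06:21")
# Restart at 16:08:19, frame 64 saved at 16:16:03.
after_restart = minutes_between("16:08:19", "16:16:03")

# Assumption: checkpoints fall at 20 and 40 minutes after the frame save.
last_checkpoint = (into_frame // checkpoint_interval) * checkpoint_interval
lost = into_frame - last_checkpoint
frame_if_uninterrupted = last_checkpoint + after_restart

# Cross-check: how far into frame 64 was the step count seen on restart?
frac = (31921356 - 31500000) / (32000000 - 31500000)
resume_point = frac * frame_if_uninterrupted

print(f"into frame at shutdown: {into_frame:.1f} min")
print(f"lost since checkpoint:  {lost:.1f} min")
print(f"frame without the stop: {frame_if_uninterrupted:.1f} min")
print(f"resume point by steps:  {resume_point:.1f} min")
```

The step count on restart corresponds to roughly 40 minutes into the frame, which lines up with the second checkpoint rather than with the shutdown time.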
 

xfalconx2

Ok, so it is not doing anything out of the ordinary or contradictory to what you originally said. Sorry for wasting time.
 

GLeeM

Originally posted by: xfalconx2
Sorry for wasting time.

Not at all.

It is good to prove that the newer core is still doing what I thought. And you proved it with the posts, Thanks!

Anyone have data on cores other than Gromacs (FahCore_78.exe)? I think it was Amber that behaves differently?