F@H: Some huge Animals in the wild

trevinom

Golden Member
Sep 19, 2003
1,061
0
0
I don't know about you guys, but I've got 2 machines crunching these HUGE Gromacs units. I thought it was an aberration, but a machine that normally crunches the standard ones in about 20-40 hours has been eating away at one that is going to take 101 hours, at a little over an hour per frame.

It normally takes it about 13 minutes per frame.

I rebooted the machine, but when it started off from its last checkpoint... it was still at 1+ hour per frame, with an estimated completion time of 101 hours.

My other machine just got one this morning that is the same size.

These things better be worth the points.

What is the URL for the site that has the point values for the WUs?

Martin
 

ProviaFan

Lifer
Mar 17, 2001
14,993
1
0
Project Summary

What are the specs on those machines? My 800MHz Duron is taking 2 hours 20 mins per frame on a p269_h2o (146 point) work unit. :(

The 1.6GHz Athlon XP is much faster, naturally. ;)
 

trevinom

Originally posted by: jliechty
Project Summary

What are the specs on those machines? My 800MHz Duron is taking 2 hours 20 mins per frame on a p269_h2o (146 point) work unit. :(

The 1.6GHz Athlon XP is much faster, naturally. ;)

One is a 1400+ maybe 1600+
the other is either a 1400+ or a 1700+
 

trevinom

Yeah, I think one of the WUs is an h2o one and the other is a ch*** one. They are HUGE!!! They should be equivalent in value to the other ones, but it will just take longer to get credit for them. So my output will probably drop for the next few days until they are processed. :(

Thanks for the URL.
 

ICXRa

Diamond Member
Jan 8, 2001
5,924
0
71
Yeah, I have been getting a few of these. One was on a PIII 450; I kept thinking it was locked up because it was taking hours to complete a frame and a week to complete the WU.
 

CRXican

Diamond Member
Jun 9, 2004
9,062
1
0
Glad I read this thread. I'm new to folding and wondered why I had been working the same unit all damn day; now I know. Also just started Folding on my main rig along with the first one I was Folding on.
 

trevinom

Originally posted by: CRXican
Glad I read this thread. I'm new to folding and wondered why I had been working the same unit all damn day; now I know. Also just started Folding on my main rig along with the first one I was Folding on.

good to hear, CRXican. We can use all the help we can get to have us some beef at the end of the month. I've got first dibs on the tenderloins.


:)

P.S. I'm glad I'm not the only one crunching these beasts. They make me nervous...but you know what they say, the bigger they are, the harder they'll fall.


happy crunching
 

GLeeM

Elite Member
Apr 2, 2004
7,199
128
106
I don't see p269_h2o on psummary page.

trevinom, what is the p*** number on the ones you are crunching?
 

trevinom

Originally posted by: GLeeM
I don't see p269_h2o on psummary page.

trevinom, what is the p*** number on the ones you are crunching?

They have changed them since this morning. Both of the ones I am currently crunching are p263_chcl3. But I don't see it listed on the current summary... weird.
 

osage

Diamond Member
Jul 16, 2000
5,686
0
76
yes some real beasts, been getting them for a week. 5 out of 6 rigs have them right now.

sample from a 2200 MHz Barton rig

Protein p266_ch3cn
Protein Core Gromacs
Credit 152.00
Deadline 30.00
Current Frame 58 of 100 (42 left )
Time Per Frame 17 mins, 26 sec
Time Left 12 hours, 12 mins


another one from a 2200 MHz XP rig

Protein p268_ch3oh
Protein Core Gromacs
Credit 206.00
Deadline 39.00
Current Frame 70 of 100 (30 left )
Time Per Frame 25 mins, 29 sec
Time Left 12 hours, 44 mins


and one more from another 2200 MHz Barton rig

Protein p262_ch3cn
Protein Core Gromacs
Credit 152.00
Deadline 30.00
Current Frame 35 of 100 (65 left )
Time Per Frame 16 mins, 28 sec
Time Left 17 hours, 50 mins
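
For anyone wondering where the "Time Left" figures come from: they look like nothing more than frames left multiplied by the current time per frame. Here is a minimal Python sketch under that assumption (the function names are made up for illustration and are not part of any F@H tool):

```python
def time_left_seconds(frames_left, mins_per_frame, secs_per_frame):
    """Estimate remaining time as frames left x time per frame."""
    return frames_left * (mins_per_frame * 60 + secs_per_frame)


def as_hours_mins(total_seconds):
    """Split a number of seconds into whole hours and whole minutes."""
    hours, remainder = divmod(total_seconds, 3600)
    return hours, remainder // 60


# p266_ch3cn above: 42 frames left at 17 mins, 26 sec per frame
print(as_hours_mins(time_left_seconds(42, 17, 26)))  # (12, 12)
```

The same arithmetic reproduces the other two samples as well, so the estimate will swing whenever the time per frame changes.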
 

trevinom

Originally posted by: osage
yes some real beasts, been getting them for a week. 5 out of 6 rigs have them right now.

sample from a 2200 MHz Barton rig

Protein p266_ch3cn
Protein Core Gromacs
Credit 152.00
Deadline 30.00
Current Frame 58 of 100 (42 left )
Time Per Frame 17 mins, 26 sec
Time Left 12 hours, 12 mins


another one from a 2200 MHz XP rig

Protein p268_ch3oh
Protein Core Gromacs
Credit 206.00
Deadline 39.00
Current Frame 70 of 100 (30 left )
Time Per Frame 25 mins, 29 sec
Time Left 12 hours, 44 mins


and one more from another 2200 MHz Barton rig

Protein p262_ch3cn
Protein Core Gromacs
Credit 152.00
Deadline 30.00
Current Frame 35 of 100 (65 left )
Time Per Frame 16 mins, 28 sec
Time Left 17 hours, 50 mins


What are you feeding these rigs? Cheerios, high octane gas? Your times are what I normally get for the 40-60 point WUs.

How fast can your rigs process the smaller-credit WUs?
 

trevinom

Ok. This is uncool. After crunching away for 1.5 days on the behemoth, my cruncher froze up. When it rebooted, it was unable to restart from the last checkpoint... so I lost all the work it had done in that time, because it had to start from scratch. Dang it.
 

ProviaFan

Well that bites. Indeed, I am very sorry. :(

Not to be left out, I have had my share of those kinds of problems. But most of those occurred back in the day when I was running F@H on a K6-2 400MHz, and a 20 point WU took as long as today's 200 point ones.
 

GLeeM

Originally posted by: trevinom
Ok. This is uncool. After crunching away for 1.5 days on the behemoth, my cruncher froze up. When it rebooted, it was unable to restart from the last checkpoint... so I lost all the work it had done in that time, because it had to start from scratch. Dang it.

Ouch!

Do you use the -forceasm flag?

This causes the core to NOT check prior termination. I've read that it can help when restarting from a crash. If it crashed while writing to disk, probably no hope then.

:)
 

ProviaFan

Originally posted by: GLeeM
Do you use the -forceasm flag?

This causes the core to NOT check prior termination. I've read that it can help when restarting from a crash. If it crashed while writing to disk, probably no hope then.

The way I understand -forceasm is that it causes the core to always use optimizations, even if there was a crash before. The default behavior is to disable optimizations if there is a crash. I use -forcesse (which implies -forceasm) simply because in my experience SSE is faster than 3DNow on my Athlon XP, or at least it was back when I bothered to test it many months ago.
 

GLeeM

Originally posted by: jliechty
Originally posted by: GLeeM
Do you use the -forceasm flag?

This causes the core to NOT check prior termination. I've read that it can help when restarting from a crash. If it crashed while writing to disk, probably no hope then.
The way I understand -forceasm, is that it causes the core to use optimizations always, even if there was a crash before. The default behavior is to disable optimizations if there is a crash. I use -forcesse (which implies -forceasm) simply because in my experience SSE is faster than 3DNow on my Athlon XP, or at least it was back when I bothered to test it many months ago.

Yes, so we agree!

"it causes the core to use optimizations always, even if there was a crash before" by NOT checking prior termination!

Also, -forcesse will not be supported in v5. I guess -forceasm will do what -forcesse does now.
 

ProviaFan

Originally posted by: GLeeM
Yes, so we agree!

"it causes the core to use optimizations always, even if there was a crash before" by NOT checking prior termination!

Also, -forcesse will not be supported in v5. I guess -forceasm will do what -forcesse does now.

Ok, ok, sorry... I must have had a "momentary lapse of reason" and thought you said something else. :eek:

Yes, we do agree. :)
 

3chordcharlie

Diamond Member
Mar 30, 2004
9,859
1
81
One of my systems just killed one of those 231-point WUs at 92% due to a disk-write error when I shut the program down to reboot. There's 36 hours I'll never get back.

DAMMIT!!!!!

Can I increase the length of the log file? I think I've lost one or two even without reboots in the last week, but they always seem to be outside the scope of the log file by the time I try to check.
 

trevinom

Originally posted by: 3chordcharlie
One of my systems just killed one of those 231-point WUs at 92% due to a disk-write error when I shut the program down to reboot. There's 36 hours I'll never get back.

DAMMIT!!!!!

Can I increase the length of the log file? I think I've lost one or two even without reboots in the last week, but they always seem to be outside the scope of the log file by the time I try to check.

I feel your pain, 3chordcharlie. That is the exact situation I was in and I know how much it hurts. Hang in there.
 

Insidious

Diamond Member
Oct 25, 2001
7,649
0
0
I'm not sure I'm on board with the idea that the different units produce similar PPDs...
These two machines have identical hardware... both crunching only for the last 6 hours... (edit: that means I haven't been using them for anything else during that time...)


fah4console.exe
Protein p1118_L939_K12M_503K_170V
Protein Core Tinker
Credit 231.00
Deadline 44.00
Current Frame 327 of 400 (73 left )
Time Per Frame 4 mins, 44 sec
Time Left 5 hours, 45 mins
Client Version 4.00
Core Version Version 3.8 October 2000
User Name Insidious
Team Number Team 198
Est. PPD 175.69 (0.76 WUs)
Est. PPW 1,229.83 (5.32 WUs)
Uploaded Projects 553


fah4console.exe
Protein p263_chcl3
Protein Core Gromacs
Credit 221.00
Deadline 42.00
Current Frame 72 of 100 (28 left )
Time Per Frame 37 mins, 55 sec
Time Left 17 hours, 41 mins
Client Version 4.00
Core Version Version 1.65 (May 6, 2004) SSE Enabled
User Name Insidious
Team Number Team 198
Est. PPD 83.93 (0.38 WUs)
Est. PPW 587.52 (2.66 WUs)
Uploaded Projects 418
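
As a sanity check on those estimates, the PPD figures can be reproduced from the credit, the frame count, and the time per frame. This is only a sketch in Python, assuming the monitor extrapolates the current time per frame over the whole unit (`estimated_ppd` is a hypothetical name, not something from the client):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimated_ppd(credit, total_frames, mins_per_frame, secs_per_frame):
    """Points per day if every frame takes as long as the current one."""
    seconds_per_wu = total_frames * (mins_per_frame * 60 + secs_per_frame)
    wus_per_day = SECONDS_PER_DAY / seconds_per_wu
    return credit * wus_per_day


tinker = estimated_ppd(231.00, 400, 4, 44)    # p1118 figures from above
gromacs = estimated_ppd(221.00, 100, 37, 55)  # p263_chcl3 figures from above
print(round(tinker, 2), round(gromacs, 2))    # 175.69 83.93
```

Multiplying either result by 7 gives the PPW line, so on these two boxes the Tinker unit really is paying a bit more than twice what the Gromacs one does.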
 

GLeeM

Edit: Oops, Insidious got in before me. This should have been before his.

Before rebooting, shut the console client down with Ctrl+C; this will save a checkpoint and close the client nicely.

If the FAHlog.txt file is over 50KB when the client restarts, it is saved as FAHlog-Prev.txt and a new FAHlog.txt file is created.

If a WU ends early, whatever has been completed is sent in and you get credit for what is done!

:)
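
I don't know of a client setting that makes the log longer, but one workaround is to copy FAHlog.txt somewhere safe before each restart so older entries are not lost when it rolls over. A rough Python sketch follows; `archive_log` and the archive folder are my own invention for illustration, not a client feature:

```python
import shutil
import time
from pathlib import Path


def archive_log(log_path, archive_dir):
    """Copy the log to a timestamped file.

    Returns the path of the copy, or None if the log does not exist.
    """
    log_path = Path(log_path)
    if not log_path.is_file():
        return None
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = archive_dir / f"{log_path.stem}-{stamp}{log_path.suffix}"
    shutil.copy2(log_path, target)  # copy2 also keeps the file's timestamps
    return target
```

Calling something like `archive_log("FAHlog.txt", "log-archive")` from the client's folder before restarting would leave a dated copy behind. Two copies made within the same second would overwrite each other, which should not matter for this purpose.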
 

GLeeM

Originally posted by: Insidious
I'm not sure I'm on board with the idea that the different units produce similar PPDs...
These two machines have identical hardware... both crunching only for the last 6 hours... (edit: that means I haven't been using them for anything else during that time...)


fah4console.exe
Protein p1118_L939_K12M_503K_170V
Protein Core Tinker
Credit 231.00
Deadline 44.00
Current Frame 327 of 400 (73 left )
Time Per Frame 4 mins, 44 sec
Time Left 5 hours, 45 mins
Client Version 4.00
Core Version Version 3.8 October 2000
User Name Insidious
Team Number Team 198
Est. PPD 175.69 (0.76 WUs)
Est. PPW 1,229.83 (5.32 WUs)
Uploaded Projects 553


fah4console.exe
Protein p263_chcl3
Protein Core Gromacs
Credit 221.00
Deadline 42.00
Current Frame 72 of 100 (28 left )
Time Per Frame 37 mins, 55 sec
Time Left 17 hours, 41 mins
Client Version 4.00
Core Version Version 1.65 (May 6, 2004) SSE Enabled
User Name Insidious
Team Number Team 198
Est. PPD 83.93 (0.38 WUs)
Est. PPW 587.52 (2.66 WUs)
Uploaded Projects 418

This seems like a rather large difference!

Is the Gromacs unit using optimizations? Does it say "Extra SSE boost OK." in the log file?

Is anyone else getting poor PPD on p263?

I see that server is for FaH clients. If you add the -advmethods flag, you may have a better chance of not being assigned to that server.
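
If digging through the log by hand is a pain, a few lines of Python can pull out the relevant entries. This is just a sketch: `find_lines` is a made-up helper, and the sample text uses lines of the kind the Gromacs core writes at startup.

```python
def find_lines(log_text, needle):
    """Return the log lines that contain `needle` (case-sensitive)."""
    return [line for line in log_text.splitlines() if needle in line]


sample = """[13:15:32] - Assembly optimizations manually forced on.
[13:15:32] - Not checking prior termination.
[13:15:34] Assembly optimizations on if available."""

print(find_lines(sample, "Extra SSE boost OK."))          # [] for this short sample
print(len(find_lines(sample, "Assembly optimizations")))  # 2
```

Against a real log you would read the file first, for example `find_lines(open("FAHlog.txt").read(), "Extra SSE boost OK.")`. An empty result only means the phrase is not in that particular file, so check FAHlog-Prev.txt too if the log has rolled over.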
 

Insidious

I was kind of surprised too! Here is the beginning of the logfile for the slow machine:

--- Opening Log file [July 31 13:15:24]


# Windows Console Edition #####################################################
###############################################################################

Folding@home Client Version 4.00

http://folding.stanford.edu

###############################################################################
###############################################################################

Arguments: -forceasm -verbosity 9 -forceSSE

Warning:
By using the -forceSSE flag, you are overriding program
safeguards that monitor the stability of SSE
instructions on your system. If you did not intend
to do this, please restart the program without
-forceSSE. If work units are not completing fully,
then please discontinue use of the flag.

Warning:
By using the -forceasm flag, you are overriding
safeguards in the program. If you did not intend to
do this, please restart the program without -forceasm.
If work units are not completing fully (and particularly
if your machine is overclocked), then please discontinue
use of the flag.

[13:15:24] - Ask before connecting: No
[13:15:24] - User name: Insidious (Team 198)
[13:15:24] - User ID = xxxxxxxxxxxxxxx
[13:15:24] - Machine ID: 1
[13:15:24]
[13:15:24] Loaded queue successfully.
[13:15:24] + Benchmarking ...
[13:15:28] The benchmark result is 5088
[13:15:28]
[13:15:28] - Autosending finished units...
[13:15:28] Trying to send all finished work units
[13:15:28] + No unsent completed units remaining.
[13:15:28] - Autosend completed
[13:15:28] + Processing work unit
[13:15:28] Core required: FahCore_78.exe
[13:15:28] Core found.
[13:15:28] Working on Unit 03 [July 31 13:15:28]
[13:15:28] + Working ...
[13:15:28] - Calling 'FahCore_78.exe -dir work/ -suffix 03 -checkpoint 15 -forceasm -forceSSE -verbose -lifeline 496 -version 400'

[13:15:32]
[13:15:32] *------------------------------*
[13:15:32] Folding@home Gromacs Core
[13:15:32] Version 1.65 (May 6, 2004)
[13:15:32]
[13:15:32] Preparing to commence simulation
[13:15:32] - Assembly optimizations manually forced on.
[13:15:32] - Not checking prior termination.
[13:15:33] - Expanded 461853 -> 2328497 (decompressed 504.1 percent)
[13:15:34]
[13:15:34] Project: 736 (Run 9, Clone 50, Gen 8)
[13:15:34]
[13:15:34] Assembly optimizations on if available.
[13:15:34] Entering M.D.
[13:15:55] (Starting from checkpoint)
[13:15:55] Protein: p736_Protein
[13:15:55]
[13:15:55] Writing local files
[13:15:57] Completed 648610 out of 1000000 steps (64)