The 3rd Folding@Home Holiday Season Race in December 2008


MaskedAvenger

Member
Jul 31, 2001
Originally posted by: VirtualLarry
Originally posted by: MaskedAvenger
Also, the 2nd card won't work unless it is hooked up to a monitor.
Anyone know how to fool Windows into thinking a monitor is there when it isn't? I'm thinking of using a spare KVM cord. Besides that, anyone have any ideas?
Look at my quad-GPU rig thread, and google for "VGA dummy".

VirtualLarry, thanks for the info. I went to Fry's and bought a pack of 68-ohm 1/2-watt resistors (the closest they had to 75 ohms without going over). I stuck the resistors into a DVI-to-VGA converter in the specified order without soldering them (it was a tight fit, so I figure the connection should be fine). I then plugged the modified converter into my second card and, eureka, my monitor came to life! :D The 2nd GPU client seemed to have stopped working (it was at 100% but the work unit had not been sent). After rebooting my PC, both clients are working again. A big :thumbsup: to you, VirtualLarry!
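A VGA dummy stands in for the roughly 75-ohm termination a monitor presents on the VGA colour lines, which is why 75 ohms is the target here. The "closest without going over" rule can be sketched as picking the largest stocked value that does not exceed the target. This assumes the store stocked standard E12-series values; the post does not say exactly what Fry's had on the shelf, and the function name is hypothetical.

```python
# Standard E12 resistor values for the 10-82 ohm decade.
# Assumption: the store stocked some subset of this series.
E12_DECADE = [10, 12, 15, 18, 22, 27, 33, 39, 47, 56, 68, 82]


def closest_not_over(target_ohms, stocked):
    """Return the largest stocked value that does not exceed target_ohms,
    or None if every stocked value is above the target."""
    candidates = [r for r in stocked if r <= target_ohms]
    return max(candidates) if candidates else None


# 75 ohms is the target because a monitor terminates each colour line in ~75 ohms.
print(closest_not_over(75, E12_DECADE))  # 68
```

In this series 82 ohms is exactly as far from 75 as 68 is, so "without going over" is what breaks the tie.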
 

biodoc

Diamond Member
Dec 29, 2005
The folders have expanded their lead by 112459 points.

Nice job Folders!:beer:

Good job TAS & the Folders for tweaking their hardware for every available point!:beer::thumbsup:
 

rabrittain

Senior member
Dec 28, 2006
This is a two-part post.

The first part is being written from the machine that is not working well. I will show you something from the FAHLOG. After I do that, I will go to the machine that is not having any problems, edit this post, and copy in the same part of the FAHLOG so that we can compare the 2. Concerning the machine that is giving trouble: earlier today I uninstalled and deleted everything that had anything to do with FAH; then I went through the registry and deleted every FAH item I could find.

Here's part of the FAHLOG from the machine that is giving trouble.

[22:26:45] Initial: D2E9; + 789667 bytes downloaded
[22:26:45] Verifying core Core_a1.fah...
[22:26:45] Signature is VALID
[22:26:45]
[22:26:45] Trying to unzip core FahCore_a1.exe
[22:26:45] Decompressed FahCore_a1.exe (2035712 bytes) successfully
[22:26:50] + Core successfully engaged
[22:26:55]
[22:26:55] + Processing work unit
[22:26:55] Work type a1 not eligible for variable processors
[22:26:55] Core required: FahCore_a1.exe
[22:26:55] Core found.
[22:26:55] Using generic mpiexec calls
[22:26:55] Working on queue slot 01 [December 27 22:26:55 UTC]
[22:26:55] + Working ...
[22:26:55] - Calling 'mpiexec -np 4 -channel auto -host 127.0.0.1 FahCore_a1.exe -dir work/ -suffix 01 -priority 96 -cpu 98 -checkpoint 15 -verbose -lifeline 2360 -version 622'

[22:26:56]
[22:26:56] *------------------------------*
[22:26:56] Folding@Home Gromacs SMP Core
[22:26:56] Version 1.74 (March 10, 2007)
[22:26:56]
[22:26:56] Preparing to commence simulation
[22:26:56] - Ensuring status. Please wait.
[22:26:56] Created dyn
[22:26:56] - Files status OK
[22:27:13] - Working with standard loops on this execution.
[22:27:13] - Previous termination of core was improper.
[22:27:13] - Files status OK
[22:27:13] ndard loops.
[22:27:13] - Files status OK
[22:27:24] - Expanded 4820657 -> 24810145 (decompressed 514.6 percent)
[22:27:24] - Starting from initial work packet
[22:27:24]
[22:27:24] Project: 2665 (Run 2, Clone 405, Gen 80)
[22:27:24]
[22:27:36] Entering M.D.
[22:27:43] Rejecting checkpoint
[22:27:50] cosylations
[22:27:50] Writing local files
[22:27:51]
[22:27:51] Writing local files
[22:28:18] Extra SSE boost OK.
[22:28:20] Writing local files
[22:28:20] Completed 0 out of 250000 steps (0 percent)
[22:43:22] Timered checkpoint triggered.
[22:58:27] Timered checkpoint triggered.
[23:13:28] Timered checkpoint triggered.
[23:28:30] Timered checkpoint triggered.
[23:43:33] Timered checkpoint triggered.
[23:58:34] Timered checkpoint triggered.
[00:13:35] Timered checkpoint triggered.
[00:17:52] Writing local files
[00:17:53] Completed 2500 out of 250000 steps (1 percent)
[00:32:54] Timered checkpoint triggered.
[00:47:57] Timered checkpoint triggered.
[01:02:58] Timered checkpoint triggered.
[01:18:01] Timered checkpoint triggered.
[01:33:03] Timered checkpoint triggered.

See -- very slow.

:(

Edit: 12/27/2008 ~2000 PST, from the machine that is not having a problem.

Here is part of the FAHLOG.

[20:59:10] Initial: D2E9; + 789667 bytes downloaded
[20:59:10] Verifying core Core_a1.fah...
[20:59:10] Signature is VALID
[20:59:10]
[20:59:10] Trying to unzip core FahCore_a1.exe
[20:59:10] Decompressed FahCore_a1.exe (2035712 bytes) successfully
[20:59:15] + Core successfully engaged
[20:59:20]
[20:59:20] + Processing work unit
[20:59:20] Work type a1 not eligible for variable processors
[20:59:20] Core required: FahCore_a1.exe
[20:59:20] Core found.
[20:59:20] Using generic mpiexec calls
[20:59:20] Working on queue slot 01 [December 26 20:59:20 UTC]
[20:59:20] + Working ...
[20:59:20] - Calling 'mpiexec -np 4 -channel auto -host 127.0.0.1 FahCore_a1.exe -dir work/ -suffix 01 -priority 96 -cpu 98 -checkpoint 15 -verbose -lifeline 1896 -version 622'

[20:59:20]
[20:59:20] *------------------------------*
[20:59:20] Folding@Home Gromacs SMP Core
[20:59:20] Version 1.74 (March 10, 2007)
[20:59:20]
[20:59:20] Preparing to commence simulation
[20:59:20] - Ensuring status. Please wait.
[20:59:23] - Starting from initial work packet
[20:59:23]
[20:59:23] Project: 2653 (Run 30, Clone 60, Gen 98)
[20:59:23]
[20:59:23] Assembly optimizations on if available.
[20:59:23] Entering M.D.
[20:59:41] al work packet
[20:59:41]
[20:59:41] Project: 2653 (Run 30, Clone 60, Gen 98)
[20:59:41]
[20:59:43] 53 (Run 30, Clone 60, Gen 98)
[20:59:43]
[20:59:44] Entering M.D.
[20:59:50] Rejecting checkpoint
[20:59:51] Protein: Protein in POPC
[20:59:51] Writing local files
[20:59:53] Extra SSE boost OK.
[20:59:53] Writing local files
[20:59:53] Completed 0 out of 500000 steps (0 percent)
[21:14:53] Timered checkpoint triggered.
[21:24:00] Writing local files
[21:24:01] Completed 5000 out of 500000 steps (1 percent)
[21:39:00] Timered checkpoint triggered.
[21:47:55] Writing local files
[21:47:55] Completed 10000 out of 500000 steps (2 percent)
[22:02:54] Timered checkpoint triggered.
[22:11:45] Writing local files
[22:11:46] Completed 15000 out of 500000 steps (3 percent)
[22:26:45] Timered checkpoint triggered.
[22:35:34] Writing local files
[22:35:34] Completed 20000 out of 500000 steps (4 percent)
[22:50:34] Timered checkpoint triggered.
[22:59:21] Writing local files
[22:59:21] Completed 25000 out of 500000 steps (5 percent)
[23:14:21] Timered checkpoint triggered.
[23:23:18] Writing local files
[23:23:18] Completed 30000 out of 500000 steps (6 percent)

Looking at these, it is difficult to see much difference between the 2. The project is different. Oddly enough, on the problem machine, the project is the same as it was before I cleaned everything out and reinstalled and reconfigured. The good machine runs a lot faster than the other machine.
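One way to put a number on "a lot faster" is to measure the time between consecutive "Completed ... steps" lines in each log. The helper below is a hypothetical sketch, not part of the Folding@Home client; it only assumes the progress-line format shown in the excerpts above, and it treats a timestamp that goes backwards as the log clock wrapping past midnight.

```python
import re

# Matches FAHlog progress lines such as
# "[21:24:01] Completed 5000 out of 500000 steps (1 percent)"
PROGRESS = re.compile(r"\[(\d{2}):(\d{2}):(\d{2})\] Completed (\d+) out of (\d+) steps")


def seconds_per_frame(log_text):
    """Seconds elapsed between consecutive 'Completed ... steps' lines.

    Assumes the excerpt is chronological and that no two progress lines
    are more than 24 hours apart.
    """
    stamps = []
    for m in PROGRESS.finditer(log_text):
        h, mi, s = int(m.group(1)), int(m.group(2)), int(m.group(3))
        stamps.append(h * 3600 + mi * 60 + s)
    gaps = []
    for prev, cur in zip(stamps, stamps[1:]):
        delta = cur - prev
        if delta < 0:  # clock wrapped past midnight, e.g. 22:28:20 -> 00:17:53
            delta += 24 * 3600
        gaps.append(delta)
    return gaps


slow = """[22:28:20] Completed 0 out of 250000 steps (0 percent)
[00:17:53] Completed 2500 out of 250000 steps (1 percent)"""
fast = """[20:59:53] Completed 0 out of 500000 steps (0 percent)
[21:24:01] Completed 5000 out of 500000 steps (1 percent)
[21:47:55] Completed 10000 out of 500000 steps (2 percent)"""

print(seconds_per_frame(slow))  # [6573]
print(seconds_per_frame(fast))  # [1448, 1434]
```

That is about 1 hour 50 minutes per 1% frame on the problem machine versus about 24 minutes on the good one. Since the machines were on different projects (2665 with 2500-step frames, 2653 with 5000-step frames), treat this as a rough comparison rather than a like-for-like benchmark.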

I'm going to close this edit so that I can look at the whole post.

2nd edit: 12/27/2008 ~2025 PST

The 2 logs do differ. If we examine the parts after the core starts in each log (starting with "Folding@Home Gromacs SMP Core"), on the good machine it says "Assembly optimizations on if available." and then "Entering M.D.". On the bad machine it never says anything about assembly optimizations, and it says "Previous termination of core was improper." How could that be? I cleaned everything out. There was no core there, unless there are things in the registry that I don't know how to find.

Wait!!! News flash!! Rudy Toody just called me to say that someone has responded to this post with a good suggestion about a -forceasm switch or something.

edit terminated.
 

Insidious

Diamond Member
Oct 25, 2001
The first fahlog.txt file mentions using standard loops because the prior shutdown was improper.
[22:26:56] - Files status OK
[22:27:13] - Working with standard loops on this execution.
[22:27:13] - Previous termination of core was improper.

[22:27:13] - Files status OK

To force the client to use the enhanced instruction set (much faster), add the flag -forceasm to the configuration. I believe that will solve the problem.

You see in the 'working machine' log:

[20:59:23] Project: 2653 (Run 30, Clone 60, Gen 98)
[20:59:23]
[20:59:23] Assembly optimizations on if available.
[20:59:23] Entering M.D.

Next is to figure out how it was shut down improperly. I suggest only shutting it down by doing a normal Windows restart with it running.

I suspect you used task manager to kill it and that pissed it off.

-Sid

edit: The -forceasm flag will let it ignore how it was shut down previously and always use the assembly optimizations.
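The diagnosis above comes down to which marker lines the SMP core prints when it starts. A small, hypothetical checker (not part of the client) could classify a core start from those lines. The marker strings are copied from the logs quoted in this thread, and other core versions may word them differently; feed it the log text for a single core start, since a full fahlog.txt can contain several.

```python
def core_mode(log_text):
    """Classify how a Gromacs SMP core started, based on FAHlog marker lines.

    The forced check comes first because a -forceasm run also prints
    "Assembly optimizations on if available." later in its startup.
    """
    if "Assembly optimizations manually forced on" in log_text:
        return "forced asm (-forceasm)"
    if "Working with standard loops" in log_text:
        if "Previous termination of core was improper" in log_text:
            return "standard loops (improper previous shutdown)"
        return "standard loops"
    if "Assembly optimizations on if available" in log_text:
        return "asm if available"
    return "unknown"


bad_start = """[22:27:13] - Working with standard loops on this execution.
[22:27:13] - Previous termination of core was improper."""
print(core_mode(bad_start))  # standard loops (improper previous shutdown)
```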
 

rabrittain

Senior member
Dec 28, 2006
Thanks Sid.

I think that I owe you a beer -- or a whole case -- or maybe you don't drink.

Anyway, I restarted the computer, ran -configonly, put the -forceasm switch in there, and restarted SMP.

It feels like it worked, and I owe you. I'll know for sure in a few minutes.

Many thanks.

:cool:
 

Insidious

Diamond Member
Oct 25, 2001
good news!

I hope it turns out to fix it. (check the fahlog.txt file)

/me grabs a :beer: early

;)

-Sid
 

rabrittain

Senior member
Dec 28, 2006
# Windows SMP Console Edition #################################################
###############################################################################

Folding@Home Client Version 6.22 SMP Beta2

http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Distributed Computing\FoldingAtHome\FAH6.22SMP-mpich
Executable: C:\Distributed Computing\FoldingAtHome\FAH6.22SMP-mpich\SMP6_22.exe
Arguments: -smp -verbosity 9 -forceasm

Warning:
By using the -forceasm flag, you are overriding
safeguards in the program. If you did not intend to
do this, please restart the program without -forceasm.
If work units are not completing fully (and particularly
if your machine is overclocked), then please discontinue
use of the flag.

[04:51:04] - Ask before connecting: No
[04:51:04] - User name: rabrittain (Team 198)
[04:51:04] - User ID: F373BA13960AEAA
[04:51:04] - Machine ID: 1
[04:51:04]
[04:51:04] Loaded queue successfully.
[04:51:04]
[04:51:04] + Processing work unit
[04:51:04] - Autosending finished units... [December 28 04:51:04 UTC][04:51:04] Work type a1 not eligible for variable processors

[04:51:04] Trying to send all finished work units
[04:51:04] Core required: FahCore_a1.exe
[04:51:04] + No unsent completed units remaining.
[04:51:04] - Autosend completed
[04:51:04] Core found.
[04:51:04] Using generic mpiexec calls
[04:51:04] Working on queue slot 01 [December 28 04:51:04 UTC]
[04:51:04] + Working ...
[04:51:04] - Calling 'mpiexec -np 4 -channel auto -host 127.0.0.1 FahCore_a1.exe -dir work/ -suffix 01 -priority 96 -cpu 98 -checkpoint 15 -forceasm -verbose -lifeline 2236 -version 622'

[04:51:05]
[04:51:05] *------------------------------*
[04:51:05] Folding@Home Gromacs SMP Core
[04:51:05] Version 1.74 (March 10, 2007)
[04:51:05]
[04:51:05] Preparing to commence simulation
[04:51:05] - Ensuring status. Please wait.
[04:51:22] - Assembly optimizations manually forced on.
[04:51:22] - Not checking prior termination.
[04:51:44] - Expanded 4820657 -> 24810145 (decompressed 514.6 percent)
[04:51:47]
[04:51:47] Project: 2665 (Run 2, Clone 405, Gen 80)
[04:51:47]
[04:51:55] Assembly optimizations on if available.
[04:51:55] Entering M.D.
[04:52:02] Calling FAH init
[04:52:09] Read topology
[04:52:09] s
[04:52:09] tarting from checkpoint)
[04:52:09] Read checkpoint
[04:52:10] 250000 steps (3 percent)
[04:52:10] tions
[04:52:10] Writing local files
[04:52:10] Completed 8186 out of 250000 steps (3 percent)
[04:52:37] Extra SSE boost OK.
[05:07:13] Timered checkpoint triggered.
[05:22:14] Timered checkpoint triggered.
[05:37:16] Timered checkpoint triggered.
[05:52:18] Timered checkpoint triggered.
[06:07:25] Timered checkpoint triggered.
[06:12:16] Writing local files
[06:12:17] Completed 10000 out of 250000 steps (4 percent)

I'm not sure this problem is fixed. It just took 72 minutes to go from 3 to 4%.
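To see why this still looks broken: between the 04:52:10 and 06:12:17 lines the log shows only 1,814 steps (8186 to 10000). A straight-line extrapolation, assuming the rate stays constant and both readings fall on the same day, puts the rest of this 250,000-step unit at about a week. The helper names are hypothetical.

```python
def hhmmss_to_seconds(stamp):
    """Convert an 'HH:MM:SS' log timestamp to seconds since midnight."""
    h, m, s = (int(x) for x in stamp.split(":"))
    return h * 3600 + m * 60 + s


def eta_hours(steps_a, stamp_a, steps_b, stamp_b, total_steps):
    """Linear time-to-finish estimate from two progress readings.

    Assumes both timestamps fall on the same day and the rate is constant.
    """
    elapsed = hhmmss_to_seconds(stamp_b) - hhmmss_to_seconds(stamp_a)
    rate = (steps_b - steps_a) / elapsed          # steps per second
    return (total_steps - steps_b) / rate / 3600  # hours remaining


# From the log above: 8186 steps at 04:52:10, 10000 steps at 06:12:17.
print(round(eta_hours(8186, "04:52:10", 10000, "06:12:17", 250000), 1))  # 176.7
```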

Maybe it's time for me to consider Deino.
 

petrusbroder

Elite Member
Nov 28, 2004
[04:52:09] Read topology
[04:52:09] s
[04:52:09] tarting from checkpoint)
[04:52:09] Read checkpoint
[04:52:10] 250000 steps (3 percent)
[04:52:10] tions
[04:52:10] Writing local files
[04:52:10] Completed 8186 out of 250000 steps (3 percent)
[04:52:37] Extra SSE boost OK.

If you look at the log file, something is missing at the timestamps [04:52:09] - [04:52:10] (look at the fragmentary lines).
I have checked my log files for the past few days and have not found any such deviations. Maybe a hardware problem with the HDD or something else?

Just my 2 cents - and please remember: I am guessing wildly ...!
 

biodoc

Diamond Member
Dec 29, 2005
Me thinks Sid added some hardware recently!;)

MarkFW, did you ever get those new cards?

Did anyone else add new hardware during the race???

I did notice MaskedAvenger's production nearly doubled recently!:shocked:

Also, the TeAm's production went from about 480,000 ppd in late November to about 650,000 ppd in the last few days!!!:shocked::shocked:

Awesome crunching everyone!!:beer::thumbsup:
 

Drsignguy

Platinum Member
Mar 24, 2002
rabrittain:)
Let's go back to some basics. First, of the 3 rigs you have, 2 are AMD Athlon 64 FX-60s and 1 is an Intel Pentium D 820, all of which are 64-bit CPUs. Which rig (the name in your sig) are you having this problem on?

This is where I am going to start...:thumbsup:
Originally posted by: rabrittain

Masked Avenger -- That's a good suggestion, and I'm giving it much consideration. The only reason that I'm using mpich is that I'm running XP x64 on the other machine, and I must use mpich for it. I used mpich on the 32 bit machine because I did not want to have to learn about another client. I know - that may not be a good enough reason.


Seems to me that the client works. It is starting to look like a hardware issue.
Is the rig that is having the problems overclocked? Once we narrow down which rig it is and what you have done with it, we should be able to solve this dilemma...:)


Originally posted by: rabrittain
[04:51:47] Project: 2665 (Run 2, Clone 405, Gen 80)
[04:51:47]
[04:51:55] Assembly optimizations on if available.
[04:51:55] Entering M.D.
[04:52:02] Calling FAH init
[04:52:09] Read topology
[04:52:09] s
[04:52:09] tarting from checkpoint)
[04:52:09] Read checkpoint
[04:52:10] 250000 steps (3 percent)
[04:52:10] tions
[04:52:10] Writing local files
[04:52:10] Completed 8186 out of 250000 steps (3 percent)
[04:52:37] Extra SSE boost OK.
[05:07:13] Timered checkpoint triggered.
[05:22:14] Timered checkpoint triggered.
[05:37:16] Timered checkpoint triggered.
[05:52:18] Timered checkpoint triggered.
[06:07:25] Timered checkpoint triggered.
[06:12:16] Writing local files
[06:12:17] Completed 10000 out of 250000 steps (4 percent)

I'm not sure this problem is fixed. It just took 72 minutes to go from 3 to 4%.


I have seen this kind of thing before, and it had to do with an unstable machine....
Even though your rig may seem to run just fine surfing the net or doing e-mail, that doesn't mean it's OK. Your rig just might need a voltage bump or an FSB drop, depending on what your clock is set at. If your rig is overclocked, you might have to re-test; if it isn't overclocked, take a look at your voltage settings and bump them up a notch. Either way, I would test just to make sure.

Seriously, I have had this happen a couple of times; I tested it again earlier today, and it confirmed my thoughts.

I really hope this helps you and anyone else that has/had this happen.....:)

EDIT: did some testing and added more info....;)

 

Insidious

Diamond Member
Oct 25, 2001
Originally posted by: petrusbroder
[04:52:09] Read topology
[04:52:09] s
[04:52:09] tarting from checkpoint)
[04:52:09] Read checkpoint
[04:52:10] 250000 steps (3 percent)
[04:52:10] tions
[04:52:10] Writing local files
[04:52:10] Completed 8186 out of 250000 steps (3 percent)
[04:52:37] Extra SSE boost OK.

If you look at the log file, something is missing at the timestamps [04:52:09] - [04:52:10] (look at the fragmentary lines).
I have checked my log files for the past few days and have not found any such deviations. Maybe a hardware problem with the HDD or something else?

Just my 2 cents - and please remember: I am guessing wildly ...!


Peter, those are pretty normal in the SMP log files. You'll normally find a couple of them when the client is started and sometimes when a new WU begins. (I have several showing in my logs too.)

This really just looks like a case of not enough horsepower to get the SMP up to speed along with the GPU client.

I think Drsignguy is right that it may be due to the different operating systems. One of them just doesn't leave enough CPU for the SMP to work with effectively.

But rabrittain, I don't think changing to the Deino client is going to help. It isn't noted to be any faster. If you were having lots of EUEs, then it might be a good thing to try (it's claimed to be more stable, but speed is the same).

And finally... so you've been keepin' an eye on me, huh, biodoc! ;)
No new hardware here, just a lot of patience eliminating EUEs (sometimes you have to go slower to go faster :) ) and those tasty new GPU work units! :)

edit: Actually, I'm getting this PPD with one LESS computer. (My son asked me to sell his Opty 170 PC, so I did, about a week into the race. It cost me a GPU and an SMP client.)

-Sid


 

MaskedAvenger

Member
Jul 31, 2001
Originally posted by: biodoc
Me thinks Sid added some hardware recently!;)

MarkFW, did you ever get those new cards?

Did anyone else add new hardware during the race???

I did notice MaskedAvenger's production nearly doubled recently!:shocked:

Also, the TeAm's production went from about 480,000 ppd in late November to about 650,000 ppd in the last few days!!!:shocked::shocked:

Awesome crunching everyone!!:beer::thumbsup:

Since I joined the race, I added 2nd GPUs to two systems, added a GPU client to a system that should have had one running, and mildly overclocked other GPUs. I still need to replace 2 dual-core systems with quads, and to replace an older GPU with a newer spare one that had been taken out of a dual-GPU system. Also, before I joined the race, I had never used FahMon, so I did not have a good idea of which systems were contributing what. Lastly, since I've had more time (I've been off work since the 19th), I've been able to tweak them to get the most out of them.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
The one card was lost by UPS, and I will be getting a refund, as it was recertified. FedEx has the other one somewhere. The slowest overnight service I have ever seen! If I get it Monday, it will be 5 days!
 

Drsignguy

Platinum Member
Mar 24, 2002
Oh, how I feel your pain, Mark! But due to the holiday rush, I am betting that was the case, as I too am waiting.... My cards won't be here until the 2nd... So, I won't worry too much, as I know they will get here. :)

On the other hand, I have never heard of anything getting "lost". That seems very strange... Sorry to hear about that.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
In case you are interested, here is the tracking number for UPS: 1Z W09 8R4 03 3292 057 1

And "billing information received" for over a week? Right.... Newegg and I think they lost it.
 

VirtualLarry

No Lifer
Aug 25, 2001
Once I ordered two motherboards. One got to me within three days. The other one, who knows what happened to it, it took an additional week to show up. Newegg was going to give me credit for it, but lo and behold, the next day it showed up at my door.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
Originally posted by: Insidious
What did UPS say when you called and spoke to them?

-Sid

UPS?? They don't answer the phone for people like us. Newegg, maybe.
 

rabrittain

Senior member
Dec 28, 2006
... a big thank you to petrusbroder, Drsignguy, and Insidious for continuing to try to help me!!

I apologize for not getting back to you sooner today; I was spending time with my family. My son had an AAU basketball tournament.

I have been running FAH on the 2 FX-60 machines since the beginning of the month. I have been running Rosetta@Home on the Pentium D machine. Smoke gave me the mobo and the CPU for that machine, and I know that the project is important to him.

So... I have been running SMP on both BigDog000 and BadDog002 since the beginning of the month without a problem, until this past week, when I thought that I could take the video card from Dog001 (the Intel machine), put it in BadDog002 (XP 32-bit), install a GPU client, run that alongside SMP, and raise my ppd.

The machine is not overclocked. The CPU has plenty of horsepower for the video card; after I installed it in BadDog002, I ran Half-Life 2 (a pretty intensive video game) for a little while without any problem.

The GPU client ran fine. When I tried to run other FAH clients (SMP, and then a uniprocessor client) alongside the GPU client, they were very slow. My ppd for that machine was lower than what it had been with SMP alone. So I decided to uninstall everything, reinstall SMP, and bring my ppd back up to what it had been before all the changes. Since then I have cleaned out SMP and reinstalled and reconfigured it twice. It's still slow.

One thing I haven't tried is uninstalling and reinstalling the video drivers.

I'm not sure, but I think the problem is related to remnants of the GPU client, the video card, or both. When I installed the video card, it automatically put .NET Framework 2.0 on the computer. Maybe I need to install updates to that. The problems started after I installed the video card and the GPU client. Hmmmm, maybe it is the video card.

It's late. I have taken my ziprasidone, and my brain is about to shut down.

I will jump into this problem with both feet in the morning.
 

biodoc

Diamond Member
Dec 29, 2005
Originally posted by: Markfw900
In case you are interested, here is the tracking number for UPS: 1Z W09 8R4 03 3292 057 1

And "billing information received" for over a week? Right.... Newegg and I think they lost it.

Tracking Number: 1Z W09 8R4 03 3292 057 1
Type: Package
Status: In Transit - On Time

Scheduled Delivery: 12/29/2008
Shipped To: BEAVERTON, OR, US
Shipped/Billed On: 12/22/2008
Service: GROUND
Weight: 4.00 Lbs

PORTLAND, OR, US 12/28/2008 10:21 P.M. DEPARTURE SCAN

Looks like you may finally get that card today MarkFW!!:shocked::shocked:
 

Drsignguy

Platinum Member
Mar 24, 2002
rabrittain

Thanks for the reply, and it's good that you spent time with your family. Hope everything went well. :)

OK, it seems to me that adding the video card changed everything.... As an example, one of my Q6600 rigs had the exact same problem when I went from an ATI HD 3450 to an ATI HD 3870. My machine started to act funny. It was F@H that really showed the problem, when it slowed way down to the point of not working at all. The timered checkpoints between the % points were a dead giveaway. Just because you are not overclocking doesn't mean your system is stable after adding the new video card.
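The "dead giveaway" can be checked mechanically. The SMP core in these logs was launched with -checkpoint 15, and the timestamps show a timed checkpoint every 15 minutes, so a healthy frame has about one "Timered checkpoint triggered" line between progress lines. The slow machine's first frame has seven, and the run after -forceasm still has five. The hypothetical helper below just counts them, assuming the log wording shown in this thread.

```python
def checkpoints_per_frame(log_lines):
    """Count 'Timered checkpoint triggered' lines between consecutive
    'Completed ... steps' progress lines in a FAHlog excerpt."""
    counts = []
    current = None  # stays None until the first progress line is seen
    for line in log_lines:
        if "Completed" in line and "steps" in line:
            if current is not None:
                counts.append(current)
            current = 0
        elif "Timered checkpoint triggered" in line and current is not None:
            current += 1
    return counts


healthy = """[20:59:53] Completed 0 out of 500000 steps (0 percent)
[21:14:53] Timered checkpoint triggered.
[21:24:01] Completed 5000 out of 500000 steps (1 percent)
[21:39:00] Timered checkpoint triggered.
[21:47:55] Completed 10000 out of 500000 steps (2 percent)""".splitlines()

print(checkpoints_per_frame(healthy))  # [1, 1]
```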

I went back and checked my overclock and, sure enough, it failed, though only to the point of failing the "Intel Burn Test", not by rebooting or BSODs. It took about 20 minutes to fix. All I had to do was raise the Vcore on my chip a notch.

It's best for you to double-check your system's voltage to the CPU. My bet is that the power draw from your video card changed something in your system, even without running the GPU client, and running the GPU client makes it worse. The GPU client worked and the SMP client didn't, which points back to the CPU and its voltage. :)

 

Insidious

Diamond Member
Oct 25, 2001
rabrittain, OK, this is driving me crazy. It seems like you are doing all the right things, but the result just won't come.

I may have overlooked it in your posts, but if you haven't already tried it, what happens if you only run the SMP client? What if you only run the GPU client? Maybe by doing a run of each by itself, that will help identify where this bottleneck is.

-Sid
 

biodoc

Diamond Member
Dec 29, 2005
Originally posted by: Insidious
rabrittain, OK, this is driving me crazy. It seems like you are doing all the right things, but the result just won't come.

I may have overlooked it in your posts, but if you haven't already tried it, what happens if you only run the SMP client? What if you only run the GPU client? Maybe by doing a run of each by itself, that will help identify where this bottleneck is.

-Sid

Good idea, Sid. It's probably good to start over and remove all FAH programs by uninstalling them via the Windows Control Panel. When I ran Windows, I would clean the registry too. A free program called CCleaner works pretty well. Here's a link

:beer:
 

VirtualLarry

No Lifer
Aug 25, 2001
[19:36:23] Folding@home Core Shutdown: FINISHED_UNIT
[19:36:26] CoreStatus = 64 (100)
[19:36:26] Sending work to server
[19:36:26] Project: 5766 (Run 12, Clone 245, Gen 0)
[19:36:26] - Read packet limit of 540015616... Set to 524286976.


[19:36:26] + Attempting to send results [December 28 19:36:26 UTC]
[19:36:28] + Results successfully sent
[19:36:28] Thank you for your contribution to Folding@Home.
[19:36:28] + Number of Units Completed: 232

[19:36:32] - Preparing to get new work unit...
[19:36:32] + Attempting to get work packet
[19:36:32] - Connecting to assignment server
[19:36:33] - Successful: assigned to (171.67.108.11).
[19:36:33] + News From Folding@Home: GPU folding beta
[19:36:33] Loaded queue successfully.
[19:36:34] + Closed connections
[19:36:34]
[19:36:34] + Processing work unit
[19:36:34] Core required: FahCore_11.exe
[19:36:34] Core found.
[19:36:34] Working on queue slot 09 [December 28 19:36:34 UTC]
[19:36:34] + Working ...
[19:36:34]
[19:36:34] *------------------------------*
[19:36:34] Folding@Home GPU Core - Beta
[19:36:34] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[19:36:34]
[19:36:34] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[19:36:34] Build host: amoeba
[19:36:34] Board Type: Nvidia
[19:36:34] Core :
[19:36:34] Preparing to commence simulation
[19:36:34] - Looking at optimizations...
[19:36:34] - Created dyn
[19:36:34] - Files status OK
[19:36:34] - Expanded 98610 -> 492276 (decompressed 499.2 percent)
[19:36:34] Called DecompressByteArray: compressed_data_size=98610 data_size=492276, decompressed_data_size=492276 diff=0
[19:36:34] - Digital signature verified
[19:36:34]
[19:36:34] Project: 5751 (Run 6, Clone 154, Gen 8)
[19:36:34]
[19:36:34] Assembly optimizations on if available.
[19:36:34] Entering M.D.
[19:36:41] Working on Protein
[19:36:44] Client config found, loading data.
[19:36:44] Starting GUI Server
[19:36:45] mdrun_gpu returned
[19:36:45] NANs detected on GPU
[19:36:45]
[19:36:45] Folding@home Core Shutdown: UNSTABLE_MACHINE
[19:36:48] CoreStatus = 7A (122)
[19:36:48] Sending work to server
[19:36:48] Project: 5751 (Run 6, Clone 154, Gen 8)
[19:36:48] - Read packet limit of 540015616... Set to 524286976.
[19:36:48] - Error: Could not get length of results file work/wuresults_09.dat
[19:36:48] - Error: Could not read unit 09 file. Removing from queue.
[19:36:48] - Preparing to get new work unit...
[19:36:48] + Attempting to get work packet
[19:36:48] - Connecting to assignment server
[19:36:49] - Successful: assigned to (171.67.108.11).
[19:36:49] + News From Folding@Home: GPU folding beta
[19:36:49] Loaded queue successfully.
[19:36:51] + Closed connections
[19:36:56]
[19:36:56] + Processing work unit
[19:36:56] Core required: FahCore_11.exe
[19:36:56] Core found.
[19:36:56] Working on queue slot 00 [December 28 19:36:56 UTC]
[19:36:56] + Working ...
[19:36:56]
[19:36:56] *------------------------------*
[19:36:56] Folding@Home GPU Core - Beta
[19:36:56] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[19:36:56]
[19:36:56] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[19:36:56] Build host: amoeba
[19:36:56] Board Type: Nvidia
[19:36:56] Core :
[19:36:56] Preparing to commence simulation
[19:36:56] - Looking at optimizations...
[19:36:56] - Created dyn
[19:36:56] - Files status OK
[19:36:56] - Expanded 98610 -> 492276 (decompressed 499.2 percent)
[19:36:56] Called DecompressByteArray: compressed_data_size=98610 data_size=492276, decompressed_data_size=492276 diff=0
[19:36:56] - Digital signature verified
[19:36:56]
[19:36:56] Project: 5751 (Run 6, Clone 154, Gen 8)
[19:36:56]
[19:36:56] Assembly optimizations on if available.
[19:36:56] Entering M.D.
[19:37:02] Working on Protein
[19:37:06] Client config found, loading data.
[19:37:06] Starting GUI Server
[19:37:06] mdrun_gpu returned
[19:37:06] NANs detected on GPU
[19:37:06]
[19:37:06] Folding@home Core Shutdown: UNSTABLE_MACHINE
[19:37:10] CoreStatus = 7A (122)
[19:37:10] Sending work to server
[19:37:10] Project: 5751 (Run 6, Clone 154, Gen 8)
[19:37:10] - Read packet limit of 540015616... Set to 524286976.
[19:37:10] - Error: Could not get length of results file work/wuresults_00.dat
[19:37:10] - Error: Could not read unit 00 file. Removing from queue.
[19:37:10] - Preparing to get new work unit...
[19:37:10] + Attempting to get work packet
[19:37:10] - Connecting to assignment server
[19:37:10] - Successful: assigned to (171.67.108.11).
[19:37:10] + News From Folding@Home: GPU folding beta
[19:37:11] Loaded queue successfully.
[19:37:12] + Closed connections
[19:37:17]
[19:37:17] + Processing work unit
[19:37:17] Core required: FahCore_11.exe
[19:37:17] Core found.
[19:37:17] Working on queue slot 01 [December 28 19:37:17 UTC]
[19:37:17] + Working ...
[19:37:17]
[19:37:17] *------------------------------*
[19:37:17] Folding@Home GPU Core - Beta
[19:37:17] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[19:37:17]
[19:37:17] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[19:37:17] Build host: amoeba
[19:37:17] Board Type: Nvidia
[19:37:17] Core :
[19:37:17] Preparing to commence simulation
[19:37:17] - Looking at optimizations...
[19:37:17] - Created dyn
[19:37:17] - Files status OK
[19:37:17] - Expanded 98610 -> 492276 (decompressed 499.2 percent)
[19:37:17] Called DecompressByteArray: compressed_data_size=98610 data_size=492276, decompressed_data_size=492276 diff=0
[19:37:17] - Digital signature verified
[19:37:17]
[19:37:17] Project: 5751 (Run 6, Clone 154, Gen 8)
[19:37:17]
[19:37:17] Assembly optimizations on if available.
[19:37:17] Entering M.D.
[19:37:24] Working on Protein
[19:37:27] Client config found, loading data.
[19:37:28] Starting GUI Server
[19:37:28] mdrun_gpu returned
[19:37:28] NANs detected on GPU
[19:37:28]
[19:37:28] Folding@home Core Shutdown: UNSTABLE_MACHINE
[19:37:31] CoreStatus = 7A (122)
[19:37:31] Sending work to server
[19:37:31] Project: 5751 (Run 6, Clone 154, Gen 8)
[19:37:31] - Read packet limit of 540015616... Set to 524286976.
[19:37:31] - Error: Could not get length of results file work/wuresults_01.dat
[19:37:31] - Error: Could not read unit 01 file. Removing from queue.
[19:37:31] - Preparing to get new work unit...
[19:37:31] + Attempting to get work packet
[19:37:31] - Connecting to assignment server
[19:37:32] - Successful: assigned to (171.67.108.11).
[19:37:32] + News From Folding@Home: GPU folding beta
[19:37:32] Loaded queue successfully.
[19:37:33] + Closed connections
[19:37:38]
[19:37:38] + Processing work unit
[19:37:38] Core required: FahCore_11.exe
[19:37:38] Core found.
[19:37:38] Working on queue slot 02 [December 28 19:37:38 UTC]
[19:37:38] + Working ...
[19:37:39]
[19:37:39] *------------------------------*
[19:37:39] Folding@Home GPU Core - Beta
[19:37:39] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[19:37:39]
[19:37:39] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[19:37:39] Build host: amoeba
[19:37:39] Board Type: Nvidia
[19:37:39] Core :
[19:37:39] Preparing to commence simulation
[19:37:39] - Looking at optimizations...
[19:37:39] - Created dyn
[19:37:39] - Files status OK
[19:37:39] - Expanded 98610 -> 492276 (decompressed 499.2 percent)
[19:37:39] Called DecompressByteArray: compressed_data_size=98610 data_size=492276, decompressed_data_size=492276 diff=0
[19:37:39] - Digital signature verified
[19:37:39]
[19:37:39] Project: 5751 (Run 6, Clone 154, Gen 8)
[19:37:39]
[19:37:39] Assembly optimizations on if available.
[19:37:39] Entering M.D.
[19:37:45] Working on Protein
[19:37:49] Client config found, loading data.
[19:37:49] Starting GUI Server
[19:37:49] mdrun_gpu returned
[19:37:49] NANs detected on GPU
[19:37:49]
[19:37:49] Folding@home Core Shutdown: UNSTABLE_MACHINE
[19:37:53] CoreStatus = 7A (122)
[19:37:53] Sending work to server
[19:37:53] Project: 5751 (Run 6, Clone 154, Gen 8)
[19:37:53] - Read packet limit of 540015616... Set to 524286976.
[19:37:53] - Error: Could not get length of results file work/wuresults_02.dat
[19:37:53] - Error: Could not read unit 02 file. Removing from queue.
[19:37:53] - Preparing to get new work unit...
[19:37:53] + Attempting to get work packet
[19:37:53] - Connecting to assignment server
[19:37:53] - Successful: assigned to (171.67.108.11).
[19:37:53] + News From Folding@Home: GPU folding beta
[19:37:53] Loaded queue successfully.
[19:37:55] + Closed connections
[19:38:00]
[19:38:00] + Processing work unit
[19:38:00] Core required: FahCore_11.exe
[19:38:00] Core found.
[19:38:00] Working on queue slot 03 [December 28 19:38:00 UTC]
[19:38:00] + Working ...
[19:38:00]
[19:38:00] *------------------------------*
[19:38:00] Folding@Home GPU Core - Beta
[19:38:00] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[19:38:00]
[19:38:00] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[19:38:00] Build host: amoeba
[19:38:00] Board Type: Nvidia
[19:38:00] Core :
[19:38:00] Preparing to commence simulation
[19:38:00] - Looking at optimizations...
[19:38:00] - Created dyn
[19:38:00] - Files status OK
[19:38:00] - Expanded 98610 -> 492276 (decompressed 499.2 percent)
[19:38:00] Called DecompressByteArray: compressed_data_size=98610 data_size=492276, decompressed_data_size=492276 diff=0
[19:38:00] - Digital signature verified
[19:38:00]
[19:38:00] Project: 5751 (Run 6, Clone 154, Gen 8)
[19:38:00]
[19:38:00] Assembly optimizations on if available.
[19:38:00] Entering M.D.
[19:38:06] Working on Protein
[19:38:10] Client config found, loading data.
[19:38:10] Starting GUI Server
[19:38:10] mdrun_gpu returned
[19:38:10] NANs detected on GPU
[19:38:10]
[19:38:10] Folding@home Core Shutdown: UNSTABLE_MACHINE
[19:38:14] CoreStatus = 7A (122)
[19:38:14] Sending work to server
[19:38:14] Project: 5751 (Run 6, Clone 154, Gen 8)
[19:38:14] - Read packet limit of 540015616... Set to 524286976.
[19:38:14] - Error: Could not get length of results file work/wuresults_03.dat
[19:38:14] - Error: Could not read unit 03 file. Removing from queue.
[19:38:14] EUE limit exceeded. Pausing 24 hours.
[22:33:25] + Working...
[04:33:25] + Working...
[10:33:25] + Working...
[16:33:25] + Working...


Are project 5751 WUs bad, or is something going on with my system? It finished the previous WU successfully, but each new WU hits NaNs right at the start. What's up with that?
My other two GPUs are still running just fine.