The 3rd Folding@Home Holiday Season Race in December 2008


Insidious

Diamond Member
Oct 25, 2001
7,649
0
0
Thanks for the race update Peter!

Hope everyone in your household is feeling better. (probably a little soon for that)

What do we have left? 6 days?

I think TAS needs a (post) Christmas miracle. And we were doing soooooo good too. :roll:

The new GPU clients have given me a nice boost, but I guess since everybody gets it.. that's little consolation. :p

Crunch on merry 'pooters :thumbsup:

-Sid

edit: :music: :music: :beer: :beer: :music: :music: (<-- I'm still whistlin' baby! :cool: )

PPS: It's not too late TAS...... one race isn't lost..... YET. Think proteins!
 

Pokey

Platinum Member
Oct 20, 1999
2,782
481
126
Thanks for the stats Peter. :D

I don't know if anyone else has, but I just recently hit a rash of:

NANs detected on GPU
[23:35:06]
[23:35:06] Folding@home Core Shutdown: UNSTABLE_MACHINE
[23:35:09] CoreStatus = 7A (122)
[23:35:09] Sending work to server
[23:35:09] Project: 5768 (Run 1, Clone 23, Gen 12)
[23:35:09] - Read packet limit of 540015616... Set to 524286976.
[23:35:09] - Error: Could not get length of results file work/wuresults_00.dat
[23:35:09] - Error: Could not read unit 00 file. Removing from queue.
[23:35:09] EUE limit exceeded. Pausing 24 hours.

And just as I was ready to dive in, the problem seems to have sorted out somewhat........................ :confused:

The two computers have AMD processors and 9800 GT GPUs and 57xx projects. The problem seems to have a history back to '07 when I try to "google" it. But I haven't stumbled on a sure-fire solution. I guess you just wait it out.

While I am not exactly high fiving anyone yet, I am glad to see the Folders out front now and my gazillion dollar investment in equipment helping. :p :p

But both our teams together must be raising eyebrows over at Stanford. :cool: :thumbsup:


 

Insidious

Diamond Member
Oct 25, 2001
7,649
0
0
Pokey: I've seen similar posts and have also seen this behavior at home. First of all, don't be too quick to blame your equipment or setup. They don't like to acknowledge it, but some of their Work Units..... just don't work right. Fortunately, that is getting to be more of a rarity than in some of our sordid past with this project. :roll:

In the working directory of your GPU client, you will see a folder called 'work'. When a GPU client is working correctly, that folder holds data files for a single queue slot only (unless a result is being saved because the result server is not accepting it). However, when a server has rejected results, a couple of the files for that queue slot will not be removed, even once the result has been successfully sent. Also, when a work unit does EUE (happens to everyone once in a while), the data files for that queue slot are left behind.

What I have seen is that when these remnants are left in the work folder, the next time that slot is used, the client sometimes confuses the remnants with the data files that would be created for the new WU. Watching the fahlog.txt file closely, you can find times when a WU begins in the middle (some completion % > 0). This leads to another EUE, and gradually the installation becomes more and more unstable as more and more remnants are left undeleted in the work directory.

The solution that works best for me:

Every once in a while, I look at the contents of the work folder. If I see files for more than one queue slot, I manually delete the ones for any slot except the one that fahlog.txt indicates is the present one being used:

example: (excerpt from fahlog.txt file)
[13:25:53] + Processing work unit
[13:25:53] Core required: FahCore_11.exe
[13:25:53] Core found.
[13:25:53] Working on queue slot 00 [December 24 13:25:53 UTC]
[13:25:53] + Working ...

If I suspect an installation is just totally borked, then I stop the client, delete the 'work' folder in the working directory and also the queue.dat file and the unitinfo.txt file.

Then when I restart the client, it creates new copies of these files and seems to run stably.... until the next time.

Hope this helps,

-Sid

PS: if you have overclocked your video cards, lowering the clocks is also a good thing to try. If they are running hot (>75C), that seems to increase the frequency of EUEs too (even though the card is documented not to be damaged by much higher temperatures).
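
For anyone who would rather script the work-folder check than eyeball it, here is a minimal Python sketch. It only lists candidate remnants and deletes nothing. It assumes the log lives in a file named FAHlog.txt next to the 'work' folder, and that every per-slot file ends in an underscore plus the two-digit slot number before its extension, as work/wuresults_00.dat does in the logs above. Other file names in your work folder may not follow that pattern, so treat the output as a hint and check it against the log before removing anything.

```python
import re
from pathlib import Path

# Matches log lines such as: "Working on queue slot 00 [December 24 13:25:53 UTC]"
SLOT_LINE = re.compile(r"Working on queue slot (\d{2})")
# Assumption: per-slot files end in "_NN.<ext>", like wuresults_00.dat
SLOT_FILE = re.compile(r"_(\d{2})\.[^.]+$")


def current_slot(log_text):
    """Return the most recent queue slot named in the log text, or None."""
    matches = SLOT_LINE.findall(log_text)
    return matches[-1] if matches else None


def stale_files(file_names, slot):
    """Return the names whose slot suffix differs from the active slot."""
    stale = []
    for name in file_names:
        match = SLOT_FILE.search(name)
        if match and match.group(1) != slot:
            stale.append(name)
    return stale


def report(client_dir):
    """List possible remnants in <client_dir>/work (hypothetical layout)."""
    client = Path(client_dir)
    log_text = (client / "FAHlog.txt").read_text(errors="replace")
    slot = current_slot(log_text)
    if slot is None:
        return []
    names = [p.name for p in (client / "work").iterdir() if p.is_file()]
    return stale_files(names, slot)
```

Stop the client before deleting anything the script reports, the same as you would when cleaning up by hand.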
 

Assimilator1

Elite Member
Nov 4, 1999
24,180
528
126
Pokey
Me too, I've just noticed :( :-

[20:12:48] Completed 92%
[20:12:48] mdrun_gpu returned
[20:12:48] NANs detected on GPU
[20:12:48]
[20:12:48] Folding@home Core Shutdown: UNSTABLE_MACHINE
[20:12:51] CoreStatus = 7A (122)
[20:12:51] Sending work to server
[20:12:51] Project: 4744 (Run 4, Clone 490, Gen 33)
[20:12:51] - Read packet limit of 540015616... Set to 524286976.
[20:12:51] - Error: Could not get length of results file work/wuresults_08.dat
[20:12:51] - Error: Could not read unit 08 file. Removing from queue.
[20:12:51] - Preparing to get new work unit...

I'm going to delete unit 08 files going by Sid's detailed instructions (thanks Sid :)).
I also had 2 VPU recoveries today & afterwards I was getting some graphical glitching in RTW, time for a reboot!
(I found both my GPU & SMP client had remnant files).

Btw what's a NAN?:confused:
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,374
16,217
136
I had 2 video cards that should have been here by now for a nice 10k boost. UPS lost one, and Fedex can't deliver the other to save their life! Newegg says I will get a refund of the overnight charge, since it didn't come today, and god knows when it will show up. What is with shipping lately?
 

Insidious

Diamond Member
Oct 25, 2001
7,649
0
0
AS1: The SMP clients always leave remnants and they don't seem to hurt anything. It's just the GPU client where they seem to bork things up.

Originally posted by: Markfw900
I had 2 video cards that should have been here by now for a nice 10k boost. UPS lost one, and Fedex can't deliver the other to save their life! Newegg says I will get a refund of the overnight charge, since it didn't come today, and god knows when it will show up. What is with shipping lately?

That would drive me nutz! I get so impatient when I'm waiting for new stuff, then when it gets waylaid.... well, better make sure there is a :beer: in the fridge..... just to keep me sane!

-Sid
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,374
16,217
136
Originally posted by: Insidious
AS1: The SMP clients always leave remnants and they don't seem to hurt anything. It's just the GPU client where they seem to bork things up.

Originally posted by: Markfw900
I had 2 video cards that should have been here by now for a nice 10k boost. UPS lost one, and Fedex can't deliver the other to save their life! Newegg says I will get a refund of the overnight charge, since it didn't come today, and god knows when it will show up. What is with shipping lately?

That would drive me nutz! I get so impatient when I'm waiting for new stuff, then when it gets waylaid.... well, better make sure there is a :beer: in the fridge..... just to keep me sane!

-Sid

The big point is that TAS is losing 10k/day due to this. We have no chance now, unless a miracle happens.

Oh, and in case anybody cares, I just broke 14 million
 

MaskedAvenger

Member
Jul 31, 2001
138
11
76
Well, a big thumbs up to you, Markfw900! :thumbsup: :thumbsup: :thumbsup:
Some of us are just not so observant! :eek::)
I thought the race went until the 15th. :confused:
 

Drsignguy

Platinum Member
Mar 24, 2002
2,264
0
76
Originally posted by: Markfw900

The big point is that TAS is losing 10k/day due to this. We have no chance now, unless a miracle happens.

Oh, and in case anybody cares, I just broke 14 million


Thanks For the stats Peter. Well Done!


Just so you know, I have another GTX 260 and an 8800GTS on their way... So it still would be close. Only thing is, due to the holidays, they won't arrive until the 2nd. Oh well. :)

Congrats Mark on your 14 million milestone! And a belated Happy Birthday! :beer: Sorry it was late, but it's been pretty busy around here.

 

biodoc

Diamond Member
Dec 29, 2005
6,346
2,243
136
Originally posted by: Drsignguy
Thanks For the stats Peter. Well Done!


Just so you know, I have another GTX 260 and an 8800GTS on their way... So it still would be close. Only thing is, due to the holidays, they won't arrive until the 2nd. Oh well. :)

Congrats Mark on your 14 million milestone! And a belated Happy Birthday! :beer: Sorry it was late, but it's been pretty busy around here.

Bummer about the new cards MarkFW:(

But congrats on the 14 Million big ones!!!:beer::thumbsup:

Drsignguy, looks like you'll be making a big move in the new year!!:beer:

Edit: Happy birthday MarkFW!!:beer::beer: and thanks for the stats Peter!:thumbsup:
 

Pokey

Platinum Member
Oct 20, 1999
2,782
481
126
Thanks Sid.
I cleaned house on these two and things seem to be back to normal.
I have an Intel + 9600GT computer where the multiple queue entries seem to live in harmony. Go figure. :confused: I guess that's why they call it BETA ........... :p
 

petrusbroder

Elite Member
Nov 28, 2004
13,348
1,155
126
Originally posted by: MaskedAvenger
Well, a big thumbs up to you, Markfw900! :thumbsup: :thumbsup: :thumbsup:
Some of us are just not so observant! :eek::)
I thought the race went until the 15th. :confused:

Sorry to say, as it is said in the first post of this thread, the race goes on until ...

January 1, 2009, 13:00 UTC

BTW: TAS has never had as a goal to win these races - at least not as a primary, secondary, tertiary or quaternary goal. The aims and purpose of TAS are:

- Increase TeAm AnandTech's participation and production in projects.
- Introduce people to projects they're not so familiar with.
- Have LOTS of fun!
- Establish TeAm AnandTech as a major player in all that is DC.


In my honest opinion the third goal is the most important ... :D
 

Insidious

Diamond Member
Oct 25, 2001
7,649
0
0
Originally posted by: Markfw900

The big point is that TAS is losing 10k/day due to this. We have no chance now, unless a miracle happens.

Oh, and in case anybody cares, I just broke 14 million

Congrats on the 14 Million Mark! (how many days does it take you to get another million with that factory you've got going? :thumbsup:)

-Sid



 

rabrittain

Senior member
Dec 28, 2006
715
0
0
Thanks for the stats, petrusbroder!!!

Congratulations and a happy birthday to you, Markfw900!!

I need some real help. In an effort to up my ppd I got rid of SMP and installed a gpu and an x86 client on one of my machines. After all that I found that SMP was better and decided to go back to that. The machine is running really slow -- I won't make deadlines. I uninstalled SMP again, and I went through the registry looking for "fah", and I deleted everything that I found which referenced the x86 or the gpu client. I finally got SMP reinstalled ~0100 pst this morning. It's now 6 hours later and only 3% of the WU has been completed. I am sure that there is still something left over from the gpu or the x86 client (or both) that is slowing things down.

Both of my computers that I have dedicated to FAH are pretty much the same - same mobo, same cpu, same type and amount of ram. The other box has completed 15% in the same amount of time. I realize that the wu's are probably different, but 3% in 6 hours is not acceptable.
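
To put rough numbers on that gap, here is a small Python sketch that extrapolates total run time from progress so far. It assumes progress is linear, which real WUs only approximate, and the function name is made up for illustration.

```python
def hours_to_finish(percent_done, hours_elapsed):
    """Linearly extrapolate the total run time of a WU from progress so far."""
    if percent_done <= 0:
        raise ValueError("need some progress to extrapolate from")
    return hours_elapsed * 100.0 / percent_done


slow_box = hours_to_finish(3, 6)    # 3% in 6 hours
fast_box = hours_to_finish(15, 6)   # 15% in 6 hours
print(f"slow box: ~{slow_box:.0f} h total, fast box: ~{fast_box:.0f} h total")
```

At those rates the slow box would need about 200 hours for the whole WU against about 40 for the other one, so it is running roughly five times slower.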

Help!!

:(
 

rabrittain

Senior member
Dec 28, 2006
715
0
0
Here is part of the FAHLOG file. I think it shows that I downloaded a new core.

Launch directory: C:\Distributed Computing\FoldingAtHome\FAH6.22beta2SMP-mpich
Executable: C:\Distributed Computing\FoldingAtHome\FAH6.22beta2SMP-mpich\FAHSMP.exe
Arguments: -smp -verbosity 9

[08:40:06] - Ask before connecting: No
[08:40:06] - User name: rabrittain (Team 198)
[08:40:06] - User ID: F373BA13960AEAA
[08:40:06] - Machine ID: 1
[08:40:06]
[08:40:06] Work directory not found. Creating...
[08:40:06] Could not open work queue, generating new queue...
[08:40:06] - Preparing to get new work unit...
[08:40:06] + Attempting to get work packet
[08:40:06] - Will indicate memory of 2783 MB
[08:40:06] - Detect CPU. Vendor: AuthenticAMD, Family: 15, Model: 3, Stepping: 2
[08:40:06] - Connecting to assignment server
[08:40:06] - Autosending finished units... [December 27 08:40:06 UTC]
[08:40:06] Connecting to http://assign.stanford.edu:8080/
[08:40:06] Trying to send all finished work units
[08:40:06] + No unsent completed units remaining.
[08:40:06] - Autosend completed
[08:40:07] Posted data.
[08:40:07] Initial: 40AB; - Successful: assigned to (171.64.65.64).
[08:40:07] + News From Folding@Home: Welcome to Folding@Home
[08:40:07] Loaded queue successfully.
[08:40:07] Connecting to http://171.64.65.64:8080/
[08:40:12] Posted data.
[08:40:12] Initial: 0000; - Receiving payload (expected size: 4821169)
[08:40:18] - Downloaded at ~784 kB/s
[08:40:18] - Averaged speed for that direction ~784 kB/s
[08:40:18] + Received work.
[08:40:18] + Closed connections
[08:40:18]
[08:40:18] + Processing work unit
[08:40:18] Work type a1 not eligible for variable processors
[08:40:18] Core required: FahCore_a1.exe
[08:40:18] Core not found.
[08:40:18] - Core is not present or corrupted.
[08:40:18] - Attempting to download new core...
[08:40:18] + Downloading new core: FahCore_a1.exe
[08:40:18] Downloading core (/~pande/Win32/x86/Core_a1.fah from www.stanford.edu)

So ... it was good that it downloaded a new core, right?

I'm really stumped.
 

Insidious

Diamond Member
Oct 25, 2001
7,649
0
0
First of all.... KUDOS. I have the utmost respect for those who are willing to tweak, borrow and steal for more PPD!

When I remove F@H from a system: (I only know WindowsXP, you will have to translate for Vista)

1. If any client was installed as a service, it must be stopped (START -> Control Panel -> Performance and Maintenance -> Administrative Tools -> Services)
2. Use Add/Remove programs to uninstall everything F@H
3. Delete the installation directory and working directory for each client
4. Use Add/Remove programs to uninstall the NVidia Video drivers
5. Boot

(there shouldn't be any need for registry editing)

To install F@H:

1. If you are planning on running a GPU client, you should use a 18x.xx series CUDA driver from here. This will slow your GPU client ~15%, but frees up more CPU on WindowsXP systems with dual cores to allow better SMP operation. (edit: If you are on a quad or only running a GPU client by itself, you want to use 178.24 version drivers for max. GPU production)
2. Boot
3. IF you are running a NVidia GPU client with another client on the CPU, you need to set a Windows system environment variable that will allow the GPU client to scavenge cycles from any core it likes.
3a. Right-click on 'my computer' and select properties in drop-down menu
3b. select the 'advanced' tab
3c. click the 'Environment Variables' button
3d. in the 'system variables' section click 'new'
3e. type NV_FAH_CPU_AFFINITY in the variable name space
3f. type 0 in the variable value space
3g. click OK and close the windows
4. Very carefully, step by step, follow the F@H installation guide(s) to get your software. I like to be sure to stop all clients before installing a new one. Then when they are all installed, I start them one by one and see how they're doing before starting the next one. (might give clues to what went wrong if anything does)
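
If you want to confirm that the variable from step 3 actually took, here is a minimal Python sketch that just reads it back. The variable name and the value 0 come from the steps above; the function name and the messages are made up for illustration. Run it from a console opened after you set the variable, since a process that was already running keeps the environment it started with.

```python
import os


def affinity_status(env=None):
    """Describe how NV_FAH_CPU_AFFINITY is set in the given environment."""
    env = os.environ if env is None else env
    value = env.get("NV_FAH_CPU_AFFINITY")
    if value is None:
        return "not set"
    if value.strip() == "0":
        return "set to 0"
    return f"set to {value!r}, expected '0'"


if __name__ == "__main__":
    print("NV_FAH_CPU_AFFINITY is", affinity_status())
```

The GPU client needs a restart after the variable is set for the same reason.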

You might just want to review all this; maybe there is a step that got missed (that's usually what I find when I am doing this).

If you can't find anything seemingly amiss, there isn't much you can do but clean up and try again.

As a last resort, you might want to consider using System Restore and go to a Restore Point that was created before you began using F@H. (You still need to delete all F@H installation and working folders afterwards)

Good Luck :beer:

-Sid

(OK, this was about a week's worth of typing for me.... anyone who sees mistakes in this, please don't hesitate to correct me... you won't hurt my feelings.)

:cool:

edit: your fahlog looks OK to me
 

MaskedAvenger

Member
Jul 31, 2001
138
11
76
rabrittain, how about if you try using the deino version of the SMP client? That way it would be all new and so you would avoid unerased scraps of your old setup. The deino version has worked fine for me and is easy to clean up when a PC hiccups.
 

GLeeM

Elite Member
Apr 2, 2004
7,199
128
106
Originally posted by: rabrittain
Both of my computers that I have dedicated to FAH are pretty much the same - same mobo, same cpu, same type and amount of ram. The other box has completed 15% in the same amount of time. I realize that the wu's are probably different, but 3% in 6 hours is not acceptable.

In Task Manager how many FahCore_a1.exe processes do you see? There should be four.
Their CPU percent should add up to ~100 percent. Is there anything else besides these four processes using considerable CPU%?

What is the Project number of the slow WU?
You can find it in FAHlog.txt or with FAHMon or in the console window. Looks something like:
[14:11:24] Project: 2665 (Run 2, Clone 978, Gen 79)
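
If digging through a long log by hand gets tedious, here is a minimal Python sketch that pulls the most recent Project line out of the log text. It assumes the line always has the shape shown above; the function name is made up for illustration.

```python
import re

# Matches lines such as: "[14:11:24] Project: 2665 (Run 2, Clone 978, Gen 79)"
PROJECT_LINE = re.compile(
    r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)"
)


def last_project(log_text):
    """Return the last project/run/clone/gen found in the log text, or None."""
    found = PROJECT_LINE.findall(log_text)
    if not found:
        return None
    project, run, clone, gen = (int(x) for x in found[-1])
    return {"project": project, "run": run, "clone": clone, "gen": gen}


if __name__ == "__main__":
    sample = "[14:11:24] Project: 2665 (Run 2, Clone 978, Gen 79)"
    print(last_project(sample))
```

To use it on a real log, read FAHlog.txt into a string and pass that in.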
 

rabrittain

Senior member
Dec 28, 2006
715
0
0
Thanks you guys for responding!

Insidious -- I saved your post and printed it out. I'll be using it as a guide when I uninstall FAH. I don't have any nvidia video drivers. My video card is ATI. Should I uninstall and reinstall the drivers?

Masked Avenger -- That's a good suggestion, and I'm giving it much consideration. The only reason that I'm using mpich is that I'm running XP x64 on the other machine, and I must use mpich for it. I used mpich on the 32 bit machine because I did not want to have to learn about another client. I know - that may not be a good enough reason.

GLeeM -- The project number is 2665 (Run 2, Clone 405, Gen 80). There are 4 instances of FahCore_a1.exe. Two of them are running pretty good, but the other 2 rarely get above 20, and they spend most of their time down around 5-10. So that's definitely a problem. Nothing else is hogging the cpu -- Both cores are up at 98%.


Thanks for your help.
 

Drsignguy

Platinum Member
Mar 24, 2002
2,264
0
76
Originally posted by: rabrittain
Thanks you guys for responding!

Insidious -- I saved your post and printed it out. I'll be using it as a guide when I uninstall FAH. I don't have any nvidia video drivers. My video card is ATI. Should I uninstall and reinstall the drivers?

Masked Avenger -- That's a good suggestion, and I'm giving it much consideration. The only reason that I'm using mpich is that I'm running XP x64 on the other machine, and I must use mpich for it. I used mpich on the 32 bit machine because I did not want to have to learn about another client. I know - that may not be a good enough reason.

GLeeM -- The project number is 2665 (Run 2, Clone 405, Gen 80). There are 4 instances of FahCore_a1.exe. Two of them are running pretty good, but the other 2 rarely get above 20, and they spend most of their time down around 5-10. So that's definitely a problem. Nothing else is hogging the cpu -- Both cores are up at 98%.

Thanks for your help.



Something seems fishy here?

With a Dual core Cpu and running 1 Smp client, you can install Affinity changer. This may help solve some of your problems. I just checked my E7200 rig and Affinity changer works great on it with 1 client running...

 

rabrittain

Senior member
Dec 28, 2006
715
0
0
Thanks Drsignguy!

I'm a little suspicious of Affinity Changer. I installed v1.0.5 for 64bit on my other machine, and after that I had an EUE problem. I uninstalled it, and everything is OK. It is something to consider though.
 

Insidious

Diamond Member
Oct 25, 2001
7,649
0
0
Originally posted by: rabrittain
.
.
Insidious -- I saved your post and printed it out. I'll be using it as a guide when I uninstall FAH. I don't have any nvidia video drivers. My video card is ATI. Should I uninstall and reinstall the drivers?
.
.
.

OOPS :eek:

It hadn't sunk in that you are using an ATI video card. I don't know how much of my advice works for them.

Definitely don't try to use NVidia drivers on an ATI card though!

-Sid