Help troubleshooting Milkyway@home computation errors

faxon

Platinum Member
May 23, 2008
2,109
1
81
The problem: When running Milkyway@home 1.02 (opencl_amd_ati) de_separation units of any type (currently seeing several names with that starting moniker) on my GPUs (an HD5870 and HD6970 with fresh drivers), both cards are getting a computation error after 6-12 seconds and going to the next unit.

Fix attempts: I tried swapping the RAM in the computer to make sure it didn't have bad RAM. The RAM installed is brand new (and from my main rig in sig); both sticks used individually and the rig's 1 stick all presented the error, and no other memory-specific errors are occurring. Since it's a fresh driver install I'm reluctant to take it offline, since it has a huge cache of N-Body Simulation 1.38 (mt) units to crunch right now and it's churning through those like they don't exist, but the 6-12s + 2s or so between units that it's costing me each time I fail one of these units adds up.

Anyone else having an issue like this? the rig was to my knowledge smashing these units out just fine when it was the single 5870 in it, and i don't wanna go mucking with the drivers till i go pick up my other 5870 since ill probably have to anyway. Any similar issues being reported? I'm about to take a look at the project forums as well but i've never had a computation error like that ever so idk what to start troubleshooting first
 

Rudy Toody

Diamond Member
Sep 30, 2006
4,267
421
126
As I recall reading somewhere, Windows runs BOINC as a service and you have to get into the service entry and set some switch to allow communication with the stand-alone GPU.
 

Sunny129

Diamond Member
Nov 14, 2000
4,823
6
81
i don't think Faxon has BOINC installed as a service b/c if he did, then he wouldn't have been able to crunch with GPUs at all. note that he specifically stated that this machine was crunching these WU's just fine when the 5870 was the only GPU installed (before he added the 6970). When installing BOINC on a windows platform, the option to install BOINC as a service is presented during the installation process - it doesn't automatically get installed as a service.

Faxon, i know driver issues don't seem logical considering that the 5870 worked fine before the addition of the other card...but it still may have something to do with the drivers. more specifically, 5xxx series GPUs may have slightly different driver requirements than 6xxx series GPUs when it comes to Milkyway@Home. i'll do a little more research later this evening.
 

faxon

yea its possible that the more up to date driver is causing issues. it was on 12.9 before i installed the 6970 now it's on whatever the latest release driver is. my 7950s are running the latest beta drivers without errors, ill probably test these since they had some interesting gaming fixes that i can take advantage of if i crossfire my 2 5870s once i get the last one (still sick fml), and if that doesnt do it then revert to 12.9 and test that. even if the driver requirements for each were slightly different (which they should be and are), the driver installer has the drivers for all the cards packed into it and it should work fine.

this is the part where in the process of writing all of that, i realize that i had a GTX260 in the rig when i installed the AMD drivers, and the cards were installed in a slightly different configuration. That and some other things leads me to think maybe a fresh install is a good course to try. i'm no longer burned out on troubleshooting like i was 3 days ago, just extremely fatigued and annoyed that i don't get to go party tonight on halloween!

ed: so reverting to 12.9 made it so that every gpu unit i tried to run failed instantly. reinstalling current 13.10 drivers fresh didn't help either. I then racked my brain and was about to ask for a good driver scrubber since i figured now was a good time to run one and driversweeper was discontinued, only to discover that it wasn't actually discontinued and idk what im remembering now. still no help, still getting computation errors but at least now it can run the modified fit units again and i ruled out any latent files from the nvidia install and the old drivers, or a bad driver install as possibilities

ed2: after updating BOINC my 5870 is no longer being issued work units. looks like downgrading my version is in order till i figure out why that happened. apparently the card is detected but is listed as "not used" for some reason and there's no switch anywhere i can flip to tell the damn app to use it. reverting back to the previous version fixed this problem for now, but lack of access to updates isn't ideal either
 

Sunny129

i was not able to dig up what i was looking for, but i'm pretty sure it can be found by searching the Einstein@Home forums. i seem to recall that parts of the OpenCL library were left out of a specific Catalyst driver version (or versions), though i cannot recall which version(s) off the top of my head...and i may be remembering things inaccurately, so alternatively it might have been the APP SDK that was left out of a particular Catalyst driver version (or versions). IIRC, this caused problems for AMD 5xxx series GPUs, but not AMD 6xxx series GPUs and up. unfortunately i'll be going out of town for the weekend right after i get out of work this afternoon, so i won't be able to poke around the Einstein@Home forums to look for possible problems and solutions until next week. in the meantime, i suggest you skim over the E@H message boards while i'm away (assuming you haven't already solved the problem via a fresh driver install...after all, it seems like you're onto something having recalled that the GTX 260 used to be installed on that machine...let us know how that goes).
 

faxon

i'll dig through the forums over there. most of these projects' forums are so inactive that what i want will probably be on the front page lol. also the milkyway@home thread i started has so far produced several others who are having issues with the same units but for different reasons, so it's possible that it has something to do with the units. either way i'll go take a look around E@H

ed: in case it's a BOINC version conflict i dug this up for later since it may be needed. http://boinc.berkeley.edu/dl/
 

faxon

This was posted over on the milkyway@home forums by someone helping me there. If the fix can't be found in this post then I'm gonna assume there's either a hardware conflict (doesn't like multiple gens of cards, etc.) or a bug in the milkyway code, since similar troubles have been reported by others with different configurations. here's the posting. i'm a little lost as to where to start with it (i have an idea but not sure how to progress) so figured i'd post it here.

http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=3386&postid=60281
 

biodoc

Diamond Member
Dec 29, 2005
6,343
2,243
136
Have you tried the express uninstall of all ATI software? I would then run Glary utilities (free version) to clean up the registry and tmp files and then reinstall the ATI driver. Then run the Glary utilities to clean up the registry and tmp files again.

No idea if this would work but I find Glary is useful for cleaning up the registry after uninstalling a software package. There are usually remnants of the software package in the registry which may cause problems.
 

faxon

this and another app were recommended which i had not heard of yet for doing things like this. i will test both of them and see if either one fixes it. got a link to a legit source for this utility?
 

Sunny129

ahh, now i'm starting to piece this all together! you see, Richard Haslegrove (from the Milkyway@Home forums) mentioned your having stated here on AnandTech forums that BOINC was detecting but not using the 5870, but i didn't notice it...until now, when i scrolled up and saw your edit no2. when Richard says you can override BOINC's default behavior of using only the better of two GPUs of the same family and ignoring the other by setting the <use_all_gpus> option described in client configuration, he is referring to making use of what's called a cc_config.xml file, or a client configuration file. i've experienced this exact problem w/ two different dual GPU machines (a 6950/5870 machine and a 6970/5870 machine) in the past. making use of a cc_config.xml file is what finally allowed my BOINC client to use the lesser of two or more GPUs of the same family. what you'll want to do is the following:

1) open a new .txt file using Microsoft Notepad. do not use more powerful word processors (like MS Word) as they can add and save invisible characters to the code you're trying to create.
2) enter the following lines of text:

<cc_config>
<options>
<use_all_gpus>1</use_all_gpus>
</options>
</cc_config>

...this will enable the lesser of your GPUs to also run. if you ever want to switch back to allowing only the most powerful of a family of GPUs to run, you can either change the 1 to a 0 in the <use_all_gpus> line of text, or completely delete that line of text from your cc_config.xml file altogether.
3) save and close the file, then rename it to cc_config.xml (i.e. change the .txt extension to .xml)
4) place the file in your BOINC data directory, NOT your BOINC installation directory. if you're unsure of its location, one of the first few lines in the BOINC event log will show you this location.
5) start BOINC (or restart BOINC if it's already open)

if you were successful, you should see a line of text in the start-up portion of the BOINC event log that says "Config: use all coprocessors," as well as a line (or lines) of text showing recognition of your lesser GPU(s).
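
for anyone who would rather script steps 1-4 than do them by hand, here is a minimal Python sketch. a few assumptions to be loud about: the function names (write_cc_config, read_use_all_gpus, log_confirms_all_gpus) are made up for illustration and are not part of BOINC; you have to pass in your own BOINC data directory (the one shown near the top of the event log); and it overwrites any cc_config.xml already in that directory, so back yours up first if it has other options in it. the log check only looks for the "use all coprocessors" text described above in whatever log text you paste into it.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Same five lines as in the steps above; only the 1/0 flag varies.
CC_CONFIG_TEMPLATE = (
    "<cc_config>\n"
    "<options>\n"
    "<use_all_gpus>{flag}</use_all_gpus>\n"
    "</options>\n"
    "</cc_config>\n"
)


def write_cc_config(data_dir, use_all_gpus=True):
    """Write a minimal cc_config.xml into data_dir and return its path.

    Hypothetical helper: it overwrites any cc_config.xml already there.
    """
    path = Path(data_dir) / "cc_config.xml"
    text = CC_CONFIG_TEMPLATE.format(flag=1 if use_all_gpus else 0)
    # Plain ASCII, so no invisible characters end up in the file.
    path.write_text(text, encoding="ascii")
    return path


def read_use_all_gpus(path):
    """Parse the file back and report whether <use_all_gpus> is set to 1."""
    root = ET.parse(str(path)).getroot()
    value = root.findtext("options/use_all_gpus") or "0"
    return value.strip() == "1"


def log_confirms_all_gpus(log_text):
    """Return True if pasted event-log text contains the confirmation line."""
    return "use all coprocessors" in log_text


if __name__ == "__main__":
    # Demo against a throwaway directory; swap in your real BOINC data
    # directory once you are happy with what it writes.
    with tempfile.TemporaryDirectory() as tmp:
        cfg = write_cc_config(tmp)
        print(cfg.read_text(encoding="ascii"))
        print("use_all_gpus enabled:", read_use_all_gpus(cfg))
```

you would still need to restart BOINC (step 5) afterwards, since as far as i know the file is read at client start-up.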

as far as which ATI driver version will work, i can't recall which version i was using back when i had the 6950/5870 and 6970/5870 configurations (i seem to recall 12.4, but i'm not 100% sure). you might also want to try more current drivers, like 13.4. as far as BOINC version is concerned, i can't recall if the developers have dropped support for AMD 5xxx series GPUs in recent versions or not, but i'm almost positive that you needn't run a version as old as v6.10.58. it's possible that 5xxx series GPU support hasn't been dropped from BOINC at all in the most recent versions, in which case the current v7.0.64 release might do you just fine.


i also recommend using ATI's express uninstall utility (from the programs list in Start -> Control Panel -> Programs and Features) as biodoc did, but it unfortunately doesn't really remove everything and completely reverse the original driver installation process. i've never used Glary - i use Driver Cleaner Pro myself...but before i would recommend using either of those 3rd party driver cleaners and registry editors, i would recommend using AMD's own Catalyst Uninstall Utility (a standalone program that is different from the Express Uninstall Utility that is part of the Catalyst driver software suite). this is the utility that Richard Haslegrove referred to on the Milkyway@Home forums, though he could only find a link to a webpage that required a login and a password. i too cannot for the life of me find a link to this utility anymore, as AMD has changed the look and mapping of its website entirely since i was last there. i do have it though, and it is only a 2.1MB file, so i can email it to you if you'd like. i recommend running this utility after having used the AMD Express Uninstall Utility and restarting. or alternatively, you can start by using this standalone utility, which will do the same work that the Express uninstaller does and more. at that point i would install the new driver and see how things work. if things are wacky, then go through the above steps again, and then use one of the 3rd party driver cleaners after having used the standalone AMD driver removal tool.
 

faxon

excellent, thank you sunny. I will look at these driver versions tomorrow. the config file is in place and working, i'll update to the latest BOINC and install one of the drivers you recommended and go from there. It looks like (at least in 13.11 beta v1) there is no OpenCL-specific content that i can get at from the installer. is it included as a checkbox as part of the custom/advanced install process? i seem to remember it being one, but if it's not i'd like to find a way to copy/paste that section of the driver into a newer one as well if possible. the computer is used for gaming on occasion and some of the games i run need more up to date drivers for performance reasons, and the difference is quite large with these specific games. it's not my biggest priority but it'd be nice, especially since there are probably some performance improvements in there that will help with compute as well.
 

Sunny129

oh man, that much i don't know. i haven't used 13.11 beta v1 or any other beta drivers b/c i use my machines solely for crunching and require stability/reliability over the advanced gaming/graphics features that beta drivers sometimes offer. you can try it, but if there's no checkbox for OpenCL support via the custom/advanced install option, i wouldn't really know how or even if OpenCL support can be added to the package after install. you might have to dig deep to find a solution to that one...that, or sacrifice those cool beta features in the name of crunching and find a driver version that has OpenCL support packaged with it.

if i had to guess (and i don't mean to be a pessimist), i think you might have to sacrifice the cool gaming benefits of the beta driver for a non-beta version that contains OpenCL support...that is unless you find a way to add OpenCL support to the beta driver...because in my experience, when choosing a custom install (as opposed to the express install), i never see a checkbox that specifically refers to OpenCL...rather i usually only see a single checkbox that refers to the driver itself (which is usually checked and grayed out since it's required), and the rest of the check boxes are usually just additional non-requisite junk like video encoders/decoders, utilities, etc.
 

faxon

im not saying i need the latest beta drivers, just that there's been optimizations within the last year that i could use for my comp when its being used by a friend to play planetside 2 in particular and i'd like to be able to keep up on drivers to some extent because of these optimizations
 

Sunny129

^ not a bad idea at all. drivers conducive to crunching on one OS, and the latest and greatest drivers conducive to gaming on the other OS. then your friend who games on this computer just needs to know how to properly suspend all BOINC activity before restarting and booting into the other OS.
 

faxon

That's not a bad idea and i've been considering doing this in WINE since the performance is supposed to be really good as well regardless of what linux platform i use. I'll see what happens with it once i get everything tweaked. If i migrate to linux for crunching it's going to be something i do all at once with my boxes and i'm not ready to do so yet.
 

Sunny129

oh my god, i must be asleep on the job! i have that cheat sheet bookmarked myself and never even noticed the OpenCL or SDK support columns...i had always just used it to reference the Catalyst version against the 2D driver version lol. you might as well try v13.4 then, since it's older OSes like WinXP that don't have OpenCL support, not older GPUs.
 

biodoc

Just found this OpenCL benchmark app: Compubench CL

http://compubench.com/downloads.jsp

"In order to run CompuBench CL, you will need the latest OpenCL drivers for your device."

Might be a good test to see if your driver version/installation has OpenCL support for your card.

I just ran it on my GTX 660 Ti and it completed the test successfully.
 

faxon

awesome thanks biodoc. imma bookmark this thread for the future in case i run into issues again. didn't get to fixing it last night, i think im just gonna wait till i have the 3rd 5870 here at my house so i can install it and then do a clean sweep all at once, which will be later today
 

faxon

edit: deleted original post. after looking closely i noticed that after the driver update, the units were completing but popping an error at the end instead of at the beginning. quarantining the unit to work settings so it doesnt get those units again until i can work on it further and figure out another issue. over on the m@h forums we're still trying to get my client up to date. i tried updating it, but when i update it, i'll need to run a cc_config.xml file with the client to get it to use my 2 5870 GPUs. I did this and ran the config file but it still wont let me use more than 1 card, and i'm betting the reason the units are failing at the end now instead of the beginning is due to the version of BOINC possibly being out of date? would like to get that aspect working before i go tinkering with drivers more
 

Sunny129

yes, the BOINC version may be the culprit here...as i mentioned a few posts back, 6.10.58 is quite old, and you might want to jump all the way to the current release candidate 7.0.64. i would definitely update BOINC and then see where you're at.
 

salvorhardin

Senior member
Jan 30, 2003
390
38
91
Upgrade BOINC to 7.0.64; there were a lot of code changes in v7 for GPUs. When you open Catalyst Control Center and go into the Information tab and then Hardware, does your second card show up as a disabled adapter? When I ran a 5850 and a 4830 together I had the same problem getting both cards to run at the same time. The only way I could get the second card to activate was to hook it up to the same monitor (5850-dvi, 4830-hdmi). If you do that and it shows both cards active in Catalyst but you still get only one card under BOINC, then you'll need the cc_config file so BOINC can see both cards.