Bizarre Problem/Possible epic fail with SLI GTX 580

Masahiro

Member
Oct 25, 2011
87
0
66
Alright so here's the deal:

I was toying around earlier with the OC on my GTX 580s (WC) using MSI Afterburner. I decided to try ramping up from a 1000 MHz core clock (1.15 V) to 1050 MHz and increased the memory clock from 2004 MHz (stock) to 2200 MHz. The temperatures went up from about 37C under load (GPUGrid) to 57C, and then the screen blanked out and came back but was artifacting, so I immediately removed the OC and restarted the PC. Everything booted up fine with no issues.

But then I noticed when I started GPUGrid again I was only getting one task (SLI is disabled when I'm on BOINC). So I looked at Afterburner and one of the cards was at 0% consistently. I got worried, enabled SLI, and ran Kombustor to see if the same thing would happen. Sure enough, it stayed at 0% (it would downclock from 398 MHz to 0), though it would always start off working for a second. I'm fearing the worst, that I may have ruined one of my cards. To find out, I tried turning off SLI again and plugging my second display into the second card to see if it would send a signal, and sure enough it did. Any other ideas on how to figure this out? I'm gonna be super pissed if it turns out I messed up a card, feel like an utter moron.
 

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
Well, since the card works fine by itself (I believe that's what you are saying) the first thing I would try is reinstalling the drivers. I'm assuming you've already tried a simple reboot?
 

Masahiro

Well I just finished reinstalling the nvidia driver and here's what's happened so far:
After installing, I wasn't able to enable SLI at first for some odd reason, but after a restart SLI has been enabled. Unfortunately, when I run MSI Kombustor I still have the same thing happen, with only one card working and the second doing nothing. I'm going to try removing the power from one card and see if the other card will start working, just to confirm that it works. Any other ideas?

EDIT:

Well I'm happy to report that both cards work fine individually: each one displays fine when the other is not plugged in, and each runs MSI Kombustor with no issues. But I'm still perplexed as to why one card isn't working when they're in SLI???
 

3DVagabond

Try reversing the slots the cards are in, and disconnecting and reconnecting the power to them. You can also try removing Kombustor and/or any other programs that monitor or interact with the card.

Because you noticed a temp spike, it could actually be a problem with the card. Since it works outside of SLI, though, I would try other basic troubleshooting techniques before resigning myself to the fact that the card is screwed.
 

Masahiro

Reversing the cards will be...problematic. Because they're liquid cooled, it would mean draining the loop, removing the bridge, switching the cards, and filling the loop again. I want to try to troubleshoot any other issues before resorting to that if possible. I'm still kind of at a loss though, because they both work fine individually. I'm going to try running BOINC again, and then try running BF3 with some GPU monitoring overlay installed to see what's happening.
 

3DVagabond

Yeah, the WC'ing would make it difficult. If you can, try reseating them in the same slots and disconnecting and reconnecting the PSU wires.
 

Masahiro

I'll try reseating them and see if that fixes anything. But technically speaking, would it make sense that the second card failed if it works fine alone?? Maybe the SLI bridge is the problem?

EDIT: Just for clarity, when I said earlier that I tried running each card individually, what I did was remove the PSU wires from one card but leave it in the slot, boot, test the other card quickly, and shut down again, then repeat the procedure with the second card.
 

3DVagabond

Depends on what might have broken with the card. If it's the bridge, replacing it should fix it. If it's something on the PCB that has to do with SLI (traces to the SLI tabs, for example) the card could be borked for SLI. It's likely not the card though, since it works alone. Could be the mobo? When you upped the voltage it might have shorted something in the board. It could be the PSU? It's likely something software related though.
 

Masahiro

Well I'm gonna try reseating in the morning along with some other tests but just as a logical thought experiment:

If it WAS the bridge, wouldn't SLI not work at all? Then again, the fact that SLI wouldn't enable after reinstalling the drivers until I restarted might be an indicator that it is the bridge?

If it was something on the board is there a way to tell? I'd imagine that the only part of the board that could have an impact would be on the PCI-E slots? Then again, I'm by no means an expert.

As for the PSU, not quite sure how that would have caused the issue.

Finally, software: I'm gonna go ahead and try reinstalling BOINC tomorrow. Any recommendations on what monitoring program I can use while gaming to keep track of GPU loads?

EDIT: I tried running BF3 and sure enough it also only has one GPU under load and the other does nothing...getting worried all over again. Gonna check and see if I have a second SLI bridge.
 

Masahiro

Sorry for the double post, but I may have solved my problem.

So what I ended up doing was flipping the SLI cable around and putting it on the other set of SLI connectors. Additionally, this time I turned off the power and unplugged the PSU power cable (instead of just flipping the switch) and, with much difficulty, reseated the GPUs. I'm running BF3 now and both GPUs are back to working like before (hooray!). Thanks a lot for your help and ideas 3DVagabond, I thought I was seriously screwed there.

EDIT: Well maybe not quite fixed...but I'm certain it's software related now. For some reason it still only uses one GPU in MSI Kombustor and GPUGrid...I'll try reinstalling them later and see if that does anything.
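
For anyone who lands here with the same "one GPU sits at 0%" symptom and wants a quick check outside of Afterburner or Kombustor, here is a minimal sketch in Python that reads per-GPU load from the nvidia-smi tool that ships with the NVIDIA driver. It was not part of the troubleshooting above. The function names and the 5% idle threshold are made up for illustration, and whether `utilization.gpu` is actually reported depends on the card and driver (some GeForce cards may show `[N/A]`, which the parser keeps as `None` instead of guessing).

```python
import csv
import io
import subprocess

# Fields requested from nvidia-smi. `nvidia-smi --help-query-gpu` lists the
# fields a given driver supports.
QUERY = "index,name,utilization.gpu"


def read_gpu_loads():
    """Run nvidia-smi once and return its raw CSV output as text."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def parse_gpu_loads(text):
    """Turn 'index, name, load' rows into (index, name, load) tuples.

    load is None when the driver reports something non-numeric (e.g. [N/A]).
    """
    rows = []
    for fields in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        index, name, load = (f.strip() for f in fields)
        rows.append((int(index), name, int(load) if load.isdigit() else None))
    return rows


def idle_gpus(rows, threshold=5):
    """Return indices of GPUs whose load is known and below threshold percent.

    The threshold is an arbitrary illustrative value, not a recommendation.
    """
    return [i for i, _, load in rows if load is not None and load < threshold]
```

To use it, start a load on both cards (a game or a stress test), then call `idle_gpus(parse_gpu_loads(read_gpu_loads()))` from a Python prompt. An empty list means every GPU with a readable load is busy. An index that keeps showing up while the app is supposedly using both cards points to the same situation described in this thread.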