MSI 7970 Lightnings Crossfire - Second Card Won't Post and Powers Off

Malachor

Junior Member
Hi All,

I'm posting this on various forums because I've been unable to find a solution to my problem. May I please ask for some assistance from you wise gurus? Any advice, or at least some knowledge, would be invaluable to me. First, my system specs and some background:

System Specs:

HAF X Case - All fans replaced with 200mm MegaFlows
ASUS Rampage IV Extreme BIOS 4004/3404 (chip1/chip2)
Intel 3930k C2 Stepping @ stock
32GB G.Skill RipjawsZ 2133MHz @ 1600MHz (will push to max speed once everything works)
2 x MSI 7970 Lightnings 3GB in Crossfire @ stock (GPU1 in slot 1 [red x16], GPU2 in slot 3 [black x8])
Titanium HD Sound Card (slot 5 black PCI-E x1)
Samsung 840 Pro SSD (Boot) - latest firmware
OCZ Vertex 3 MaxIops (Games) - latest firmware
2 x Seagate Barracuda 1TB in RAID-0 (VMs/Projects/Storage)
1 x WD 500GB (Extra Storage)
Corsair Ax1200w PSU
Sony Blu-Ray Burner/Player
2 x Iiyama Prolite 1920 x 1080 Monitors
Lamptron FC-9 Fan Controller
Koolance Dual Pump Relay (overvolt pumps 12V-->24V)
Custom Water Loop - 1 x Black Ice SR-1 480 Rad externally mounted, 1 x GTX 360 Rad Internally mounted, EK 250ml X2 Advanced Res, EKFC7970 Water Blocks for both GPU's, XSPC Raystorm Waterblock for CPU, Prolimatech LRT 1/2" ID 3/4" OD tubing, VL4N QDC's, Bitspower fittings/rotaries, 2 x Koolance-PMP 450s pumps with EK Dual Top, Prolimatech PK-1/Phobya HeGrease TIM, Apollish Vegas 2000rpm fans for GTX Rad/Shark Aerocool 1500rpm fans for SR-1 Rad.

Background:

System was running stably for 4+ months; however, GPU2 was running too hot (GPU1 42 degrees under load, GPU2 75-85 degrees under half load). Both cards are cooled by the same model of block and both have a thin layer of Prolimatech PK-1 applied to the GPU core (using the X method). The cards are linked by a triple-slot EK Plexi bridge.

I decided to swap the cards around as a test. The temperature variations followed the same card, so I took it apart and noticed the TIM had gone hard in the middle and there was way too much of it. I re-applied TIM using the credit card method and put the card back into its original (slot 3) position. Since that day, the card powers down after 20-30s of being on. When I turn the PC on, the blue lights on the rear PCB come on (all 8) and stay on for about 20-30s before going out; the reactor core stays blue at all times. If I'm lucky enough to get into the BIOS while the blue lights are on, the card shows in the BIOS; if the lights go out, then the card won't show in the BIOS. This leads me to believe there is some kind of power saving feature going on, a DMI/IRQ conflict, or the card is simply bust. I should also point out that the "Boot Device LED" is solid red from power on until the Windows logo comes up, then it disappears and the status readout shows "AA" (everything running nominal).

Before applying for an RMA, I performed vast amounts of troubleshooting/testing, including but not limited to:

- Using the PCI-E dip switches on the motherboard to disable lanes and test each card individually (issue follows the same card regardless of position i.e. there is no display from the second GPU because the card powers off 20-30s after POST)
- Connecting 1 and 2 monitors to the card to see if it 'kick starts' it into staying powered on
- Disconnecting everything apart from primary boot drive, 1 stick of RAM, GPU's, keyboard/mouse
- Uninstalling drivers using AMD Driver Removal tool followed by Driver Fusion in safe mode, then re-installing latest drivers (including 13.5 Beta 2 drivers and latest CAP)
- Using the latest version of MSI Afterburner to disable ULPS, and also registry tweaks to set the "EnableUlps" keys from "1" to "0" (there are about 10 of them on my machine; a rough sketch of the registry change is included after this list)
- Running with one card (required re-tubing of system because of plexi bridge and position of other components in the case)
- Doing a complete re-install of Windows with everything at 'optimised defaults' in the BIOS (note this is possible because GPU1 works fine, so I connect the monitors to that whilst testing)
- Swapping power cables from GPU1 to GPU2
- Reseating all power cables and checking they're all positioned correctly
- Ensuring there is power to the EZ-Plug next to the GPU's
- Changing crossfire cables (3) in positions 1 and 2 on the cards
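
For anyone who wants to repeat the ULPS registry tweak mentioned above, here's a minimal sketch in Python using the standard-library winreg module. It assumes the EnableUlps values sit in the per-adapter subkeys of the usual display-adapter class key ({4d36e968-e325-11ce-bfc1-08002be10318}); it needs to be run as Administrator, it's only an illustration of the change described above rather than an official AMD/MSI procedure, and you should back up the registry first.

import winreg

# Display-adapter device class key; the AMD driver stores EnableUlps in the
# per-adapter subkeys (0000, 0001, ...) underneath it.
DISPLAY_CLASS = (r"SYSTEM\CurrentControlSet\Control\Class"
                 r"\{4d36e968-e325-11ce-bfc1-08002be10318}")

def disable_ulps():
    """Set EnableUlps to 0 in every per-adapter subkey that defines it."""
    changed = 0
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, DISPLAY_CLASS) as class_key:
        index = 0
        while True:
            try:
                sub_name = winreg.EnumKey(class_key, index)
            except OSError:
                break  # ran out of subkeys
            index += 1
            sub_path = DISPLAY_CLASS + "\\" + sub_name
            try:
                with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, sub_path, 0,
                                    winreg.KEY_READ | winreg.KEY_SET_VALUE) as sub_key:
                    value, value_type = winreg.QueryValueEx(sub_key, "EnableUlps")
                    if value != 0:
                        winreg.SetValueEx(sub_key, "EnableUlps", 0, value_type, 0)
                        changed += 1
            except OSError:
                continue  # subkey has no EnableUlps value (or access was denied)
    print(f"EnableUlps set to 0 in {changed} subkey(s)")

if __name__ == "__main__":
    disable_ulps()

A reboot is needed afterwards for the change to take effect; MSI Afterburner's own "disable ULPS" setting is the simpler way of achieving the same thing.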

Realising that all of the above had been to no avail, I applied for an RMA with Overclockers.co.uk. They wanted photos of the card and for me to reassemble the stock cooler, which I did. I sent the card off and 3 days later had a response that they were sending it back to me because no problems were found and the card passed all benchmarks. So, the card MUST be working?? I tested the card with the stock cooler in my old PC and it did indeed turn on, stay on and give me a display. Cracking, I thought, everything is hunky dory again. Wrong.

I took the card apart again, checked the PCB with a magnifying glass, gave the waterblock a thorough cleaning and ensured all thermal pads were in optimum positions. The waterblock was put back on, thermal paste applied thinly and evenly, mounting pressure was uniform (screws tightened in a cross pattern) and everything looked perfect. The pads were flush, the core was making perfect contact and the rear mounting plate with the reactor core was mounted correctly. I checked in between layers for potential shorting issues - nothing. I put the cards back into the system as per the original spec (water flowing great, both cards freezing to the touch, no leaks, 8-pin power cables secure, seated firmly in the PCI-E slots, etc.), powered on and SAME thing. BOTH GPUs turn on at initial power on, then, after 20-30 seconds, the blue lights on GPU2 turn off completely as if the card is powering down because it's not in use. GPU1 at this point is on and there are 2 solid blue lights on the back.

I've been trying to rectify this issue for about 3 weeks now in the very sparse time I have to do so (I get maybe 5 hours a week because of work/family/gym obligations). I have tried everything I can think of and just don't know what to do next. I just don't understand how both cards were working fine (albeit one very hot) and then all of a sudden, after swapping them around and re-applying thermal paste, everything goes to pot. What could it be? Is it indeed a power saving issue? Is slot 3 (black PCI-E x8) now dead? Is the motherboard now dead?? Why would it show the "Boot Device LED" at power on and then status "AA" once in Windows?? No matter what, though, the second card just won't stay on. It's not even recognised in Windows (old or new build). It did show up in safe mode one time, but as soon as I booted normally, it disappeared from Device Manager. I am truly at my wit's end and am turning to someone, anyone with the knowledge/skill-set to help me. If anyone knows how I might proceed, or at least knows someone with these cards, I'd be eternally grateful for any information you could provide. I apologise for the wall of text but wanted to make sure I provided every last detail 🙂

Thanks very much indeed everyone and I look forward to your feedback.

Kind Regards

Malachor.
 
To be honest, mate, I suspect the problem lies with your motherboard... possibly the PCIe slot, or potentially something else altogether. Although this would be a major PITA with a water loop, I don't suppose you could try a different motherboard?
 
I did try it in my old PC with the stock cooler on and it worked absolutely fine. As soon as I put the waterblock on it and put it back in my new PC, it didn't. I agree with you that the PCI-E slot may be dead, or perhaps the PSU has a faulty rail? I was going to try another PSU as the next port of call. If the same thing happens with a different PSU then it's looking likely that you are correct. Thanks for the comment, buddy.
 
did you run the "bad" card all alone with the good card sitting on the bench out of the mobo completely. If not run the bad one alone use it in the first slot then use it in the second slot. If it has heat issues. go to catalyst 13.4 remove all the catalyst drivers then reinstall them. remove msi afterburner then re install it

Once you have all new drivers, try to run the bad card all alone in each of your PCI-E slots. Also try all of the PCI-E power cords from your PSU.

If the card has issues, take screen snips of all of them, then put it aside and run the one good card. If that card works, take snips of those screens too. At this point you have conclusive proof the card is bad.

Contact the RMA people and send them all the screen snips.
 
Thanks for the response Philip.

I haven't run the bad card by itself, given that the loop is currently designed (and the tubes cut to length) such that the two GPUs must be connected together via the plexi bridge (I'd be able to do it if I had blanking plates to block the unused ports on the bridge, but I don't have any, so I need both cards in at the same time). I did, however, swap both cards around and test using the PCI-E dip switches on the mobo, which is effectively the same thing as it completely disables the lane(s) you choose. I can say that the issue followed the bad card whether it was in slot 1 or slot 3. If I want to test in the other slots, I'm going to need to purchase a 4-slot plexi bridge and some blanking plates, as I really don't want to re-tube the whole system just so that I can test one card at a time. I know that sounds like I'm cutting corners, but I've honestly spent 30+ hours trying to sort this out and don't have much time on my hands to keep up the troubleshooting.

The problem is that I've already sent it back under RMA and they returned it saying it works fine. If I send it off again, what's to stop them just sending it back again? I'm going to try using another PSU to power the bad card to see if it stays on; that way I can rule out one more possibility. Thanks for your feedback, it's appreciated.
 
Here is the problem: if your mobo no longer works correctly with 2 cards, your methodology of testing does not pick up that the board is the issue.

You will need to test with my method to fully determine if the board has an issue...

Also, your 2-card cooling loop may have a weird issue.

If you attach the OEM cooler to the card, put it on the board and it works, then first, it shows the seller was honest; second, it shows you have a board issue or a possible cooling problem.

If you don't test the card alone, you just can't tell if it is the mobo, the cooling or the card.
 