Technical question about GPUs and PCI-E host systems powering down in emergency situations

VirtualLarry

I've got a pair of MSI 8GB RX 570 GPUs that I've been using for mining.

I had both of them in a Ryzen R5 1600 rig with an EVGA 650W 80 Plus Gold G1+ PSU (10-year warranty, Japanese caps), and the PC started shutting itself down.

I moved both cards to my backup mining shell rig, a G4560 on a Z170 board with an EVGA 600W Bronze PSU that has a single cable with dual 6+2 connectors. (I know, not ideal.)
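One rough way to see why that shared cable is worth ruling out is to add up the in-spec power ceilings from the PCI Express CEM spec for graphics cards (75 W from an x16 slot, 75 W per 6-pin, 150 W per 8-pin). This is a minimal sketch, assuming each MSI RX 570 8GB takes one 6+2 plug from that cable; the helper names are hypothetical, and the numbers are spec ceilings, not measured draw.

```python
# In-spec power ceilings (watts) from the PCI Express CEM spec for graphics cards.
SLOT_W = 75                                # x16 slot
CONNECTOR_W = {"6-pin": 75, "8-pin": 150}  # auxiliary PCIe power connectors
RAIL_V = 12.0                              # auxiliary connectors carry 12 V


def card_ceiling_w(connectors):
    """Max in-spec watts a card may draw: slot plus its auxiliary connectors."""
    return SLOT_W + sum(CONNECTOR_W[c] for c in connectors)


def shared_cable_load(cards):
    """Worst-case (watts, amps) on ONE cable feeding every connector listed.

    Slot power is excluded because it arrives through the motherboard, not the cable.
    """
    watts = sum(CONNECTOR_W[c] for card in cards for c in card)
    return watts, watts / RAIL_V


# Assumption: each RX 570 takes a single 6+2 plug (used as 8-pin) from the same cable.
rx570_pair = [["8-pin"], ["8-pin"]]
print([card_ceiling_w(card) for card in rx570_pair])  # [225, 225]
print(shared_cable_load(rx570_pair))                  # (300, 25.0)
```

RX 570s under load typically pull well under the 225 W ceiling, so the (300, 25.0) result only says the one cable could be asked to carry up to 25 A on 12 V in spec terms, not that it actually does.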

Well, I was mining on it for the last 2 days or so, and then I switched it over to BOINC / PrimeGrid, and now it shuts down within 15-30 minutes of running. No beeping; when I look at it, it's just powered down, with the keyboard / mouse lights mostly off.

So, what's going on here? Either I've got two EVGA PSUs that have gone bad, including one high-end one, OR... there's something about the GPUs shutting down the system they're in.

Can a GPU send some sort of NMI/SMI interrupt message over the PCI-E bus to "emergency shutdown" the system? Maybe due to excessive temps?

One of the GPUs gets to 90C and stays there, which I think is the limit for these cards. If it gets above that, or much higher, could it be shutting down the rig it's in?

Edit: It should be noted that I then put an RX 580 (Newegg refurb, Sapphire Nitro+ edition), which has both a 6-pin and an 8-pin power connector, in the PC with the 650W G1+ PSU, and I mined on it for a few days without a shutdown, so I'm not convinced that the 650W G1+ is bad altogether.

I am currently running that rig with a GTX 1660 ti (120W, according to the mining software), mining on both CPU + GPU. No shutdowns.

So thus far, the shutdowns seem to follow the MSI 8GB RX 570 that runs at 90C under load. Maybe it badly needs re-pasting, or something. I should check the warranty, but it's probably void for "blockchain usage".
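To test whether the shutdowns really follow the card sitting at 90C, it helps to have temperature readings that survive the power-off. This is a minimal sketch, assuming the rig runs Linux with the `amdgpu` driver, which exposes each card's temperature in millidegrees Celsius at `/sys/class/drm/card*/device/hwmon/hwmon*/temp1_input`; on Windows a sensor-logging utility would serve the same purpose. The function names are hypothetical, and the 90C threshold is the figure from the post, not an AMD spec.

```python
import glob
import os
import time


def read_gpu_temps(sysfs_root="/sys"):
    """Return {sensor_path: temp_C} for every amdgpu hwmon temp1_input found."""
    pattern = os.path.join(sysfs_root, "class", "drm", "card*", "device",
                           "hwmon", "hwmon*", "temp1_input")
    temps = {}
    for path in sorted(glob.glob(pattern)):
        real = os.path.realpath(path)  # one sensor may be reachable via several symlinks
        if real in temps:
            continue
        with open(real) as f:
            temps[real] = int(f.read().strip()) / 1000.0  # millidegrees -> degrees C
    return temps


def log_temps(logfile, threshold_c=90.0, interval_s=5.0, samples=None,
              sysfs_root="/sys"):
    """Append timestamped readings, flagging HOT ones; samples=None runs forever."""
    n = 0
    with open(logfile, "a") as out:
        while samples is None or n < samples:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            for path, temp in read_gpu_temps(sysfs_root).items():
                flag = " HOT" if temp >= threshold_c else ""
                out.write(f"{stamp} {path} {temp:.1f}C{flag}\n")
            out.flush()
            os.fsync(out.fileno())  # force to disk so the last readings survive a sudden power-off
            n += 1
            time.sleep(interval_s)
```

Started in the background before launching BOINC, e.g. `log_temps("/var/tmp/gpu_temps.log")`, the tail of the file after the next shutdown shows whether the card was climbing past 90C right before the rig went dark. The `fsync` on every pass costs a little disk I/O but avoids losing the most interesting readings to the OS write cache.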

It should be noted, too, that in the G4560 rig the CPU itself only reached 74-75C under load, which shouldn't have been hot enough to trigger a CPU thermal shutdown; nor should the CPU in the Ryzen rig the cards were in before, with its Gammax 400 cooler.
 
Last edited:

DrMrLordX

Hmm.

I've had Win10 lock up and go into a reboot sequence from AMD graphics driver crashes before. I've also had it hard lock and refuse to POST until after a cold boot. That's on my two Ryzen systems (1800x, 3900x). It is theoretically possible that a driver failure could lead to the system powering itself down.

Is your mining software keeping event logs?
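If the mining software does write a log with a timestamp at the start of each line, an unexpected power-off usually shows up as a silence between the last line written before the shutdown and the first line after the reboot. A minimal sketch, assuming a hypothetical `YYYY-MM-DD HH:MM:SS` prefix (real miners use various formats, so `TS_FMT` and `TS_LEN` would need adjusting); `find_gaps` is an illustrative helper, not part of any mining tool, and the sample lines are made up.

```python
from datetime import datetime, timedelta

TS_FMT = "%Y-%m-%d %H:%M:%S"  # hypothetical log timestamp format
TS_LEN = 19                   # len("2020-01-01 10:00:00")


def find_gaps(lines, min_gap=timedelta(minutes=2)):
    """Yield (last_line_before, first_line_after, gap) for every silence >= min_gap."""
    prev_ts, prev_line = None, None
    for line in lines:
        try:
            ts = datetime.strptime(line[:TS_LEN], TS_FMT)
        except ValueError:
            continue  # skip lines that don't start with a timestamp
        if prev_ts is not None and ts - prev_ts >= min_gap:
            yield prev_line.rstrip("\n"), line.rstrip("\n"), ts - prev_ts
        prev_ts, prev_line = ts, line


# Made-up sample lines standing in for a real log file.
sample = [
    "2020-01-01 10:00:00 share accepted\n",
    "2020-01-01 10:00:05 share accepted\n",
    "2020-01-01 10:30:00 miner started\n",
]
for before, after, gap in find_gaps(sample):
    print(gap, "|", before, "|", after)
```

Passing an open file object instead of `sample` works the same way, since iterating a file yields lines. If the logger buffers its writes, the last line before a gap may lag the actual moment of the shutdown by a few seconds.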