Computer Randomly (Immediately) Shuts Off

horseman18702

Banned
Jan 24, 2009
20
0
66
Hello All,
I have a strange problem that is really annoying, and I can't figure out the exact cause. I have a dual-CPU PC that I built last year, and for some reason it has started to act up by randomly shutting itself off IMMEDIATELY. By immediately I mean that I can be working and all of a sudden the PC is off. It is hooked up to a UPS, and the UPS is fine, since none of the other devices hooked to it shut off. I also have the software configured to do a "normal" shutdown after 5 minutes in case of a power failure, but that isn't what happens; the PC just shuts right off without warning.
It isn't an overheating issue, since all temps are normal and I have fans galore inside. I have fans on the MB heatsinks, 1 attached to each CPU heatsink, 3 fans in the rear, 1 fan on the top, and 3 fans on the face, and the PSU has its own (of course).
I also know that it isn't a virus-related issue, since I have run several scans with different software and the system is clean.
It started a couple of weeks ago and is getting more annoying since there is no pattern. The first time it happened, I had stepped out of the house, and when I returned the motherboard's onboard speaker was screaming and the computer was "on" in the sense of having power, but NOT actually up and running. It took me a while to get it to turn back on without the screams from the board. Once I got it running, it stayed on for a couple of days, then turned itself off. Again I got it running, and within a day it did the same. This last time it took 3 days to turn off again. So for a year it's been relatively "normal", but now it's acting up.
I am stumped since I can't isolate where the issue is. AIDA64 shows all temps as "normal", all memory is registering fine, everything inside is registering fine, both CPUs are good, and so on. The only thing I can't figure out (because I don't know enough) is the voltages that are showing from the PSU. They are flagged with a yellow ! and the values are:
Battery Volt: 26.7
Input Volt: 123
Power Load: 265w
Max Load: 780w
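
For reference, the load figures above can be turned into a percentage of rated capacity. This is only a rough sketch: it assumes that "Power Load" and "Max Load" are the draw and rated maximum reported by the monitoring software, and the function name `load_percent` is made up for illustration.

```python
def load_percent(load_watts: float, max_watts: float) -> float:
    """Return the reported load as a percentage of the rated maximum."""
    if max_watts <= 0:
        raise ValueError("max_watts must be positive")
    return 100.0 * load_watts / max_watts


# Figures quoted in the post: 265 W load against a 780 W maximum.
reported_load = load_percent(265, 780)
print(f"Reported load: {reported_load:.1f}% of maximum")
```

On those numbers the load works out to roughly a third of the reported maximum, so the figures by themselves do not point to an overload.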

Here is what my system is:
SuperMicro X8DAH+
Intel Xeon x5680 (x2)
Crucial CT3KIT51272BQ1067Q (x2) - 12GB memory (24GB total)
LSI 9750-4i RAID controller
Silverstone Strider 1500w PSU
4 x Velociraptor 300 GB (for RAID)
1 x WD WD1001FALS
1 x WD WD1500ADFD
1 x WD WD300HLFS
1 x Maxtor 6v300F0
(those last 4 drives are just for storage)
PNY Quadro 5000
And each of the CPUs has a Prolimatech Megahalems on it with a fan attached.

Any help is greatly appreciated,
JJ
 

Bubbaleone

Golden Member
Nov 20, 2011
1,803
4
76
When you say "...I had my motherboard's onboard speaker screaming..." are you describing one long continuous beep coming from the system speaker? All mobos have beep code patterns that identify specific hardware errors. Exactly what do you hear?
 

horseman18702

Banned
Jan 24, 2009
20
0
66
Bubbaleone,
First, thanks for the response.
I am sorry I didn't clarify that. It was one longggg steady beep. I am familiar with the beep codes; when this happens, the beep just stays "on", with no pauses and no separate beeps. Sometimes, if the PC doesn't turn back on when I hit the power button, the "scream" is back the second I hit the power button, and it may take several on-and-offs before the beep goes away and the PC finally starts. I have even pulled the cord from the PSU (because it doesn't have an on/off switch) and waited for the light to go out, and even let it "cool down" before trying again. Sometimes it works and sometimes it doesn't.
Can you see why I am stumped? :)

JJ
 

Bubbaleone

Golden Member
Nov 20, 2011
1,803
4
76
Yup...I'd be a little aggravated as well. Check PCIe power connections to your Quadro for poor seating.
 

horseman18702

Banned
Jan 24, 2009
20
0
66
I have checked every connection on every piece. I thought I had it figured out because, after one of the shutoffs, I was hearing a "click" when it was about to load Windows, and it would just freeze on the Windows ribbon and not do anything else. I traced the click to one of the storage HDDs that had a loose power connection. BUT the problem happened again, of course without the "click", so what I figure is that I had bumped that connector when I was plugging and unplugging things before. The video card power is seated well, and through each of the startups it does provide a signal (of some sort) to the monitor: when the PC is off, "check PC connection" is on the monitor, but when I hit the power button, "no signal" shows until the MB starts to boot. That's always been the case and I believe that's normal.
If I had the extra cash right now, I would have already started replacing things and then having a "burn party" in the backyard.
 

Bubbaleone

Golden Member
Nov 20, 2011
1,803
4
76
You already know what I'm gonna propose...Strip her down to the bare necessities and start the process of elimination...:hmm:
 

Steltek

Diamond Member
Mar 29, 2001
3,309
1,046
136
The monitor signal issue you describe is normal.

According to the manual for that motherboard, a continuous high pitch/low pitch (what it describes as siren-like) beep code is a system overheat warning. This tracks with what you are describing, because the BIOS could be (and I suspect it is) configured to immediately de-power the computer if the temperature of certain monitored components exceeds set limits, to prevent hardware damage. Such shutdowns could also damage Windows and cause problems booting up.

I have a bad southbridge temperature sensor on my ASUS Rampage Extreme motherboard. When it first started malfunctioning, my computer would do something exactly like what you describe and just turn itself off like the power cord had been pulled. Now, the SB always starts up at 150C even after being shut down and unplugged all night (I had to disable SB temperature monitoring in the BIOS just to be able to use my machine -- it works just fine despite the fact that the BIOS says the SB should be a molten puddle of silicon).

Does it ever do this on a cold start? Also, if you go into the BIOS when you restart after one of these incidents, what system temps are showing in the Hardware Health Monitor section of the BIOS? Does one of the monitored items look totally out of whack there? If so, you may have to disable some hardware monitoring once you verify it is truly a bad sensor and that you do not really have an overheat issue.

That click could be a failing hard drive - I used to run Raptors in a striped RAID0, but I had to RMA so many of them that I just finally gave them up. If you haven't done it, you might want to start up the system using WD's (and Seagate's as well) bootable testing CDs and check all the hard drive hardware for faults.

Bubbaleone is right - you need to pull everything and start troubleshooting with absolute basic/minimal hardware installed. It might even be worth running a Prime95/Orthos session with minimal hardware installed to see if you could have a true overheating issue.
 

horseman18702

Banned
Jan 24, 2009
20
0
66
Guys,
Thanks again for the detailed responses. Just to clarify, I have done the "bare" minimum attempt, but the problem was still sporadic. As Bubbaleone agreed, I was ready to put it out of its misery, until I realised I can't afford to :)
One thing about these boards is that they run HOTTTT. I went through that last year because it would get extremely hot extremely quick and would shut off because of that. After calling Supermicro and of course checking support forums, I found this is not uncommon because they have really poor heatsinks on their boards. You would figure that @ $500 for a board you would get a really good design, but that isn't always the case. That's what led me to put a lot of fans inside, and I actually have a fan dedicated to just the SB.

Steltek - I do have to check the setting for the thermal cut, but at this moment the MB temp is 126. It actually always runs around that temp. And the answer is yes to your question about a cold boot. I don't recall whether I can disable the thermal sensor, but I will check; I don't believe that I can. I usually don't rely on those things because, like you, I have had my share of issues with boards and bad sensors. I will check those temps on a cold boot in the morning just to see. As far as the click that I had, I did figure that one out, and it was an error on my part: I accidentally bumped a connector when I was plugging and unplugging in the "experiment" phase :)

I will reboot to see if there is a "disable" on the sensors, but I am not that hopeful - LOL

JJ
 

horseman18702

Banned
Jan 24, 2009
20
0
66
Steltek,
I wanted to update my above post but couldn't, so I will just add the info here. I rebooted my PC to see the BIOS settings and I was correct: there is no way to disable the sensors. It will show you the temps and the fan controls, but that's all. I did however get the temps from the BIOS. AIDA64 showed the 125 degrees for the "motherboard", and the BIOS lists that same reading as "system temp". The other temps are:

CPU 1 & 2 = LOW
IOH1 = 150
IOH2 = 141
SYSTEM TEMP = 125

I will recheck the temps for the cold start in the morning since I will be on this for a bit yet :)
 

Steltek

Diamond Member
Mar 29, 2001
3,309
1,046
136
According to the manual, the Super Doctor III software that came with your motherboard can be used to check system health status. When first installed in Windows, it defaults to the monitoring limits in the BIOS; however, once installed, you can change the limits (which overrides the BIOS). If you don't have it installed, it might help you puzzle out what is happening:

ftp://ftp.supermicro.com/utility/SuperDoctor_III/
 

horseman18702

Banned
Jan 24, 2009
20
0
66
Steltek,
Again, thanks so much for going through the trouble of looking this stuff up. I already have Supero Doctor III installed and, yes, you can override things, but not the temps. Voltages and those things can be overridden, but the temps are "locked".
However, immediately after I posted the last post, I went to check Facebook, and as soon as the screen loaded I heard the power go off and I was looking at a black screen. That's the quickest shutoff I have had, since it had already done it just this morning. So I took advantage of that and decided to open the case and blow house fans on the inside to cool everything off, to see the temps for a cold boot.
Everything appears to be normal and here were the temps:
CPU 1&2 = low
IOH1 = 116
IOH2 = 109
SYSTEM TEMP = 82

This was of course after about 15 minutes of cooler air blowing, but the temps aren't crazy.
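
To put the two sets of BIOS readings side by side, here is a small sketch that computes how far each sensor dropped after the cool-down. The thread never states the unit; the Celsius conversion assumes the readings are in Fahrenheit, which is an assumption, and the dictionary and function names are made up for illustration.

```python
# BIOS readings quoted in the thread (unit not stated; Fahrenheit assumed).
warm = {"IOH1": 150, "IOH2": 141, "SYSTEM TEMP": 125}
cooled = {"IOH1": 116, "IOH2": 109, "SYSTEM TEMP": 82}


def f_to_c(deg_f: float) -> float:
    """Convert Fahrenheit to Celsius."""
    return (deg_f - 32) * 5 / 9


def drops(before: dict, after: dict) -> dict:
    """Return the per-sensor drop between two sets of readings."""
    return {name: before[name] - after[name] for name in before}


for name, drop in drops(warm, cooled).items():
    print(
        f"{name}: {warm[name]} -> {cooled[name]} (drop of {drop}); "
        f"about {f_to_c(warm[name]):.1f} C -> {f_to_c(cooled[name]):.1f} C "
        "if the readings are Fahrenheit"
    )
```

If the readings really are Fahrenheit, the warm IOH values sit in the 60s Celsius, which fits the "not crazy" impression above. If they were Celsius, the same numbers would be alarming, so it is worth confirming the unit in the BIOS.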
 

Steltek

Diamond Member
Mar 29, 2001
3,309
1,046
136
Yeah, they don't seem to be out of line. I'm leaning towards bad power supply or failing motherboard.

Is the power supply still under warranty? If so, it might be worth doing an RMA and swapping it out. You can test the various output voltages with a multimeter, but that might not really tell you anything if it is failing under load conditions. As this is a server motherboard, I presume you have the ATX 24-pin main power connector and two 8-pin power cables supplying power to the motherboard?
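
If you do take multimeter readings, a quick way to judge them is against the usual ATX tolerance of plus or minus 5% on the +12V, +5V and +3.3V rails. The sketch below does that check; the sample readings are hypothetical placeholders, not measurements from this system, and the function name `rail_in_spec` is made up for illustration.

```python
# Nominal positive ATX rails and the commonly cited +/-5% tolerance.
NOMINAL_RAILS = {"+12V": 12.0, "+5V": 5.0, "+3.3V": 3.3}
TOLERANCE = 0.05


def rail_in_spec(rail: str, measured_volts: float) -> bool:
    """Return True if a measured voltage is within tolerance for the rail."""
    nominal = NOMINAL_RAILS[rail]
    low = nominal * (1 - TOLERANCE)
    high = nominal * (1 + TOLERANCE)
    return low <= measured_volts <= high


# Hypothetical multimeter readings, for illustration only.
sample_readings = {"+12V": 12.1, "+5V": 4.8, "+3.3V": 3.1}

for rail, volts in sample_readings.items():
    status = "OK" if rail_in_spec(rail, volts) else "OUT OF TOLERANCE"
    print(f"{rail}: {volts:.2f} V -> {status}")
```

As noted above, readings taken at idle can look fine on a supply that only sags under load, so a passing result here does not clear the PSU.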
 

horseman18702

Banned
Jan 24, 2009
20
0
66
I have been leaning towards the same. I was really hoping to isolate it down so I didn't waste the extra cash on a PSU if it is the MB and vice versa. I was so po'd last night because it happened again about 1 a.m.
The PSU has every needed connector and then some. I chose that one because there were only a handful of PSUs that had the connectors I needed, as well as a couple more in case I added another Quadro. Yesterday I did call Silverstone and they will RMA it, but the problem I have is that I bought it off eBay last year and I can't print off the 'invoice' because it was a year ago. So I don't know how else to prove it was purchased new. I just wish I could figure out which is dying, the MB or the PSU.