- Apr 24, 2013
Hello,
I've built a server at home for testing. The specs are as follows:
- case: Supermicro 4U tower case (convertible to rackmount)
- motherboard: KGPE-D16
- RAM: 32GB DDR3 ECC Kingston memory (9-9-9-24 I believe)
- CPU: 2x Opteron (8 cores @ 2GHz each)
- RAID: Perc6 (eBay special)
- HDDs: mix of 750GB and 1TB drives split into two small RAID arrays
The server is running ESXi version 5.1. The motherboard has almost the latest firmware (found a slightly newer version that dropped about 2 weeks ago).
The problem is that the system spontaneously reboots without any warning and without logging anything in VMware. The only indication I have is that I lose connectivity to everything, and when VMware comes back up, all I see in the logs is "system was booted".
It appears to be heat related because if I leave the case on, I can't get more than 1-3 days of uptime out of it before it reboots. If I have the case off and a fan blowing over the motherboard, it's able to stay up for about 5 days but never any longer.
The CPU temperature sensors show that the CPUs never get above 45C. I can't see the temperatures of the RAM modules while in the OS, but whenever I reboot the system after it has been on for hours and check the BIOS, no RAM module has been above 110F (about 43C).
I've run memtest on the system for a little over 9 hours and no errors were reported.
Everything has the latest firmware/updates, with two exceptions:
- the motherboard (I found a minor revision to the BIOS that came out a short while ago and haven't applied it)
- the RAID card: I don't recall the firmware revision offhand, but it's not the latest.
I know the RAID card works because it ran for about two months straight without any issues in another box before I moved it over to this one.
I've tested the power supply and it's good. At first I thought it was a PSU issue because the PSU that shipped with the case was only 550W, so I replaced it; the current one is 860W.
I've tried booting the system with only the ESXi boot drive plugged in (directly to the motherboard), with NO cards and NO other hard drives connected... no change in stability.
The PCI cards I have plugged in are:
- Dell Perc RAID card
- 6 port gigabit Ethernet card.
I'm not sure where to look next... Any BIOS settings recommendations? Could it be a bad southbridge chipset?
Everything is still under warranty so I can RMA something if I need to.
Any ideas/thoughts/tips would be greatly appreciated!
Thanks!