
Finding out why a server crashed?

Elixer

Lifer
Is anyone aware of anything that can help troubleshoot why a Linux server crashed?
It is a hard lockup; it must be manually power cycled to get it back up.

There is nothing at all in any of the logs that even hints at an issue.
There is a monitoring service running, but that can only tell us whether the server is up or not.

It is running Debian stable.
It does the hard lockup about every 3-4 months or so.
 
These are the worst issues. I've been dealing with stuff like that with my new PC for over a year now (not new anymore I guess). It gets expensive. Maybe try new RAM. Since it's a server I imagine it has no add-on video card? If it does, maybe try another. I had a whole whack of issues and replacing the video card actually fixed a lot of them.

If you have not, run memtest and other diagnostics. Try to push it hard to see if it only fails under load. If it does, it may be a tad easier to troubleshoot. Most of the stuff I find on Google about lockups is about machines being worked hard.
 
Server isn't local, it is in another country, and they were saying to expect a downtime of 1-2 days if we want them to run some diagnostic tests (memory & disk check).

I was just hoping that there might be another way to find out what the problem could be without all the downtime.
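
One low-downtime idea, as a rough sketch only: since the up/down monitor cannot say what the box was doing just before it froze, a small heartbeat script could append a timestamp, the load averages, and available memory to a file every few seconds and force each line to disk. After the next lockup, the last line shows when it died and whether load or memory looked unusual. The log path, the interval, and the function names here are all made up for illustration, and it assumes a kernel new enough to report MemAvailable in /proc/meminfo.

```python
import os
import time


def snapshot(proc="/proc"):
    """Return one line: timestamp, load averages, and MemAvailable."""
    with open(os.path.join(proc, "loadavg")) as f:
        # First three fields are the 1, 5 and 15 minute load averages.
        load = " ".join(f.read().split()[:3])
    mem_available = "unknown"
    with open(os.path.join(proc, "meminfo")) as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                mem_available = " ".join(line.split()[1:])
                break
    stamp = time.strftime("%Y-%m-%dT%H:%M:%S%z")
    return f"{stamp} load={load} mem_available={mem_available}"


def append_durably(path, line):
    """Append a line and fsync so it is on disk before a possible freeze."""
    with open(path, "a") as f:
        f.write(line + "\n")
        f.flush()
        os.fsync(f.fileno())


def run(path="/var/log/heartbeat.log", interval=10, iterations=None):
    """Log a snapshot every `interval` seconds (forever if iterations is None).

    The default path and interval are hypothetical; adjust to taste.
    """
    count = 0
    while iterations is None or count < iterations:
        append_durably(path, snapshot())
        count += 1
        time.sleep(interval)
```

Calling run() from something that starts at boot would keep it going. It will not name the faulty part, but it narrows down the time of the freeze and what the machine looked like just before it.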
 
Oh that's even more frustrating. So is this a lease or a colo? If it's a lease, try to just get another server; if it's a colo, I feel your pain... I've been debating colocating instead of leasing, but this is the type of stuff that I fear could happen.
 
Yeah, it is colocation. 🙁
 
Do you know anything about the hardware? I've run a few servers that hard locked with nothing in the logs...it turned out the systems needed BIOS patches to fix the stability issues.

If you know the make/model of the system, you may be able to install some middleware to read the firmware and driver versions for the BIOS, RAID controllers, SAS controllers, NICs, etc...

Dell uses OpenManage...I'm not sure what HP uses, but they have ISO utilities as well. If you have lights-out controllers or DRACs (Dell Remote Access Controllers), you may be able to log in remotely and take care of business.
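
If installing vendor tools is not an option right away, a quick way to at least see the make/model and BIOS version from the running system is to read the DMI entries the Linux kernel exposes under /sys/class/dmi/id. Here is a minimal sketch, assuming those files are present on the box; the function name and the set of fields are my own choice, and it only covers the system and BIOS, not RAID or NIC firmware.

```python
import os

# World-readable DMI entries exposed by the kernel on most x86 Linux systems.
DMI_FIELDS = ("sys_vendor", "product_name", "bios_vendor", "bios_version", "bios_date")


def read_dmi(base="/sys/class/dmi/id"):
    """Return a dict of DMI fields; a field is None if it cannot be read."""
    info = {}
    for name in DMI_FIELDS:
        try:
            with open(os.path.join(base, name)) as f:
                info[name] = f.read().strip()
        except OSError:
            info[name] = None
    return info
```

Printing the result of read_dmi() gives something to compare against the vendor's BIOS release notes, without any downtime.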
 