Explain to me the reasoning behind not letting low level I.T. guys reset a server

Feb 25, 2011
16,994
1,622
126
Because when you reset/reboot a server, sometimes things don't come back up right, and/or you're worse off than before. It's that senior person who's going to be on the hook to fix it.
 

BlitzPuppet

Platinum Member
Feb 4, 2012
2,460
7
81
We had a server at our old company that hadn't been rebooted in 400-600 days or so.

Needless to say we were afraid to reboot it, but eventually we had to and it booted back up just fine.
 

Red Squirrel

No Lifer
May 24, 2003
70,662
13,834
126
www.anyf.ca
LOL - "just" a server reset. What could possibly go wrong?

:whiste:

If it's already down, you can't really make it worse. That's the part that's key here.

If it's not down but has some kind of issue, then you don't want to jump right to a reboot. But if the whole thing is down, because it's frozen or whatever, then you can't really make it worse by forcing a reset on the VM/server.

I remember one time I got a call from the NOC that every single server was showing as down... ironically, at a hospital. This is normally something I'd leave to the senior tech, because he knew the environment better and was less likely to get in trouble if something went wrong, but they could not reach him, so I decided to go. I kind of figured it would be something simple like a management switch being down, but it turned out to be storage related, with the SAN. So I'm there at 4am debating what to do, as I did not want to reset anything if I could avoid it, since I was not 100% sure at that point WHAT had even frozen. I spent an hour or so investigating and realized that the issue was most likely an iSCSI switch that had randomly rebooted. It was SUPPOSED to be redundant, but they never listened to me when I suggested we do redundancy tests before making it live. I hate when people tell me "I told you so" but I LOVE it when I'm the one that can say that to someone. Long story short, it turned out all I had to do was reinitialize the iSCSI paths in ESXi and the VMs all resumed normally, with minor errors in the OS, if that.

Basically, before 8am, when all the admin staff come in, I had the entire environment back up. I got lucky though; it could have been much worse. There could have been corruption and so on.

Still got in trouble for it... should not have touched it because it's production. Sure it's production, but it was DOWN! But they did not care. They still wanted us to fix it, but without touching it. Got to a point where I did not go on callouts anymore. If something was down, good for them, it stayed down. Oddly, I learned that the less work I did at that job, the less trouble I got into. I can confirm that it's possible to get to the end of Reddit.

I was so glad when I moved to NOC. Now I can reset a whole freaking cell site and not get in trouble for it. It's just much more easygoing, and we can actually do our job without being in trouble and walking on eggshells all the time.
 

Red Squirrel

No Lifer
May 24, 2003
70,662
13,834
126
www.anyf.ca
My web server hit something like 1300 days a few years back. I was soooo scared to have to reboot that, I just never did.

Data centre got a power failure, it got rebooted. :( That was sad, it was the longest uptime I ever had on a server. Though I was happy to see that everything had come back up properly.

My home file server is kinda in the same situation. I screwed up once and ran a mv command with a stray space after the /, followed by the rest of the command... needless to say, it screwed up the whole file system. I managed to move everything back and set the permissions to match what they are on another server, so it *should* boot up ok, but I'm too scared to even try it. Actually, a few days from now it will be a year. :p
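
For anyone wondering how one stray space does that much damage: the shell splits on it, so "/" becomes its own argument and mv treats the root directory as something to move. Here's a rough sketch of a pre-flight check in Python, not anything I actually had in place. The function name, the PROTECTED list, and the example paths are all made up for illustration.

```python
import posixpath
import shlex

# Hypothetical list of paths a bulk mv should never be pointed at directly.
PROTECTED = {"/", "/bin", "/boot", "/dev", "/etc", "/lib",
             "/proc", "/sbin", "/sys", "/usr", "/var"}

def risky_mv_args(command):
    """Return the arguments of a mv command line that name a protected path."""
    # Split the way a POSIX shell would, then drop the command name itself.
    args = shlex.split(command)[1:]
    risky = []
    for arg in args:
        if arg.startswith("-"):
            continue  # skip options such as -v or -i
        # normpath strips trailing slashes, so "/etc/" is treated like "/etc"
        if posixpath.normpath(arg) in PROTECTED:
            risky.append(arg)
    return risky

# The stray space turns "/home/user/stuff" into "/" plus "home/user/stuff".
print(risky_mv_args("mv / home/user/stuff /mnt/backup"))
print(risky_mv_args("mv /home/user/stuff /mnt/backup"))
```

The first call flags "/" and the second flags nothing. It only looks at the arguments as typed, so it won't catch globs or variables that expand to something dangerous.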

If I know the power is going to be out for more than 4 hours (my typical UPS run time), I'd rather spend money on more batteries than reboot it.