LOL - "just" a server reset. What could possibly go wrong?
:whistle:
If it's already down, you can't really make it worse. That's the part that's key here.
If it's not down but has some kind of issue, then you don't want to jump right to a reboot. But if the whole thing is down, because it's frozen or whatever, then you can't really make it worse by forcing a reset on the VM/server.
I remember one time I got a call from the NOC that every single server was showing as down... ironically, at a hospital. This is normally something I'd leave to the senior tech because he knew the environment better and was less likely to get in trouble if something went wrong, but they could not reach him so I decided to go. I kind of figured it would be something simple like a management switch being down, but it turned out to be storage related, with the SAN. So I'm there at 4am debating what to do, as I did not want to reset anything if I could avoid it, since I was not 100% sure at that point WHAT had even frozen. Spent an hour or so investigating and realized that the issue was most likely an iSCSI switch that had randomly rebooted. It was SUPPOSED to be redundant, but they never listened to me when I suggested we do redundancy tests before making it live. I hate when people tell me "I told you so" but I LOVE it when I'm the one that can say that to someone. Long story short, turns out all I had to do was reinitialize the iSCSI paths in ESXi and the VMs all resumed normally, with minor errors in the OS, if that.
Basically, before 8am when all the admin staff come in, I had the entire environment back up. I got lucky though. It could have been much worse; there could have been corruption and so on.
Still got in trouble for it... should not have touched it because it's production. Sure it's production, but it was DOWN! But they did not care. They still wanted us to fix it, but without touching it. It got to the point where I did not go on callouts anymore; if something was down, good for them, it stayed down. Oddly, I learned that the less work I did at that job, the less trouble I got into. I can confirm that it's possible to get to the end of Reddit.
I was so glad when I moved to NOC. Now I can reset a whole freaking cell site and not get in trouble for it. It's just much more easygoing and we can actually do our job without being in trouble and walking on eggshells all the time.