Sys admins out there, a question regarding uptime...

Sunner

Elite Member
Oct 9, 1999
Well, this question is mostly in regard to the uptime of webservers.

What are your thoughts about the acceptable downtime of a webserver?

I've started thinking about this lately since it seems an awful lot of webservers go down quite often, and on top of that stay down for hours, and sometimes even days.
One of the servers in our farm is our webserver, and if it dies for some reason, no one who's involved with it goes home until it's back up.
Whether this takes 10 minutes or 48 hours doesn't really matter, as long as it comes back online ASAP.

Now, the question is: is it just me, or is it completely unacceptable to have several hours of downtime every now and then?

I'm just getting very annoyed at some rather big websites that seem to have serious problems with their webservers...

Oh, and in case anyone wants to know what this has to do with OSes, I figure most of the server admins are going to hang around this forum.
 

SaigonK

Diamond Member
Aug 13, 2001
www.robertrivas.com
Good question!
I, for one, always get feedback from users that NO downtime is good.
We all know this might not be realistic, but it is what I strive for, and I can say that I do a damn good job of it.

Even if we have a hardware failure, our webservers won't be down for more than half a day at a time, since we image each server frequently.
We image every 8-10 hours, which lets us go back and rebuild from a point where the server was running A-OK.
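As a rough illustration of what that imaging schedule means: a restore goes back to the newest image taken before the failure, so the worst-case loss is about one imaging interval. A minimal Python sketch, assuming a hypothetical fixed 8-hour schedule; `latest_good_image` is an illustrative helper, not part of any imaging product.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

def latest_good_image(image_times, failure_time):
    """Return the newest image taken at or before failure_time, or None.

    Hypothetical helper: assumes every image before the failure is usable.
    """
    times = sorted(image_times)
    i = bisect_right(times, failure_time)
    return times[i - 1] if i else None

# Hypothetical schedule: an image every 8 hours starting at midnight.
start = datetime(2001, 10, 1, 0, 0)
images = [start + timedelta(hours=8 * k) for k in range(4)]
failure = datetime(2001, 10, 1, 15, 30)

restore_from = latest_good_image(images, failure)
print(restore_from)            # 2001-10-01 08:00:00
print(failure - restore_from)  # 7:30:00 of changes since that image
```

With images every 8-10 hours, the gap printed on the last line can never exceed the interval, which is what keeps the rebuild to a known, bounded amount of lost work.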


 

Nothinman

Elite Member
Sep 14, 2001
I too get very annoyed when a web site is down for an extended period of time, especially when it's a vendor supposedly selling secure, reliable products.

Zero downtime is impossible, but with a decent setup (multiple boxes, maybe even multiple lines) and a decent admin, you can get pretty damn close.
 

Mucman

Diamond Member
Oct 10, 1999
We advertise 99.9% uptime, which is probably accurate for actual server uptime... but sometimes services crash (stupid ColdFusion). You can never please everyone, though... we just had a customer who complained about 20 seconds of downtime! An outage that short doesn't trigger our pagers, so we told him that nothing was down. He demanded that we check the log files, and they did show ColdFusion going down for 20 seconds. Our monitors run every minute, so they didn't catch it. The guy drives me up the wall, but out of our 1300 customers only about 5 or so are like him.
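Why a once-a-minute check can miss a 20-second outage: the monitor only sees it if a poll happens to land inside the outage window. A minimal Python sketch, assuming a hypothetical monitor that polls at fixed 60-second marks; `polls_that_see_outage` is illustrative, not any real monitoring tool's API.

```python
def polls_that_see_outage(outage_start, outage_len, poll_interval=60, horizon=600):
    """Return the poll times (seconds) that fall inside the outage window.

    Hypothetical model: polls fire at 0, poll_interval, 2*poll_interval, ...
    and the outage covers [outage_start, outage_start + outage_len).
    """
    outage_end = outage_start + outage_len
    return [t for t in range(0, horizon, poll_interval)
            if outage_start <= t < outage_end]

print(polls_that_see_outage(65, 20))   # [] : the 20 s outage is never seen
print(polls_that_see_outage(110, 20))  # [120] : same length, different timing, caught

# Try every possible start second within one poll interval.
caught = sum(1 for s in range(60) if polls_that_see_outage(s, 20))
print(f"caught {caught} of 60 possible start times")  # caught 20 of 60 possible start times
```

Under this model a 20-second outage is caught only about a third of the time, so short service crashes tend to show up in the service logs rather than on the pager unless the check interval is shorter than the outages you care about.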

Some uptime problems aren't server-related. We have gotten complaints that a customer's website doesn't work when the real cause was that they never transferred their domain to point to our nameservers (which we had already configured for their domain). Somehow we get blamed for this *shrugs*.

The two worst downtimes I remember in my lengthy 6-month term so far :) were due to NIMDA (about 12 hours :() and a mail server hard drive that was about to die on a box that wasn't running RAID 1. We had to put in a drive, let NT4 create the mirror, and then boot off the new drive. This took it down for about 2 hours. What bugged me was that both of these downtimes were preventable. Hopefully when I do more admin work I can turn that around!

We have never had anything down for days (thus far). Most of our downtime is caused by iisresets, reboots, ColdFusion, and improper DNS entries. I'm in the web-hosting business, as you can probably tell.
 

neuralfx

Golden Member
Feb 19, 2001
Hm, speaking of this, can anyone else get to internic.net or whois.net? I have tried from a few different places and haven't been able to get there for a couple of weeks. Did something happen that I don't know about?
-neural
 

millsy

Senior member
Jul 26, 2001
I would seriously consider using RAID 1 or 5 and setting up a cluster of servers using Windows 2000 Advanced Server or Datacenter Server.
At least if a hard drive fails, there is another that can take over. And if a SCSI or IDE controller fails, the duplexed drive on the other controller can take over.
If you set up a cluster of 2 or more web servers, the other server takes over while you fix or even update the first one. Once the updated server is back online, it can take over while you update the original server.
With clustering, you can also set up both servers to take advantage of Network Load Balancing.
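For the RAID 5 case, the "another can take over" part works through parity: the controller stores the XOR of the data blocks in each stripe, so any one missing block can be recomputed from the rest. A toy Python sketch of that idea; real controllers do this in hardware and rotate the parity block across drives, and the 4-byte blocks here are made up.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe of three toy data blocks, each on its own drive.
data = [b"WEB1", b"SITE", b"DATA"]
parity = xor_blocks(data)  # stored on a fourth drive

# The drive holding data[1] dies: rebuild its block from the survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)  # b'SITE'
```

RAID 1 skips the arithmetic and simply keeps a full second copy, which is why it costs more disk but rebuilds more simply.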
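The "take one server out while the other keeps serving" idea can be sketched with a toy round-robin balancer that skips nodes marked down. This is a hypothetical Python illustration, not how Windows Network Load Balancing is implemented (NLB uses a distributed filtering scheme across the cluster hosts rather than a central dispatcher like this one); the node names are made up.

```python
from itertools import count

class RoundRobinBalancer:
    """Toy load balancer: rotate requests across nodes that are marked healthy."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._counter = count()

    def mark_down(self, node):
        self.healthy.discard(node)

    def mark_up(self, node):
        self.healthy.add(node)

    def pick(self):
        # Try each node at most once, continuing from the current rotation position.
        for _ in range(len(self.nodes)):
            node = self.nodes[next(self._counter) % len(self.nodes)]
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes: site is down")

lb = RoundRobinBalancer(["web1", "web2"])
print([lb.pick() for _ in range(4)])  # ['web1', 'web2', 'web1', 'web2']
lb.mark_down("web1")                  # take web1 out for patching
print([lb.pick() for _ in range(3)])  # ['web2', 'web2', 'web2']
```

After patching, `lb.mark_up("web1")` puts it back into the rotation, and the same steps can then be repeated for web2.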
 

Woodie

Platinum Member
Mar 27, 2001
Production WebServer? 0.

Well, the real truth is that the website accepts 0 downtime. That's why we have at least 2 servers for each site, so we can take one down for maintenance or updates without affecting business.

Our server stats are usually 99.9% uptime as well (~1200 production servers; the vast majority are normal internal servers, not Internet-facing).
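To put 99.9% in concrete terms, the allowed downtime is just (1 - availability) times the period. A small Python helper for that arithmetic; the name `allowed_downtime_minutes` is made up, and a year is taken as 365 days.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; leap years ignored

def allowed_downtime_minutes(availability_pct, period_hours=HOURS_PER_YEAR):
    """Minutes of downtime that still meet `availability_pct` over `period_hours`."""
    return (1 - availability_pct / 100) * period_hours * 60

for pct in (99.0, 99.9, 99.99):
    per_year = allowed_downtime_minutes(pct)
    per_month = allowed_downtime_minutes(pct, period_hours=30 * 24)
    print(f"{pct}%: {per_year:.1f} min/year, {per_month:.1f} min per 30-day month")
```

At 99.9% that is about 8.76 hours a year, or roughly 43 minutes in a 30-day month; at 99.99% it drops to about 52.6 minutes a year.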

--Woodie
 

jtallon

Golden Member
May 13, 2001
1,166
0
0
Keep in mind that some of the large Unix servers can take nearly a half hour to completely boot. The company I work for runs a custom app on large HP and IBM Unix boxes, and a half-hour reboot time is the norm. We use cluster type software to prevent major outages, but even switching cluster control on an application like 'Service Guard' takes 10 or 15 minutes.

It bugs the heck out of me too when sites go down, but I'm amazed by a company like eBay. I can just imagine the hardware they need to handle as many visitors as they do, 24x7, with something along the lines of 99.99% availability... I'm sure it makes our HP V-class servers seem puny...
 

Agamar

Golden Member
Oct 9, 1999
I try to always buy hardware RAID 5 controllers with hot swap... They have never done me wrong. Still, if things need to reboot, then so be it. I would rather reboot under controlled conditions than let a server run into the ground.
 

Nothinman

Elite Member
Sep 14, 2001
Agamar wrote: "I try to always buy hardware RAID 5 controllers with hot swap... They have never done me wrong. Still, if things need to reboot, then so be it. I would rather reboot under controlled conditions than let a server run into the ground."

That's all well and good, but you should never have a single point of failure. You should have a cluster of at least 2 boxes so you can reboot one and no one will notice, no matter how long it takes to reboot.
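Rough arithmetic behind the "at least 2 boxes" advice: if the site stays up as long as any one box is up, and boxes fail independently, availability for n boxes that are each up a fraction a of the time is 1 - (1 - a)^n. A Python sketch, assuming a hypothetical 99% per box:

```python
def parallel_availability(per_node, nodes):
    """Availability when the service is up as long as any one of `nodes`
    identical, independently failing boxes is up."""
    return 1 - (1 - per_node) ** nodes

for n in (1, 2, 3):
    print(n, f"{parallel_availability(0.99, n):.6f}")
```

Two 99% boxes give 99.99% on paper. The independence assumption is the catch: a shared switch, UPS, or disk array can take both boxes down together, so those need redundancy as well.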
 

Mucman

Diamond Member
Oct 10, 1999
Nothinman - Do you think small web-hosting companies do this? I have always brought this up to my boss but he said we would have to charge a lot more and our market focus is on low-mid end hosting. How much would you expect to pay for a webhost that does clustering like that?
 

Nothinman

Elite Member
Sep 14, 2001
Mucman wrote: "Do you think small web-hosting companies do this?"

Why shouldn't they?

Mucman wrote: "but he said we would have to charge a lot more and our market focus is on low-mid end hosting"

Why a lot more? What kind of accessibility guarantees do you make to your customers now?

I can see the price going up some because you have to pay for extra boxes, cabling, OS (if you use a commercial OS), etc. But I wouldn't want to run a business with a single point of failure, nor would I want my site hosted on one.
 

igiveup

Golden Member
Feb 17, 2001
1,066
0
0
If you are a web hosting service, small or large, then yes. NO QUESTION. If you are hosting for somebody then uptime and performance are your life. If you go down for too long or too often then the customers look elsewhere.

EDIT: for some customers it seems like 20 seconds is too long......
 

jbod

Senior member
Sep 20, 2001
495
0
0
We have applications running where, if the server goes, the money goes. So we bought a SAN and clustered everything, including switches and UPSes. The only downtime we experience is when the server itself needs upgrades, SAN service packs, or COM objects reinstalled. There is a brief 40-second gap when the cluster fails over, and data can and will be lost if a user is in that application filling out the page with credit card info and all. The programmers should account for this by saving state, but they haven't listened to us yet.
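A minimal sketch of the "save state" fix: each step of the form is written to storage both cluster nodes can reach, keyed by session ID, so the node that takes over can restore it. Everything here is hypothetical (the dict stands in for a shared database on the SAN, and the class and function names are illustrative), and card fields are deliberately left out of what gets stored.

```python
import json

class SharedSessionStore:
    """Stand-in for storage both cluster nodes can reach.
    Hypothetical: a plain dict holding JSON strings."""

    def __init__(self):
        self._data = {}

    def save(self, session_id, state):
        self._data[session_id] = json.dumps(state)

    def load(self, session_id):
        raw = self._data.get(session_id)
        return json.loads(raw) if raw is not None else {}

def save_form_progress(store, session_id, form_fields, sensitive=("card_number", "cvv")):
    """Persist what the user has typed so far, minus fields we refuse to store."""
    state = {k: v for k, v in form_fields.items() if k not in sensitive}
    store.save(session_id, state)

store = SharedSessionStore()
# Node A saves progress after each step of the form.
save_form_progress(store, "sess-42",
                   {"name": "Jane", "address": "1 Main St", "card_number": "4111111111111111"})
# Node A fails over; node B looks up the same session ID and restores the form.
print(store.load("sess-42"))  # {'name': 'Jane', 'address': '1 Main St'}
```

After a failover the user would still need to re-enter the card details, but not the rest of the page.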

To answer your question, I think 99.9% is totally feasible.
 

Mucman

Diamond Member
Oct 10, 1999
7,246
1
0
Interesting info. I have just started with this company and I'm learning the ropes. Looks like I've got a lot of work ahead of me....