Setting Up My First HA Network

msound

Junior Member
Nov 20, 2006
16
0
0
Hey Hey!

So I'm super excited because I've been asked to fully design and implement an HA network environment for a new dot-com startup.

The web site is going to store and stream a lot of media content (common with a lot of web start ups these days) and unfortunately that's where my experience runs a little short.

Configuring load-balanced Apache web servers and MySQL database servers is no big deal, but I'm a little unsure about how to deal with all of the uploaded media files.

I'm obviously looking at a SAN solution, but I'm a little unsure of how to configure the network. Here's what I have so far (10,000 ft overview):

Firewall -> Web Load Balancers -> Web Node -> Database Load Balancer -> MySQL Database Node

The DB Nodes' data will probably be stored on the SAN.

Now for the media side of it, I was thinking about using NFS to mount the centralized media store on all of the web nodes. That way the content URLs will work across all of the web nodes, because each node will have the same directory tree - /var/www/media is just an NFS mount from the SAN.
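Just as a sketch, the fstab entry on each web node would be something like this (the server name and export path are only placeholders, nothing I've actually settled on):

    mediastore01:/export/media   /var/www/media   nfs   rw,hard,intr,rsize=32768,wsize=32768   0 0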

So my real question is - do I really need NFS to achieve this? Or is there another way of sharing partitioned hard drive space on the SAN with each web node? If I do need to use NFS, then that would mean I'd need an NFS server to make the SAN space accessible to the web nodes (i.e. the NFS clients).

What is the industry standard approach to doing what I need to do? Obviously in a clustered environment with multiple web nodes where users can upload/stream media, there needs to be a centralized way of making that data accessible - and I'm just a little confused about how it's done.

Any/all help would be greatly appreciated. And wish me/us luck!

Cheers!
 

Netopia

Diamond Member
Oct 9, 1999
4,793
4
81
You are so beyond me that I'm humbled.

What's your background? Schooled? Self-taught? If schooled, what exact major?

Joe
 

msound

Junior Member
Nov 20, 2006
16
0
0
Heyyy, a fellow Marylander. I actually just moved to California from Baltimore back in Sept., but I'll be home in two weeks to visit - I've heard it's been unseasonably warm out there.

I appreciate the compliment, but I'm really not all that great with Linux - I only consider myself an intermediate user.

I'm self-taught, but I've held job positions using or administering Linux for about 5 years (starting with Red Hat 7.3).

For the web cluster I'll be following this guide:
http://www.howtoforge.com/high...alanced_apache_cluster

and for the MySQL database cluster I'll be following this:
http://www.howtoforge.com/load...d_mysql_cluster_debian

I've already read through them and they seem pretty straightforward.

Obviously the firewall/gateway will have to push web traffic (80/443) to the Apache load balancers, which will then direct the traffic to a free node. The web nodes will make DB requests to the MySQL LBs, which will then obviously just direct the request to a free DB node.
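If the firewall/gateway ends up being a Linux box, I'm picturing something roughly like this for pushing 80/443 through to the balancer VIP (the 10.0.0.10 address and eth0 are just placeholders):

    iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80  -j DNAT --to-destination 10.0.0.10:80
    iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 443 -j DNAT --to-destination 10.0.0.10:443
    iptables -A FORWARD -p tcp -d 10.0.0.10 -m multiport --dports 80,443 -j ACCEPT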

My issue is that all of the content needs to be stored on a filesystem instead of as a SQL blob - that's the way the developers programmed the file handling.

In the website's main config file I can specify the path of the media folder. I obviously can't store any content on the web nodes themselves - because then the content would end up randomly distributed across the nodes, depending on which node a user gets directed to when they come to the web site.

Sooo a centralized storage solution is the obvious answer. I could either go with a NAS, a SAN, or ZFS. A NAS would probably be too slow, and I've never used ZFS (and I'm not comfortable with Solaris), so that really only leaves a SAN solution.

So then the question is: how do I create a huge partition on the SAN and make it accessible by all of the web nodes? My first thought was to set up an NFS server with a ton of storage space and then let the nodes mount the NFS share, and that would solve that.
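Roughly, the server side of that plan would just be an entry in /etc/exports on the NFS box (the path and subnet here are made up):

    /export/media   10.0.1.0/24(rw,sync,no_root_squash)

followed by an exportfs -ra to publish it, and the web nodes would mount it from there.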

Then the issue is, every NFS mount I've worked with was read-only - so how do I handle actually saving data to the NFS share? Also, how much traffic can a single NFS server handle? I'd imagine NFS is a fairly lightweight service, but everything has a limit. So would it be possible to load balance multiple NFS servers that serve up the same SAN space? I don't really think that would be possible, which is why I'm stuck.

So what is the most efficient and reliable way of sharing a huge centralized storage pool? I'd like to start with a minimum of 2TB and then expand from there. The SAN devices I'm looking at can be expanded up to 8TB of usable hard drive space (about 12TB raw).

This is a fun project but I've really hit a wall here :(
 

DaiShan

Diamond Member
Jul 5, 2001
9,617
1
0
You should look at GFS - it's a great clustered file system which will integrate well with your SAN fabric (although I'm using it over the LAN, both 1 and 10 gig). Red Hat's cluster management software will help you set this up quite easily; look at ricci (the agent) and luci (the web-based server). I work for a very large company and we're using this to phase out some of our incredibly expensive Veritas licenses.
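From memory, the setup on RHEL 5 / CentOS 5 boils down to something like this (the cluster name, volume, and mount point below are only examples):

    yum install ricci            # on every cluster node
    yum install luci             # on the management box
    service ricci start && chkconfig ricci on
    luci_admin init              # sets the luci admin password, then: service luci start
    # after defining the cluster and fencing in the luci web UI:
    gfs_mkfs -p lock_dlm -t mycluster:media -j 4 /dev/vg_san/lv_media
    mount -t gfs /dev/vg_san/lv_media /var/www/media

The -j flag is the number of journals, i.e. how many nodes can mount the filesystem at once.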

/edit BTW, those are good docs that you posted. Keep in mind that the load director will provide both load balancing and HA, as it maintains heartbeat connections to the nodes and removes them as they fail.
 

msound

Junior Member
Nov 20, 2006
16
0
0
Yeah I was very pleased to find those docs. ;)

I remember hearing about GFS but never really looked into it. Thank you so much for your feedback - a clustered filesystem sounds perfect. I'll probably be going with CentOS 5 for the media servers, so that'll give me access to all of the helpful RHEL 5 stuff.

Thanks again!
 

pravi333

Senior member
May 25, 2005
577
0
0
Maybe I didn't understand it correctly.
Your web servers will be accessing your database server for data, and there will be only one or maybe two DB servers depending on failover. So what is the need to mount the data on the web servers?
NFS will be your best bet.
 

msound

Junior Member
Nov 20, 2006
16
0
0
Hey pravi -

The databases will only be used to store text content - news, articles, blogs, profiles, etc.

Media - i.e. images, sound files, video files - will be stored on a filesystem, not in a SQL blob. That's where all of the storage requirements come in.
 

pravi333

Senior member
May 25, 2005
577
0
0
And the media files will be stored on a server, and from that server they will be NFS-mounted to the web servers, which will in turn serve the customers?
 

msound

Junior Member
Nov 20, 2006
16
0
0
That was the original plan, yes, but now it looks like I'll be trying to do something with GFS. Any thoughts?
 

acaeti

Member
Mar 7, 2006
103
0
0
You could still use NFS read-only, but have a special Apache server handle all filesystem writes by mounting the SAN over iSCSI or some other read-write mechanism. Presumably your main application activity is reading, with writing happening infrequently by comparison, so having only a single box perform all the writes would be all right. But this is a little over my head too.
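Something like this is what I have in mind, with the write box doubling as the NFS server (hostnames, the iSCSI target name, device, and paths are all made up - this is just a rough sketch):

    # on the single write/upload box:
    iscsiadm -m discovery -t sendtargets -p 10.0.2.20        # the SAN's iSCSI portal
    iscsiadm -m node -T iqn.2007-01.com.example:media -p 10.0.2.20 -l
    mount /dev/sdb1 /srv/media                                # the LUN, mounted read-write
    # export it read-only and let every web node mount it:
    echo '/srv/media 10.0.1.0/24(ro,sync)' >> /etc/exports && exportfs -ra
    # on each web node:
    mount -t nfs -o ro uploader01:/srv/media /var/www/media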
 

msound

Junior Member
Nov 20, 2006
16
0
0
Well, things have changed. They've decided not to store video - thank god. Also, the initial PHP code seems to be pretty incomplete. I've proposed that we move in a new direction with the initial site launch and make some revisions to the web site's code.

Sooo, I'm going to have images stored as DB blobs instead of files on the filesystem. This means that all I need are two clusters - web servers and MySQL database servers. This greatly cuts down on the complexity of the infrastructure and also reduces our storage requirements.

Looks like it's back to the drawing board...
 

Brazen

Diamond Member
Jul 14, 2000
4,259
0
0
I think I would still keep pictures out of the DB, if only because that's how all the web-based photo gallery applications I've seen do it. And using GFS instead of NFS will give much better performance.
 

DaiShan

Diamond Member
Jul 5, 2001
9,617
1
0
Originally posted by: msound
Well, things have changed. They've decided not to store video - thank god. Also, the initial PHP code seems to be pretty incomplete. I've proposed that we move in a new direction with the initial site launch and make some revisions to the web site's code.

Sooo, I'm going to have images stored as DB blobs instead of files on the filesystem. This means that all I need are two clusters - web servers and MySQL database servers. This greatly cuts down on the complexity of the infrastructure and also reduces our storage requirements.

Looks like it's back to the drawing board...


I would strongly recommend against storing the images in the DB. I just completed an engagement with a client storing about 100GB of images/PDFs/etc. in their database, with absolutely dog-slow performance. If you're planning on using MySQL Cluster, keep in mind that in its current 5.0 form it is an in-memory database only, so your data nodes will need tons of memory to accommodate the images in the DB. In the case of 100GB - which is not unreasonable depending on the size of the images - replicating the data just once in the cluster means you will need over 200GB of RAM across your data nodes! For contrast, I work for a large company and our mainframe-class servers ($3 million+) have 96GB of memory installed. Instead, you should store the location of the image in the database and the actual file on the filesystem.
 

Kakumba

Senior member
Mar 13, 2006
610
0
0
Just to throw something else into the mix: store as much as is reasonable on the SAN and make a single LUN available to all servers that need the data. You could make a HUGE LUN - storage space really isn't an issue for a SAN generally - and it will also be RAID, so as long as you can cover the other ways of losing or corrupting data, you are on a good track.

Mount the LUN read-only on the serving servers (outward facing), and only mount it read-write on the servers used for uploading content. Without knowing more about what you are doing, I can't say whether this is feasible. Also, this puts all the load on a single LUN, which means that while it is great for ensuring all servers have the same content, you need to make sure your I/O performance is high enough...

Do you have access to a SAN administrator? Depending on your devices and the management tools, an experienced SAN administrator is a must-have.

Obviously you should have 2 or more datacentres, with the same server/SAN configuration at each. However, you do then have the data replication issue, which I can't really make any specific recommendations about - that is not an area I have any experience in.

EDIT: In your original post, you are talking about network configuration. Is that still a question for you? As a few quick points: using servers with multiple NICs and IP multipathing is recommended if possible. This lets you reference a server using a single IP address, which resolves to either of the IP addresses of the NICs.

Also, software like PowerPath allows for I/O multipathing, so you can have highly available links to your SAN. Make sure you use good HBAs...
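On the NIC side, with Linux servers that usually means bonding two interfaces into one logical one; a rough sketch for a Red Hat style box (addresses and interface names are just examples):

    # /etc/modprobe.conf
    alias bond0 bonding
    options bond0 mode=active-backup miimon=100

    # /etc/sysconfig/network-scripts/ifcfg-bond0
    DEVICE=bond0
    IPADDR=10.0.1.11
    NETMASK=255.255.255.0
    ONBOOT=yes
    BOOTPROTO=none

    # /etc/sysconfig/network-scripts/ifcfg-eth0 (and the same for eth1)
    DEVICE=eth0
    MASTER=bond0
    SLAVE=yes
    ONBOOT=yes
    BOOTPROTO=none

Either NIC can then fail and the bond0 address stays reachable.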