Building a website with an image uploader - multiple servers

TechBoyJK

Lifer
Oct 17, 2002
16,699
60
91
Hi Guys,

Working on an app which has a simple photo uploader, and I'm trying to figure out the best way to approach having multiple photo servers.

My app is running on Windows 2008, and to start I'll have another server set up just for storing the files.

www01
images01

images01 will basically just be a file server, and I'm going to have a directory that is shared to www01.

I have a few ideas on how to handle this. So far, this one is winning.

From within my webapp, have a folder called 'imagehost'. Within this folder, I would create another folder called images01, which would actually be a mapped drive to the images01 server. This would make the files appear to be hosted from within the website's directory.

aka company.com/imagehost/images01/dir/dir/dir

This way, if I need to add another server, it's going to be really just adding another folder to the main imagehost folder. It would be really easy to manage.

company.com/imagehost/images01/dir/dir/dir
company.com/imagehost/images02/dir/dir/dir
company.com/imagehost/images03/dir/dir/dir

Any downsides to this approach?
 

Doublejr

Senior member
Jul 25, 2004
205
0
0
Are the multiple servers going to be image mirrors, for bandwidth issues? Or for more storage?

Anyway, I would use subdomains; then you wouldn't need to map anything.

E.g.

image01.company.com/images/
image02.company.com/images/

or something like that and just have some code to round robin or load balance the image servers if that is what you are looking to do.
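
A rough sketch of the round-robin idea in Python, purely for illustration. The host names follow the pattern above, and the function name is made up. Note this only works if every server holds a copy of every image (mirrors); it does not apply if each image lives on exactly one server.

```python
from itertools import cycle

# Hypothetical host names, following the subdomain pattern above.
IMAGE_HOSTS = ["image01.company.com", "image02.company.com"]

# cycle() repeats the list forever, giving simple round-robin rotation.
_rotation = cycle(IMAGE_HOSTS)

def next_image_base_url():
    """Return the base URL of the next image server in round-robin order."""
    return "http://" + next(_rotation) + "/images/"
```

Each page render would call next_image_base_url() once per image link, so consecutive links alternate between the two hosts.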
 

TechBoyJK

Lifer
Oct 17, 2002
16,699
60
91
They would be for more storage. Once one of the servers was tapped out (on storage, CPU, RAM, etc.), I'd like to be able to add another one into the mix.

I would still need to map to the servers either way, as the images are uploaded to the website. They'd come in through the website to the image server. So whether I map as a folder, or as an actual drive, I still need to have that share connected.
 

Doublejr

Senior member
Jul 25, 2004
205
0
0
Ahh, I forgot you are working with Windows servers. I primarily work on Linux machines, so mapping is not in my vocabulary :p

How many images are you talking about? A server can have a lot of storage, especially if you RAID a bunch of drives or connect to a SAN.
 

DaveSimmons

Elite Member
Aug 12, 2001
40,730
670
126
Is server load or bandwidth an issue?

With your approach, to serve one file you use LAN bandwidth to transfer the file to the front-end server, then that single server sends the file so its maximum bandwidth could be a choke point.

With subdomains no LAN transfer is needed, and each server sends its own files.

Another approach would be to just use the main server as a lookup point, where the img link is something like /my-image-lookup?id=12345 and the server returns a 302 (Found) redirect pointing to the right image server. This means 2 request/response round trips per image, but no LAN transfer and much less time and bandwidth used by the main server.
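
Here is a minimal Python sketch of that lookup step, independent of any web framework. The lookup table, the host names, and the /images/<id>.jpg URL layout are all assumptions for illustration; in practice the mapping would probably live in a database.

```python
# Hypothetical lookup table: image id -> host that stores the file.
IMAGE_LOCATIONS = {
    12345: "images01.company.com",
    67890: "images02.company.com",
}

def lookup_redirect(image_id):
    """Return (status, headers) for a lookup request on the main server."""
    host = IMAGE_LOCATIONS.get(image_id)
    if host is None:
        # Unknown id: nothing to redirect to.
        return 404, {}
    # 302 tells the browser to fetch the image from the server that holds it.
    # The URL layout below is assumed, not taken from the thread.
    return 302, {"Location": "http://%s/images/%d.jpg" % (host, image_id)}
```

The main server only sends a few hundred bytes of headers per image; the image bytes themselves go out from the image server.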


A completely different approach would be using Amazon cloud storage, then you get rid of both limits, but the cost might (or might not) be higher than running your own server farm.
 

TechBoyJK

Lifer
Oct 17, 2002
16,699
60
91
Could be a great many images. I'm wanting to plan for scale.

I will certainly make scaling each image server up the priority. I don't want to have 100 image servers when I could have 1 or 2, but as Dave pointed out, there could be bottlenecks in other areas, such as LAN performance.

I've considered opening up the image servers to the web, so that when requests are made to an image server, it serves the files directly. However, the webserver is where users would upload them. They'd come in through the webserver and go across the LAN to the image server.

I work in a datacenter, so I have access to a lot of resources. Much cheaper than amazon.

One of my clients, AirVM.com, is actually going to provide the virtual environment. So all of the servers will be VMs attached to high-speed storage (15k SAS drives on NetApps). The LAN connection between the webservers and the image servers would be logical in most cases. As it grows, the VMs would be spread out across multiple physical hosts (in a clustered VMware environment with vMotion, etc.), so it's possible the physical LAN would come into play, but it'd all be 10GigE or GigE.
 

dwell

pics?
Oct 9, 1999
5,185
2
0
You may want to take into account hotspots. If you do the incremental approach (images01, images02, images03) as your site grows, the load won't be distributed evenly. I would imagine the newest server would have all of the write traffic and the other servers would be getting light reads compared to the current server because that's where all the recently uploaded stuff is.

You can come up with some kind of hashing scheme, where an image file name hashes to a specific path, /images01 ... /images99 for example. You can start with one server and have the paths map to one directory on one server, then as you introduce more servers you spread the mappings across them. Simply sharding based on filename probably wouldn't work if you used something like a trie, because common file prefixes would be heavy (like image_ and DCIM).

The downside is that you would have to move all the images that hash to a new server as you add them.
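
Something along these lines, as a rough Python sketch. MD5 is just my pick for an evenly spread hash (not for security), and the function names and the contiguous-range assignment of buckets to servers are assumptions, one way to do it among many.

```python
import hashlib

NUM_BUCKETS = 99  # /images01 ... /images99, as in the example above

def bucket_for(filename):
    """Hash a file name to a bucket number in 1..NUM_BUCKETS."""
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_BUCKETS + 1

def path_for(filename):
    """Build the logical path, e.g. /images07/<filename>."""
    return "/images%02d/%s" % (bucket_for(filename), filename)

def server_for(bucket, servers):
    """Assign buckets to servers in contiguous ranges.

    With one server every bucket lands on it; adding servers splits
    the bucket range, and files in reassigned buckets must be moved.
    """
    per_server = -(-NUM_BUCKETS // len(servers))  # ceiling division
    return servers[(bucket - 1) // per_server]
```

Because the hash scrambles the name, files called DCIM0001.jpg and DCIM0002.jpg usually end up in unrelated buckets, which avoids the heavy-prefix problem.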

Hope this makes sense.
 

TechBoyJK

Lifer
Oct 17, 2002
16,699
60
91
Good points. I was actually thinking that I would just do a round-robin approach once I had 2 or more servers in play. Maybe alternate them on a day-to-day basis so the 'current' photos were evenly distributed. The application has a global variable called #imagehost# that determines which server is actively taking uploads, so I can easily change it.
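
Roughly what I have in mind for the daily alternation, sketched in Python rather than in the app's own language. The host list and function name are placeholders; the result is what would get assigned to #imagehost#.

```python
from datetime import date

# Placeholder list of servers currently allowed to take uploads.
UPLOAD_HOSTS = ["images01", "images02"]

def imagehost_for(day):
    """Pick the upload server for a given date by alternating day to day."""
    # toordinal() increases by 1 each day, so the modulo walks through the list.
    return UPLOAD_HOSTS[day.toordinal() % len(UPLOAD_HOSTS)]
```

Calling imagehost_for(date.today()) at upload time gives the active server. Adding a third host to the list changes the rotation from that day on, but files already stored stay where they are.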
 

pdubs10

Junior Member
Apr 2, 2012
2
0
0
I'd consider ditching filesystems and folders and the like and going with a NoSQL-style solution. Couchbase does the whole clustering and auto-sharding thing automatically so growth and scalability are really easy to handle.

This way you can just have a url like company.com/imagehost/AAJ89JD672 (or whatever you want for the key) and have it spit out the image data. You neither know nor care what server is being written to or read from.

A nice side benefit of this is that the most requested images will be cached in memory, reducing disk I/O.
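
To show the shape of it, here is a small Python sketch. The ImageStore class is an in-memory stand-in I made up, not a real Couchbase client, and the random-key format is an assumption; a real cluster would decide which node holds each key.

```python
import uuid

class ImageStore:
    """In-memory stand-in for a clustered key-value store (not a real client)."""

    def __init__(self):
        self._blobs = {}

    def put(self, data):
        """Store image bytes under a new opaque key and return the key."""
        key = uuid.uuid4().hex.upper()
        self._blobs[key] = data
        return key

    def get(self, key):
        """Return the image bytes for a key, or None if the key is unknown."""
        return self._blobs.get(key)

def image_url(key):
    """Build the public URL for a stored image."""
    return "company.com/imagehost/" + key
```

The web app only ever deals with keys: put() on upload, get() when serving the URL, with no server name anywhere in the path.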
 

dwell

pics?
Oct 9, 1999
5,185
2
0
That's a polarizing topic. While NoSQL makes it possible to store binary files way more efficiently than SQL blobs, you're still going to slam the I/O on your storage server. IMO it's best to let the filesystem do what it does best -- files.

You can always throw a CDN in front of everything to reduce the number of hits on your system.