Amazon Glacier Client

Tsaar (Guest)
I have been using an incremental cloud backup service for years and have been looking for alternatives. I recently discovered Amazon Glacier. I have an EE background, so my programming skills are not that strong and are more geared toward embedded systems (though I can knock out quick Python scripts when I need to).

If anyone here has programmed an S3 frontend or something similar, how big an undertaking would it be to write a client that encrypts my data locally and uploads it using the RESTful APIs? After this I would need a way to throttle downloads due to Amazon's download pricing.

I am paranoid about using 3rd-party apps, and thought this might be a good project to improve my programming skills.
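To make it concrete, the core of what I have in mind would look something like this rough sketch (assuming the boto3 and cryptography Python packages, which I would still need to learn; the key file, vault name, and archive path are all placeholders):

Code:
import boto3  # AWS SDK for Python
from cryptography.fernet import Fernet  # symmetric, authenticated encryption

# One-time setup elsewhere: key = Fernet.generate_key(), stored somewhere safe
with open("glacier.key", "rb") as f:
    fernet = Fernet(f.read())

# Encrypt locally, before anything leaves the machine
with open("backup.tar", "rb") as f:
    ciphertext = fernet.encrypt(f.read())

# Upload the encrypted blob; Glacier hands back an archiveId
glacier = boto3.client("glacier")
resp = glacier.upload_archive(vaultName="my-backup-vault", body=ciphertext)
print(resp["archiveId"])  # must be saved; it's the only handle to the archive

A real client would also need multipart uploads for large archives and a local index mapping file names to archive IDs, but that seems to be the core of it.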
Leros (Lifer)
There are open-source 3rd-party apps that do exactly what you want. If you're paranoid, read the source code and build it yourself from scratch.
DaveSimmons (Elite Member)
Tsaar said:
> After this I would need a way to throttle downloads due to Amazon's download pricing.

Unless you give others access to the files using your private keys, nothing will ever be downloaded.

Also, Glacier is offline storage, not like S3 buckets. To get files back from Glacier you have to wait up to several hours for them to be copied into an S3 bucket.
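The retrieval flow looks roughly like this (my own boto3 sketch, not taken from any particular client; the vault name and archive ID are placeholders):

Code:
import time
import boto3

glacier = boto3.client("glacier")
vault = "my-backup-vault"

# Step 1: ask Glacier to stage the archive; this is the multi-hour wait
job = glacier.initiate_job(
    vaultName=vault,
    jobParameters={"Type": "archive-retrieval", "ArchiveId": "ARCHIVE_ID_HERE"},
)

# Step 2: poll until the staging job completes (typically 3-5 hours)
while not glacier.describe_job(vaultName=vault, jobId=job["jobId"])["Completed"]:
    time.sleep(15 * 60)  # check every 15 minutes

# Step 3: only now can the bytes actually be downloaded
out = glacier.get_job_output(vaultName=vault, jobId=job["jobId"])
data = out["body"].read()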
Tsaar (Guest)
DaveSimmons said:
> Unless you give others access to the files using your private keys, nothing will ever be downloaded.
>
> Also, Glacier is offline storage, not like S3 buckets. To get files back from Glacier you have to wait up to several hours for them to be copied into an S3 bucket.

Yes, I know. This is for catastrophic emergency storage.

This article explains what I meant by the throttling:

http://www.wired.com/wiredenterprise/2012/08/glacier/
Leros (Lifer)
So you're talking about throttling the download if you ever need to restore? It looks like you can access 5% of your Glacier store for free every day, so you would want to throttle a restore over 20 days? I don't see the point. If you have an emergency, you'll probably be willing to pay whatever it costs for same-day retrieval of all your data.
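To put numbers on that (the 500 GB store size is just made up):

Code:
stored_gb = 500                      # hypothetical Glacier store
free_per_day_gb = stored_gb * 0.05   # 5% of the store per day = 25 GB
days_to_restore_free = stored_gb / free_per_day_gb   # = 20 days
print(free_per_day_gb, days_to_restore_free)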
Tsaar (Guest)
Leros said:
> So you're talking about throttling the download if you ever need to restore? It looks like you can access 5% of your Glacier store for free every day, so you would want to throttle a restore over 20 days? I don't see the point. If you have an emergency, you'll probably be willing to pay whatever it costs for same-day retrieval of all your data.

Even in a catastrophic situation it's doubtful that I would need more than 5%, since most of my data is unchanging and already backed up locally and offsite.

The point of throttling is that the 5% needs to be spread throughout the day, though. I believe it was that article where I read that ~800 MB was charged as ~120 GB due to the prorated nature of the billing. The 5% has to be spread over 24 hours to be free; if you retrieve 5% in 1 minute, it multiplies your billed download size by 1440 (60*24).
DaveSimmons (Elite Member)
Yes, if you need to restore after a catastrophe it looks like you do need to do it carefully, and the most important part seems to be to request archive restoration in small chunks, not all at once.

Still, if your house burns down, then a $500 charge to get back all of your data is probably the least of your worries, and isn't necessarily worth spending days of work to avoid.

On the third hand, working out ways to optimize bulk retrieval would be a great feature for you to contribute back to one or more of the existing Glacier backup projects. Even if you're not up to coding it, you might be able to do the math and pass it on to someone else who can. Maybe sketch out the design of how it should work, then file that as a feature request.
Leros (Lifer)
Tsaar said:
> Even in a catastrophic situation it's doubtful that I would need more than 5%, since most of my data is unchanging and already backed up locally and offsite.
>
> The point of throttling is that the 5% needs to be spread throughout the day, though. I believe it was that article where I read that ~800 MB was charged as ~120 GB due to the prorated nature of the billing. The 5% has to be spread over 24 hours to be free; if you retrieve 5% in 1 minute, it multiplies your billed download size by 1440 (60*24).

I think your calculation is quite a bit off.

It takes something like 4 hours to retrieve data once you've requested it, so downloading 800 MB actually takes 4 hours, not 1 minute. Going with that assumption, you need to multiply by 6 (24 hours / 4 hours), not 1440. So 800 MB would actually be treated as 4.8 GB, which is affordable.

(I haven't read any documentation, so I may be totally wrong)
Tsaar (Guest)
Leros said:
> I think your calculation is quite a bit off.
>
> It takes something like 4 hours to retrieve data once you've requested it, so downloading 800 MB actually takes 4 hours, not 1 minute. Going with that assumption, you need to multiply by 6 (24 hours / 4 hours), not 1440. So 800 MB would actually be treated as 4.8 GB, which is affordable.
>
> (I haven't read any documentation, so I may be totally wrong)

My example assumed it was prorated at the minute level; I'm not sure of the exact details, but this is from that article (the 3-5 hour wait after a request is not included in the bandwidth charge):

"For example, a 3 terabyte archive that can’t be split into smaller chunks could lead to a retrieval fee as high as $22,082 if the peak usage is determined to be 3 terabytes per hour. The cost of requests is separate from the cost of bandwidth to download the data, which has its own separate pricing table."

Edit: If you search for Glacier clients, most specifically cite the ability to throttle a download because of Amazon's crazy policy. The 3-5 hour delay is meant to deter people from using Glacier as regularly accessed storage. The bandwidth pricing is probably there for the same reason, but it seems to penalize the user far too heavily. I feel that a 3-5 hour delay is probably deterrent enough on its own. At the very least, you should be able to request 5% a day for free no matter how fast it downloads.

I think the actual equation is based on hours per month, not minutes per day. For example, one of the commenters on that article has a blog post showing a screenshot of a download of 1 GB out of a 24 GB store; he gets charged for downloading 167 GB.
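If I'm piecing the billing together correctly, the fee works out to: (peak hourly retrieval rate, minus a daily free allowance of 5% of storage divided by 30) charged for every hour of the month at $0.01/GB. This is my own reconstruction, not Amazon's published formula, but it does reproduce the article's $22,082 figure (taking 3 TB as 3072 GB):

Code:
def retrieval_fee(peak_gb_per_hr, stored_gb, price_per_gb=0.01, hours_per_month=720):
    # Guessed reconstruction of the 2012 Glacier retrieval-fee math
    free_gb_per_day = stored_gb * 0.05 / 30          # daily free retrieval allowance
    billable_rate = max(peak_gb_per_hr - free_gb_per_day, 0)
    return billable_rate * hours_per_month * price_per_gb

# The article's 3 TB archive, retrieved at a peak of 3 TB per hour:
print(retrieval_fee(peak_gb_per_hr=3072, stored_gb=3072))  # ~22081.54, i.e. "$22,082"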

Tsaar (Guest)
Basically, ignore everything I said about throttling the download itself. To minimize cost, you need to minimize the amount of data you REQUEST to retrieve in each 4-hour window, according to the Amazon rep:

"Only the retrieval time is factored in, the download time is never considered. The retrieval time is that time that it takes the service to make your data available for download. A simple way to spread the 30GB retrieval over 12 hours would be to retrieve 10GB in hour 0, 10GB in hour 4, and 10GB in hour 8."