AWS on the road: Glacier in Durmitor

It is time for my last, long-overdue post in the AWS on the road series. Short summary: almost 4 weeks on the road, 10 countries, 3700 km and 8 posts. I wrote a bit less than I planned, but it is not over yet: AWS services will still appear on my blog from time to time.

Glacier

Glacier is a storage service designed for cold data, that is, data that is infrequently used and accessed. It provides durable and low-cost storage for data archiving and backup. This REST-based service can be used to store data for a really long time.

If you remember my previous post about S3, I mentioned that you can set a lifecycle rule on a bucket that moves objects to Glacier. Underneath, S3 stores those objects in Glacier, but you can’t access them through the Glacier service; they are only visible in your S3 buckets.

To use Glacier we need to understand basic terminology:

  • Vault – it is a container for storing archives. You can store an unlimited number of archives in a vault. Depending on your business or application needs, you can store these archives in one vault or multiple vaults.
  • Archive – this is the actual file that you want to store. It can be any data, such as a photo, video, or document, and it is the base unit of storage in Glacier. Each archive has a unique ID and can have a description. You can only specify the description during the upload of an archive.

Each archive has an endpoint that is used to access it:

https://<region-specific endpoint>/<account-id>/vaults/<vault-name>/archives/<archive-id>
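As a small illustration, the pattern above can be filled in with a helper like the one below. This is only a sketch: the function name and all example values (account ID, vault name, archive ID) are made up, and the endpoint follows the usual glacier.<region>.amazonaws.com form.

```python
def archive_uri(endpoint: str, account_id: str, vault_name: str, archive_id: str) -> str:
    """Fill in the archive URI pattern shown above (hypothetical helper)."""
    return f"https://{endpoint}/{account_id}/vaults/{vault_name}/archives/{archive_id}"


# All values below are placeholders, not real resources.
print(archive_uri(
    "glacier.eu-west-1.amazonaws.com",
    "123456789012",
    "holiday-photos",
    "EXAMPLE-ARCHIVE-ID",
))
```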

Data operations

The AWS Management Console only lets you create vaults. To interact with data you have to use the CLI or write a piece of code. Some of the operations are asynchronous, such as retrieving an archive or a vault inventory (the list of archives). These operations require you to first initiate a job and then download the job output: you send a request that initiates the job and get a job ID in the response. You can then check the status of the job and, when it is ready, get the results.

Types of jobs:

  • Select – performs a select query on an archive. That is very useful as you can run SQL queries on your data without having to restore it.
  • Archive-retrieval – downloads an archive.
  • Inventory-retrieval – gets the list of archives in a vault.
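The initiate, check status, download flow could look roughly like the sketch below for an archive-retrieval job. It assumes a client object that behaves like boto3's Glacier client (`initiate_job`, `describe_job`, `get_job_output`). The function name, the polling interval and the error handling are my own choices for illustration, not something prescribed by the service.

```python
import time


def retrieve_archive(client, vault_name, archive_id, tier="Standard", poll_seconds=900):
    """Initiate an archive-retrieval job, wait for it, and return the archive bytes.

    `client` is assumed to behave like boto3.client("glacier").
    """
    # Step 1: initiate the job; the response carries the job ID.
    job = client.initiate_job(
        accountId="-",  # "-" means the account that owns the credentials
        vaultName=vault_name,
        jobParameters={
            "Type": "archive-retrieval",
            "ArchiveId": archive_id,
            "Tier": tier,  # "Expedited", "Standard" or "Bulk"
        },
    )
    job_id = job["jobId"]

    # Step 2: check the job status until Glacier reports it as completed.
    while True:
        status = client.describe_job(accountId="-", vaultName=vault_name, jobId=job_id)
        if status["Completed"]:
            break
        time.sleep(poll_seconds)

    if status["StatusCode"] != "Succeeded":
        raise RuntimeError(f"Job {job_id} ended with status {status['StatusCode']}")

    # Step 3: download the job output (the archive content).
    output = client.get_job_output(accountId="-", vaultName=vault_name, jobId=job_id)
    return output["body"].read()
```

With real credentials you would pass in `boto3.client("glacier")`. Keep in mind that a Standard job takes hours, so in practice a notification is usually a better fit than a polling loop like this one.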

Pricing

Now comes the trickier part: cost calculation. Glacier is designed for archiving and backup, so besides paying for the amount of data stored you also pay for data retrieval. As part of the AWS Free Usage Tier, you can retrieve up to 10 GB per month for free (Standard retrieval). For each Archive-retrieval job you can specify a tier:

  • Expedited – allows you to quickly access your data when occasional urgent requests for a subset of archives are required. For archives smaller than 250MB retrievals are typically made available within 1–5 minutes. There are two types of Expedited retrievals: On-Demand and Provisioned. Provisioned requests are guaranteed to be available when you need them whereas On-Demand retrievals can be rejected in rare situations of unusually high demand.
  • Standard – Standard retrievals allow you to access any of your archives within several hours. Standard retrievals typically complete within 3–5 hours. This is the default option for retrieval requests.
  • Bulk – Bulk retrieval is the lowest-cost retrieval option, which you can use to retrieve large amounts, even petabytes, of data. Bulk retrievals typically complete within 5–12 hours.

Moreover, you have to pay for each retrieval request, and for data transfer to the Internet if you download the data outside of the AWS Region.

Glacier archives have a minimum storage duration of 90 days, and archives deleted before 90 days are still charged for the full 90 days.
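The 90-day rule is easy to express in code. A minimal sketch, with a function name I made up for illustration:

```python
MIN_STORAGE_DAYS = 90  # minimum storage duration mentioned above


def billable_days(days_stored: int) -> int:
    """Number of days you are charged for, given how long the archive was kept."""
    return max(days_stored, MIN_STORAGE_DAYS)


print(billable_days(30))   # deleted after a month, still charged for 90 days
print(billable_days(200))  # kept longer than the minimum, charged for 200 days
```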

The next thing you can set is a data retrieval policy. It is set per AWS account and Region rather than on a single vault, and applies only to Standard retrievals requested directly from Glacier, so retrievals of objects that S3 moved to Glacier are not included. It allows you to control how much data can be retrieved:

Glacier retrieval policies

To summarize, when using Glacier you pay for data storage, retrievals (where the price depends on how fast you need the data) and requests made to the service. The amount of retrieved data can be controlled by the policy. Don’t get me wrong, Glacier is a really cheap storage option compared to S3 and best suits backup and archive use cases: in the Ireland Region, for example, it currently costs $0.004 per GB per month, whereas S3 Standard costs $0.021 per GB per month.
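To get a feeling for the difference, here is a rough storage-only comparison using the two Ireland prices quoted above. It deliberately ignores retrieval, request and transfer fees, and the 1000 GB figure is just an example.

```python
# Prices per GB per month for the Ireland Region, as quoted in this post.
GLACIER_PER_GB_MONTH = 0.004
S3_STANDARD_PER_GB_MONTH = 0.021


def monthly_storage_cost(size_gb: float, price_per_gb_month: float) -> float:
    """Storage-only cost for one month; retrievals and requests are not included."""
    return size_gb * price_per_gb_month


size_gb = 1000  # example archive size
glacier = monthly_storage_cost(size_gb, GLACIER_PER_GB_MONTH)
s3 = monthly_storage_cost(size_gb, S3_STANDARD_PER_GB_MONTH)
print(f"Glacier: ${glacier:.2f}, S3 Standard: ${s3:.2f}")
```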

Reference Materials

AWS on the road
Amazon Glacier documentation
Amazon Glacier pricing

Durmitor, Montenegro
