It’s easy when you know the answer — finding an S3 bucket when you don’t know the name
Our clients at FundApps are some of the world's largest financial institutions, collectively managing in excess of $14 trillion. One of the guardrails we have implemented is a shared-nothing approach, where each tenant has their own infrastructure residing in their own dedicated AWS account.
One of the notable limitations of Amazon S3 is globally unique bucket names. Although a bucket is created in a specific region and belongs to a specific AWS account (it can of course be shared cross-account), the name of the bucket has to be unique not only within the AWS account in question, but across all AWS accounts. In other words, if I create a bucket called thisismybucket in one account, then no one else can reuse that name in any other account.
Based on the above, we needed a way to create a copy of the infrastructure in each AWS account whilst keeping the required uniqueness for bucket names. The Terraform S3 resource does have an input value for bucket_prefix, which creates a unique bucket name by appending a random suffix. For example, we can create a bucket named metadata20210625152621311800000002 from the following code:
resource "aws_s3_bucket" "bucket" {
  bucket_prefix = "metadata"
}
DevOps ❤ pets (not that kind!)
This isn’t particularly pleasant to look at, so instead we use the random_pet resource from the Terraform random provider to give more human-readable names.
resource "random_pet" "this" {
  length = 2
}

resource "aws_s3_bucket" "bucket" {
  bucket = "metadata-${random_pet.this.id}"
}
The resulting names are far more readable, and more memorable when developing or troubleshooting. Some examples include:
- metadata-social-basilisk
- metadata-eminent-reptile
- metadata-wired-lobster
What could possibly go wrong?
One of our values at FundApps is #BeTransparent. There’s an assumption within our engineering team that at some point we will write one of these bucket names into an incident report, and will then have to explain to a client that the names are randomly generated and not meant as a personal slight!
Why is this a problem?
If we only needed to reference these buckets in the same block of Terraform which created them, life would be simple: we’d just use the Terraform resource identifier aws_s3_bucket.bucket.arn. But we need to reference these buckets from other places within Terraform.
A common example is having our logging buckets set up in one Terraform workspace, with separate workspaces for each service. This keeps the Terraform nice and tidy, whilst allowing a separation of concerns between our product teams. In order to configure an S3 bucket to log to a different bucket, we need to know the name of the log bucket.
# Pseudo-code: this block is meant to be within an aws_s3_bucket_logging resource
logging {
  target_bucket = aws_s3_bucket.logs.id # This is the bucket name
  target_prefix = "log/"
}
We need to know the name of the bucket in order to reference it.
# This won't work unless you know the full name of what you are looking fordata "aws_s3_bucket" "logs" {
bucket = "logs"
}
SSM to the rescue
One approach would be to write the S3 bucket name to an SSM parameter, and have Terraform perform a data lookup on that parameter. This does work, but adds the overhead of creating an SSM parameter next to every bucket.
data "aws_ssm_parameter" "logs_bucket" {
  name = "/s3_buckets/logs"
}

logging {
  target_bucket = data.aws_ssm_parameter.logs_bucket.value
  target_prefix = "log/"
}
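For completeness, the other half of this approach is writing the parameter alongside the bucket. A minimal sketch (the /s3_buckets/logs path is an assumption chosen to match the data lookup; adjust it to your own naming convention):

```hcl
# Sketch: publish the generated bucket name to SSM so other
# workspaces can look it up by a well-known parameter path.
resource "aws_ssm_parameter" "logs_bucket" {
  name  = "/s3_buckets/logs"
  type  = "String"
  value = aws_s3_bucket.bucket.id # the randomly suffixed bucket name
}
```

This is the extra resource per bucket that the approach below avoids.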
A nicer approach
An alternative approach uses the AWS Cloud Control API, and more specifically the Terraform provider for it (awscc).
terraform {
  required_providers {
    awscc = {
      source = "hashicorp/awscc"
    }
  }
}

provider "awscc" {
  region = "eu-west-1"
}

data "awscc_s3_buckets" "this" {}

# This will output a list of all my S3 buckets
output "buckets" {
  value = data.awscc_s3_buckets.this
}

locals {
  bucket_list     = data.awscc_s3_buckets.this.ids
  matching_string = "logs"
  filtered_bucket = [for bucket in local.bucket_list : bucket if substr(bucket, 0, length(local.matching_string)) == local.matching_string]
}

output "filtered" {
  value = local.filtered_bucket
}
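The filtered list can then feed the logging configuration directly. A sketch, assuming exactly one bucket matches the prefix:

```hcl
# Sketch: take the single matching bucket name from the awscc lookup.
# one() makes the plan fail if zero or multiple buckets match the
# prefix, which surfaces naming collisions early.
logging {
  target_bucket = one(local.filtered_bucket)
  target_prefix = "log/"
}
```

If multiple buckets can legitimately share the prefix, a stricter filter (or a longer matching_string) is needed before this will plan cleanly.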
This means we don’t have to create any additional resources next to the S3 buckets: a single data lookup returns all bucket names to us. We can keep our Terraform as DRY as possible, resulting in small, atomic Terraform workspaces, whilst still keeping bucket names unique.