It’s easy when you know the answer — finding an S3 bucket when you don’t know the name

Anthony Angel
Engineering at FundApps
Aug 3, 2022


Our clients at FundApps are some of the world's largest financial institutions, collectively managing in excess of $14 trillion. One of the guardrails we have implemented is a shared-nothing approach, where each tenant has their own infrastructure residing in their own dedicated AWS account.

One of the notable limitations of Amazon S3 is globally unique bucket names. Although a bucket is created in a specific region and belongs to a specific AWS account (it can of course be shared cross-account), its name has to be unique not only within the account in question, but across all AWS accounts, i.e. if I create a bucket called thisismybucket in one account, then no one else can reuse that name in any other account.

Based on the above, we needed a way to create a copy of the infrastructure in each AWS account whilst keeping the required uniqueness for bucket names. The Terraform S3 resource does have an input value for bucket_prefix, which will create a unique bucket name by appending a random suffix, i.e. we can create a bucket named metadata20210625152621311800000002 from the following code:

resource "aws_s3_bucket" "bucket" {
bucket_prefix = "metadata"
}

DevOps ❤ pets (not that kind!)

This isn’t particularly pleasant to look at, so instead we use the random_pet resource from Terraform's random provider to give our buckets more human-readable names.

resource "random_pet" "this" {
  length = 2
}

resource "aws_s3_bucket" "bucket" {
  bucket = "metadata-${random_pet.this.id}"
}

The resulting names are far more readable and memorable when developing or troubleshooting. Some examples include:

  • metadata-social-basilisk
  • metadata-eminent-reptile
  • metadata-wired-lobster

What could possibly go wrong?

One of our values at FundApps is #BeTransparent; there's an assumption within our engineering team that at some point we will write a bucket name into an incident report, and then have to explain to a client that the names are randomly generated and not meant as a personal slight!

Why is this a problem?

If we only needed to reference these buckets in the same block of Terraform which created them, life would be simple, as we'd just reference the Terraform resource identifier aws_s3_bucket.bucket.arn. However, we need to reference these buckets from other places within Terraform.

A common example is having our logging buckets set up in one Terraform workspace, and separate workspaces for each service. This keeps the Terraform nice and tidy, whilst allowing a separation of concerns between our product teams. In order to configure an S3 bucket to log to a different bucket, we need to know the name of the log bucket.

# Pseudo-code: these arguments belong in an aws_s3_bucket_logging resource
resource "aws_s3_bucket_logging" "this" {
  bucket        = aws_s3_bucket.bucket.id
  target_bucket = aws_s3_bucket.logs.id # This is the bucket name
  target_prefix = "log/"
}

We need to know the name of the bucket in order to reference it.

# This won't work unless you know the full name of what you are looking for
data "aws_s3_bucket" "logs" {
  bucket = "logs"
}

SSM to the rescue

One approach would be to write the S3 bucket name to an SSM parameter, and have Terraform perform a data lookup on that parameter. This does work, but adds the overhead of creating and maintaining an SSM parameter alongside each bucket.
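
For completeness, a minimal sketch of the write side, assuming the parameter path /s3_buckets/logs that the lookup below reads:

# Hypothetical write side: publish the generated bucket name to SSM
# so that other workspaces can look it up
resource "aws_ssm_parameter" "logs_bucket" {
  name  = "/s3_buckets/logs"
  type  = "String"
  value = aws_s3_bucket.logs.id
}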

data "aws_ssm_parameter" "logs_bucket" {
name = "/s3_buckets/logs"
}
logging {
target_bucket = data.aws_ssm_parameter.logs_bucket.value
target_prefix = "log/"
}

A nicer approach

An alternative approach uses the AWS Cloud Control API, or more specifically the Terraform provider for it, awscc.

terraform {
  required_providers {
    awscc = {
      source = "hashicorp/awscc"
    }
  }
}

provider "awscc" {
  region = "eu-west-1"
}

data "awscc_s3_buckets" "this" {}

# This will output a list of all my S3 buckets
output "buckets" {
  value = data.awscc_s3_buckets.this
}

locals {
  bucket_list     = data.awscc_s3_buckets.this.ids
  matching_string = "logs"

  # Keep only the buckets whose name starts with the matching string
  filtered_bucket = [
    for bucket in local.bucket_list : bucket
    if substr(bucket, 0, length(local.matching_string)) == local.matching_string
  ]
}

output "filtered" {
  value = local.filtered_bucket
}

This means we don't have to create any additional resources next to the S3 buckets: a single data lookup returns all the bucket names to us. We can keep our Terraform as DRY as possible, resulting in small and atomic Terraform workspaces, whilst still keeping things unique.
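
To close the loop, here is an illustrative sketch of how the filtered result could feed the logging configuration from earlier, assuming exactly one bucket name starts with the logs prefix:

# Illustrative wiring: one() returns the single match and errors if the
# prefix matched more than one bucket
resource "aws_s3_bucket_logging" "this" {
  bucket        = aws_s3_bucket.bucket.id
  target_bucket = one(local.filtered_bucket)
  target_prefix = "log/"
}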
