Here’s the scenario: You’ve created a private ECR repository, you’ve uploaded an image to it and now you want to run that image as an ECS task. But… you don’t want ECS going out over the public internet to the ECR API. Instead you want to keep the traffic inside your VPC.
AWS have written docs on how to access ECR using VPC endpoints but, despite following them as closely as I could, I still couldn’t get ECS to pull the image using this approach. ECS was failing with the following error:
CannotPullContainerError: ref pull has been retried 5 time(s): failed to copy: httpReadSeeker: failed open: failed to do request: Get “https://prod-eu-west-2-starport-layer-bucket.s3.eu-west-2.amazonaws.com/
The TL;DR of the AWS guidance (or at least my understanding of it) is that you must:
- Identify all subnets which will host your ECS tasks that need access to ECR. Make a note of their route tables.
- Create a gateway S3 endpoint, rather than an interface one. Select the route tables identified in the previous step to ensure that they each receive a route to the S3 gateway endpoint.
- Create interface VPC endpoints for the following services:
com.amazonaws.region.ecr.api
com.amazonaws.region.ecr.dkr
- Create the necessary security group rules (both ingress and egress) to allow TCP traffic on port 443 (TLS) to both of the above ECR interface VPC endpoints, from the ECS task.
- Ensure that the ECS Task execution IAM role has the necessary permissions to pull images from ECR.
Perhaps it was because this was the first time I had used gateway endpoints, but the crucial thing I failed to glean from the AWS docs was that you also need security group rules which permit access to the S3 gateway endpoint. ECR stores image layers in S3, which is why the failing request in the error above is to an S3 bucket. This is actually really easy to do, as AWS provide the CIDR ranges in an S3 prefix list, which gives us a nice concise way of providing access to S3 without having to implement any super wide security group rules.
What does it look like?
I’ll run through some rough Terraform code for this whole thing below. I’m making the assumption that you already have the following resources in place:
- A VPC
- Interface VPC endpoints for the com.amazonaws.region.ecr.api and com.amazonaws.region.ecr.dkr services
- Some private subnets, within the VPC, in which you will run your ECS service
- A container image which you will upload to your ECR repository
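If you don’t have the interface endpoints yet, the following is a minimal sketch of what they might look like. It is an assumption on my part rather than a tested configuration: the aws_security_group.ecr_vpce resource and the for_each over the two service suffixes are purely illustrative, and var.vpc_id, var.region_name and var.subnet_ids are the same inputs used in the rest of this post.

```hcl
# Hypothetical security group attached to the ECR interface endpoints.
# Its ingress rules (port 443 from the ECS service) are not shown here.
resource "aws_security_group" "ecr_vpce" {
  name_prefix = "ecr-vpce"
  vpc_id      = var.vpc_id
}

# One interface endpoint per ECR service: ecr.api and ecr.dkr.
resource "aws_vpc_endpoint" "ecr" {
  for_each = toset(["ecr.api", "ecr.dkr"])

  vpc_id              = var.vpc_id
  service_name        = "com.amazonaws.${var.region_name}.${each.key}"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.subnet_ids
  security_group_ids  = [aws_security_group.ecr_vpce.id]
  private_dns_enabled = true # lets the usual ECR hostnames resolve to the endpoints
}
```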
Create a private ECR repository
resource "aws_ecr_repository" "repository" {
  # ECR repository names may only contain lowercase letters, digits
  # and the separators . _ - /
  name = "edds-ecr-repository"
}
Create an ECR repository policy
Permit access to the ECR repository from the necessary IAM principals. This example is extremely permissive and gives access to any IAM principal within the same AWS account as the ECR repository. You might want to lock this down further as your own needs dictate.
data "aws_caller_identity" "current_user" {}

resource "aws_ecr_repository_policy" "permit_access_from_within_aws_account" {
  repository = aws_ecr_repository.repository.name

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid       = "PermitFullAccountWideAccessToECR"
        Effect    = "Allow"
        Principal = { "AWS" : "arn:aws:iam::${data.aws_caller_identity.current_user.account_id}:root" }
        Action    = ["ecr:*"]
      }
    ]
  })
}
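The repository policy covers the resource side. The summary near the top also mentions that the ECS task execution role needs permission to pull from ECR, so here is a rough sketch of one way to set that up, using the AWS managed AmazonECSTaskExecutionRolePolicy. The resource names are hypothetical and you may prefer a tighter, hand-written policy.

```hcl
# Trust policy allowing ECS tasks to assume the role.
data "aws_iam_policy_document" "ecs_tasks_assume_role" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ecs-tasks.amazonaws.com"]
    }
  }
}

# Hypothetical task execution role for the ECS service.
resource "aws_iam_role" "task_execution" {
  name_prefix        = "ecs-task-execution-"
  assume_role_policy = data.aws_iam_policy_document.ecs_tasks_assume_role.json
}

# The managed policy grants the ECR pull actions and CloudWatch Logs access.
resource "aws_iam_role_policy_attachment" "task_execution" {
  role       = aws_iam_role.task_execution.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}
```

The role’s ARN would then be set as the execution_role_arn of your task definition.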
Upload an image to the ECR repository
AWS provide docs on how to do this here.
Create the S3 Gateway VPC Endpoint
- vpc_id must identify the VPC in which your ECS service will run
- route_table_ids must be the route tables of the subnets in which the ECS service will run (var.subnet_ids, used for the ECS service further down)
resource "aws_vpc_endpoint" "s3_gateway_vpce" {
  vpc_id            = var.vpc_id
  service_name      = "com.amazonaws.${var.region_name}.s3"
  vpc_endpoint_type = "Gateway"

  # subnet_ids is not set: it only applies to Interface and Gateway Load
  # Balancer endpoints. A gateway endpoint is attached via route tables.
  route_table_ids = var.route_table_ids

  tags = {
    Name = "${var.vpc_id}-${var.region_name}-s3-gateway-endpoint"
  }
}
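If you would rather not note down the route tables by hand, they can probably be looked up from the subnets instead. The sketch below is an assumption, not something from my original setup: the data source and local names are made up, the subnet IDs need to be known at plan time for for_each to work, and as far as I know the lookup can fail for subnets which rely on the VPC’s main route table rather than an explicit association.

```hcl
# Look up the route table associated with each ECS subnet.
data "aws_route_table" "ecs_subnets" {
  for_each  = toset(var.subnet_ids)
  subnet_id = each.value
}

locals {
  # Several subnets may share one route table, so de-duplicate the IDs.
  ecs_route_table_ids = distinct([for rt in data.aws_route_table.ecs_subnets : rt.id])
}
```

local.ecs_route_table_ids could then be passed as route_table_ids on the gateway endpoint in place of var.route_table_ids.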
Create a Security Group
This security group will ultimately be associated with the ECS service. As such, its rules will control what the service can and cannot access within the VPC.
resource "aws_security_group" "ecs_service" {
  name_prefix = "${var.service_name}-ecs-service"
  vpc_id      = var.vpc_id

  tags = {
    Name = "${var.service_name}-ecs-service"
  }

  lifecycle {
    create_before_destroy = true
  }
}
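The AWS guidance summarised earlier also calls for port 443 rules between the ECS task and the two ECR interface endpoints. A rough sketch of those rules follows. It assumes the endpoints have their own security group, whose ID is passed in through a hypothetical var.ecr_vpce_security_group_id; adjust it to however your endpoints are actually set up.

```hcl
# Hypothetical input: the security group attached to the ECR interface endpoints.
variable "ecr_vpce_security_group_id" {
  type = string
}

# Allow the ECS service to open TLS connections to the ECR endpoints.
# For an egress rule, source_security_group_id names the destination group.
resource "aws_security_group_rule" "permit_tls_egress_to_ecr_endpoints" {
  description              = "Permit TLS egress from the ECS service to the ECR interface endpoints."
  type                     = "egress"
  protocol                 = "tcp"
  from_port                = 443
  to_port                  = 443
  security_group_id        = aws_security_group.ecs_service.id
  source_security_group_id = var.ecr_vpce_security_group_id
}

# Allow the ECR endpoints to accept those connections.
resource "aws_security_group_rule" "permit_tls_ingress_from_ecs_service" {
  description              = "Permit TLS ingress to the ECR interface endpoints from the ECS service."
  type                     = "ingress"
  protocol                 = "tcp"
  from_port                = 443
  to_port                  = 443
  security_group_id        = var.ecr_vpce_security_group_id
  source_security_group_id = aws_security_group.ecs_service.id
}
```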
Look up the S3 prefix list
AWS publishes the CIDR blocks of the S3 Gateway endpoint, so we look them up here.
data "aws_prefix_list" "s3" {
  prefix_list_id = aws_vpc_endpoint.s3_gateway_vpce.prefix_list_id
}
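If you want to see which ranges the lookup actually returned, a small output can help when debugging. The output name here is just an illustration.

```hcl
# Expose the S3 CIDR blocks so they show up in `terraform output`.
output "s3_prefix_list_cidr_blocks" {
  description = "CIDR blocks covered by the S3 prefix list of the gateway endpoint."
  value       = data.aws_prefix_list.s3.cidr_blocks
}
```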
Permit access to the S3 Gateway Endpoint
Create a Security Group Rule which permits the ECS service’s security group to perform TLS egress to the S3 gateway endpoint. This is the key piece of configuration I was missing. Without it, ECS cannot pull ECR container images over the ECR VPC endpoints.
resource "aws_security_group_rule" "permit_tls_egress_to_s3_gateway_endpoints" {
  description       = "Permit TLS egress from ${aws_security_group.ecs_service.name} security group to S3 VPC Gateway Endpoint IP addresses."
  type              = "egress"
  protocol          = "tcp"
  from_port         = 443
  to_port           = 443
  security_group_id = aws_security_group.ecs_service.id
  cidr_blocks       = data.aws_prefix_list.s3.cidr_blocks
}
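A variant worth knowing about, offered as a sketch rather than as part of the setup described here: aws_security_group_rule also accepts prefix_list_ids, so the rule can point at the gateway endpoint’s prefix list directly instead of copying its CIDR blocks. The resource name below is hypothetical, and you would use this in place of the rule above, not alongside it.

```hcl
# Same egress rule, but referencing the S3 prefix list by ID so the rule
# follows any changes AWS makes to the underlying CIDR ranges.
resource "aws_security_group_rule" "permit_tls_egress_to_s3_prefix_list" {
  description       = "Permit TLS egress to the S3 prefix list of the gateway endpoint."
  type              = "egress"
  protocol          = "tcp"
  from_port         = 443
  to_port           = 443
  security_group_id = aws_security_group.ecs_service.id
  prefix_list_ids   = [aws_vpc_endpoint.s3_gateway_vpce.prefix_list_id]
}
```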
Create an ECS Service
The ECS Service’s network configuration must associate it with the Security Group created in the step above.
resource "aws_ecs_service" "ecs_service" {
  // I've omitted most of the configuration here for brevity.
  // The key thing is that the ECS service is associated with the Security Group
  // which was created in the step above.
  network_configuration {
    subnets         = var.subnet_ids
    security_groups = [aws_security_group.ecs_service.id]
  }
}
And voilà! Your ECS service should be able to pull your ECR image from your private ECR repository, without any network traffic having to traverse the public internet.
I hope that sharing this will help others avoid falling into the same trap as I did. Let me know in the comments below if this helps you get your configuration working.
Until next time!
Edd