Optimizing Multi-Architecture Container Image Builds on AWS

Learn how to optimize multi-architecture container image builds with CodeBuild, Docker Buildx, and Amazon ECR. I focus on two powerful techniques: structuring Dockerfiles to leverage build caching effectively, and enabling ECR layer caching with Buildx.

Building multi-architecture container images (such as for linux/amd64 and linux/arm64) can be quite challenging. If you've ever experienced several-minute build times and docker pull delays, you know exactly what I mean. In this post, I'll walk you through two powerful techniques that have significantly optimized my pipeline. As part of my job at DoiT, I helped develop an internal Cost Optimization tool and needed an automated container image build pipeline: GitHub for source code, CodePipeline/CodeBuild for continuous integration, Docker Buildx for multi-architecture builds, and Amazon ECR (Elastic Container Registry) as the container registry.

I'll be focusing on:

  1. Dockerfile Structure Optimization
  2. ECR Layer Caching with Buildx

(Note: While CodeBuild provides its own caching capabilities, they don't offer any benefit in the context of building container images with Buildx, so I won't cover them here.)

Sample Application Code

To put everything into context, here's a very simplified version of the application code I had to build. This sample application is a basic Streamlit app that displays a title, version, and a sample table with random data. The actual code I had to package was a Streamlit application with multiple libraries and more complex Python code. However, the complexity of the code does not affect what I will explain below. These are the only code files we need to run the application.

requirements.txt

streamlit

streamlit.json

{
    "version": "v1.0.0"
}

streamlit.py

import streamlit as st
import json
import pandas as pd
import numpy as np

# Load version from streamlit.json
with open("streamlit.json", "r") as f:
    config = json.load(f)
version = config.get("version", "Unknown")

# Display the app title and version
st.title("Demo Streamlit App")
st.write(f"**Version:** {version}")

# Generate a sample table with 3 columns and 5 rows of random values
data = np.random.rand(5, 3)
columns = ["Column 1", "Column 2", "Column 3"]
df = pd.DataFrame(data, columns=columns)

# Display the table in the app
st.table(df)
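
If you want to try the sample locally before containerizing it, you can run it with the Streamlit CLI. This is just a quick sketch; it assumes Python 3 and the three files above in your working directory:

pip3 install -r requirements.txt
streamlit run streamlit.py --server.port=8501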

When you're building container images, two primary challenges often emerge:

Slow Image Downloads: When Dockerfiles are not optimized, downloads for some layers can take minutes. In contrast, an optimized Dockerfile structure can reduce this to a few seconds for the modified layers.

Extended Build Times: Build times can extend to several minutes without proper caching. However, building times drop dramatically when ECR layer caching is leveraged with Buildx.

These differences are not just numbers; they directly impact the overall productivity of your CI/CD pipeline and the speed at which you can iterate on your applications.

Technique 1: Dockerfile Structure Optimization

While this is a well-known recommendation, optimizing your Dockerfile structure is an elegant and surprisingly effective way to reduce build and pull times. The idea is to ensure that static content, such as dependency files, is handled first so that any changes in the frequently updated application code don't force a complete rebuild of every layer. Here's what the difference looks like.

Leverage build cache in the Dockerfile

In the non-optimized version below, the entire application folder is copied before installing dependencies. This means that even a small change in your source code invalidates the cached layers, leading to a fresh re-installation of your dependencies on every build. The result? Docker pull operations take a long time because locally pulled layers cannot be reused.

Non-Optimized Dockerfile:

FROM public.ecr.aws/docker/library/python:3.12.2-slim
RUN apt-get update && apt-get install -y --no-install-recommends \
    ca-certificates \
    netbase \
 && rm -rf /var/lib/apt/lists/*
COPY ./app/ app/
RUN pip3 install -r app/requirements.txt
EXPOSE 8501
WORKDIR /app
ENTRYPOINT ["streamlit", "run", "streamlit.py", "--server.port=8501"]

In the optimized Dockerfile, notice how we first copy the requirements.txt file and install the dependencies before copying the rest of the application code. This simple change means that if you update your application logic (e.g., streamlit.py), Docker can reuse the previously cached pip installation layer. The benefit is tremendous: Docker pull times drop significantly.

Optimized Dockerfile:

FROM public.ecr.aws/docker/library/python:3.12.2-slim
RUN apt-get update && apt-get install -y --no-install-recommends \
    ca-certificates \
    netbase \
 && rm -rf /var/lib/apt/lists/*
# Install pip requirements first to improve caching
COPY ./app/requirements.txt app/
RUN pip3 install -r app/requirements.txt
COPY ./app/ app/
EXPOSE 8501
WORKDIR /app
ENTRYPOINT ["streamlit", "run", "streamlit.py", "--server.port=8501"]

This approach is not only effective but also extremely easy to implement. It's amazing how a minor tweak can yield such significant performance improvements.
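
You can see the effect locally before wiring anything into a pipeline. This is a rough sketch (the tag demo-app and the file name Dockerfile-optimized are illustrative; use whatever names you keep the optimized Dockerfile under):

# First build populates the local build cache
docker build -t demo-app -f Dockerfile-optimized .

# Edit only app/streamlit.py, then rebuild: Docker reuses the cached
# apt-get and pip install layers and only rebuilds the final COPY layer
docker build -t demo-app -f Dockerfile-optimized .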

As an extra tip, I prefer to pull the BuildKit image from the Amazon ECR Public Gallery rather than Docker Hub to avoid Docker Hub download rate limits. In the buildspec files, I use:

public.ecr.aws/vend/moby/buildkit:buildx-stable-1

This ensures more reliable build processes.
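
The buildspec files below pre-pull this image. If you build with the docker-container driver, you can also point the builder at it explicitly via --driver-opt, so Buildx doesn't fall back to pulling moby/buildkit from Docker Hub. A minimal sketch:

docker buildx create --use --name multiarch \
  --driver-opt image=public.ecr.aws/vend/moby/buildkit:buildx-stable-1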

Note: I could use a custom base image and avoid the second RUN command in the Dockerfile, but I wanted to focus on the improvement techniques instead of creating a production-grade Dockerfile.

Technique 2: ECR Layer Caching with Buildx

Let's discuss another game-changing technique: using ECR layer caching with Docker Buildx. Traditional builds that do not use caching might take several minutes, but with ECR caching, you can see build times drop to just about 1 minute. How does this work?

With Docker Buildx, you can use the --cache-from and --cache-to flags to manage a remote cache stored in Amazon ECR. Here's why that matters:

Persistent Cache: Layers built in one pipeline run can be stored and reused across subsequent builds. This means that even if you're building for different architectures, if a layer hasn't changed, it won't be rebuilt.

Cross-Platform Efficiency: Because the cache is hosted remotely in ECR, the same cache can serve different CodeBuild environments, ensuring that your multi-architecture builds are as efficient as possible.

This technique doesn't just speed up your builds—it transforms the entire development experience by dramatically reducing downtime and build costs.
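
Before looking at the full buildspec, here's the shape of those flags in isolation. This is a minimal sketch with placeholder <account-id>, <region>, and <repo> values: mode=max exports cache metadata for all intermediate layers (not just the final image), and image-manifest=true together with oci-mediatypes=true stores the cache as an OCI image manifest that ECR accepts:

docker buildx build --push --platform=linux/amd64,linux/arm64 \
  --cache-from type=registry,ref=<account-id>.dkr.ecr.<region>.amazonaws.com/<repo>:cache \
  --cache-to type=registry,ref=<account-id>.dkr.ecr.<region>.amazonaws.com/<repo>:cache,mode=max,image-manifest=true,oci-mediatypes=true \
  --tag <account-id>.dkr.ecr.<region>.amazonaws.com/<repo>:latest .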

Buildspec.yml for ECR Layer Caching

Below are two versions of a buildspec.yml file. Notice how the optimized version leverages ECR layer caching to improve build performance (look for the docker buildx build command in the build phase).

Non-Optimized buildspec.yml:

version: 0.2

env:
  variables:
    IMAGE_TAG: latest
    ARCH: amd64
    BUILDKIT_REPO: public.ecr.aws/vend/moby/buildkit:buildx-stable-1

phases:
  install:
    commands:
      - echo "IMAGE_REPO_NAME is $IMAGE_REPO_NAME"
      - echo "IMAGE_TAG is $IMAGE_TAG"
      - echo "AWS_ACCOUNT_ID is $AWS_ACCOUNT_ID"
      - echo "AWS_DEFAULT_REGION is $AWS_DEFAULT_REGION"
      # Installing Buildx
      - BUILDX_URL=$(curl -s https://raw.githubusercontent.com/docker/actions-toolkit/main/.github/buildx-lab-releases.json | jq -r ".latest.assets[] | select(endswith(\"linux-$ARCH\"))")
      - echo "BUILDX_URL $BUILDX_URL"
      - mkdir -vp ~/.docker/cli-plugins/
      - curl --silent -L --output ~/.docker/cli-plugins/docker-buildx $BUILDX_URL
      - chmod a+x ~/.docker/cli-plugins/docker-buildx
  pre_build:
    commands:
      - echo "Logging in to Amazon ECR..."
      - aws ecr get-login-password --region $AWS_DEFAULT_REGION | docker login --username AWS --password-stdin $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com
      - docker pull $BUILDKIT_REPO
      - docker buildx create --use --name multiarch
  build:
    commands:
      - echo "Build started on `date`"
      - docker buildx build --push --platform=linux/amd64,linux/arm64 --tag $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG -f Dockerfile-multi-stage-optimization .
  post_build:
    commands:
      - echo "Build and push complete on `date`"

Optimized buildspec.yml with ECR Layer Caching:

version: 0.2

env:
  variables:
    IMAGE_TAG: latest
    ARCH: amd64
    BUILDKIT_REPO: public.ecr.aws/vend/moby/buildkit:buildx-stable-1

phases:
  install:
    commands:
      - echo "IMAGE_REPO_NAME is $IMAGE_REPO_NAME"
      - echo "IMAGE_TAG is $IMAGE_TAG"
      - echo "AWS_ACCOUNT_ID is $AWS_ACCOUNT_ID"
      - echo "AWS_DEFAULT_REGION is $AWS_DEFAULT_REGION"
      # Installing Buildx
      - BUILDX_URL=$(curl -s https://raw.githubusercontent.com/docker/actions-toolkit/main/.github/buildx-lab-releases.json | jq -r ".latest.assets[] | select(endswith(\"linux-$ARCH\"))")
      - echo "BUILDX_URL $BUILDX_URL"
      - mkdir -vp ~/.docker/cli-plugins/
      - curl --silent -L --output ~/.docker/cli-plugins/docker-buildx $BUILDX_URL
      - chmod a+x ~/.docker/cli-plugins/docker-buildx
  pre_build:
    commands:
      - echo "Logging in to Amazon ECR..."
      - aws ecr get-login-password --region $AWS_DEFAULT_REGION | docker login --username AWS --password-stdin $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com
      - docker pull $BUILDKIT_REPO
      - docker buildx create --use --name multiarch
  build:
    commands:
      - echo "Build started on `date`"
      # Use a literal block scalar so the multi-line command (with backslashes) reaches the shell intact
      - |
        docker buildx build --push --platform=linux/amd64,linux/arm64 \
          --cache-from type=registry,ref=$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:cache \
          --cache-to mode=max,image-manifest=true,oci-mediatypes=true,type=registry,ref=$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:cache \
          --tag $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG \
          -f Dockerfile-multi-stage-optimization .
  post_build:
    commands:
      - echo "Build and push complete on `date`"

By incorporating these caching mechanisms, the build time decreases dramatically. This radical improvement enables you to iterate faster and get your applications into production with significantly less delay.

Real-World Build and Pull Time Observations

I decided to test both the non-optimized and optimized approaches under three distinct scenarios to see how much of a difference the Dockerfile structure and container layer caching could make. Specifically, I looked at:

  • A first run from scratch (where nothing is cached yet)
  • A change in the requirements.txt file (which forces a rebuild of dependencies and invalidates subsequent layer caches)
  • A change in the application code (which ideally should only invalidate the final layer)

Below are the times I measured for each scenario. The non-optimized pipeline used the straightforward Dockerfile (where dependencies and app code are copied together) and did not use ECR layer caching in the build phase. In contrast, the optimized pipeline used the Dockerfile that installs dependencies first and enabled ECR layer caching in the build phase.

Non-Optimized Pipeline:

  • First run:
    • Build time: 4 minutes 56 seconds
    • Pull time: 1 minute 35 seconds
  • Change in requirements.txt:
    • Build time: 4 minutes 55 seconds
    • Pull time: 1 minute 44 seconds
  • Change in application code:
    • Build time: 4 minutes 50 seconds
    • Pull time: 1 minute 41 seconds

Optimized Pipeline:

  • First run:
    • Build time: 4 minutes 47 seconds
    • Pull time: 1 minute 48 seconds
  • Change in requirements.txt:
    • Build time: 4 minutes 32 seconds
    • Pull time: 1 minute 46 seconds
  • Change in application code:
    • Build time: 49 seconds
    • Pull time: < 1 second

As expected, both pipelines had similar build and pull times after the first run from scratch and after a change in the requirements.txt file. The real difference came when I changed only the application code. In that case, the optimized pipeline rebuilt in under a minute and pulled almost instantly, while the non-optimized approach took several minutes. This demonstrates the true power of an optimized Dockerfile and effective layer caching, especially when making small, frequent code changes.

Final Thoughts

By carefully reordering the steps in your Dockerfile and leveraging the robust layer caching provided by ECR and Docker Buildx, you can see dramatic improvements in both download and build times. These techniques, when combined, allow for a much smoother and faster CI/CD process for multi-architecture builds. Imagine reducing pull times from 1–2 minutes to under a second and build times from 5 minutes to just 1 minute—this kind of efficiency truly transforms development workflows!
