Simplifying EKS Add-on Management Across Regions with a Custom Python Script

Managing EKS clusters across regions is challenging, especially ensuring all clusters run the latest compatible add-ons. This post explores a Python script to simplify verifying add-on versions, identifying upgrade options, and keeping your EKS clusters up-to-date efficiently.

Managing Amazon Elastic Kubernetes Service (EKS) clusters across multiple regions can be a complex task, especially when it comes to ensuring that all clusters are running the latest compatible versions of essential add-ons. In this blog post, we'll explore a custom Python script that streamlines the process of verifying installed add-on versions across all EKS clusters in your AWS account, identifies potential upgrade options, and provides actionable insights to keep your clusters up-to-date.


Why the Script is Useful

When you deploy new EKS clusters with tools like eksctl or the AWS Management Console cluster creation wizard, the add-ons installed by default may not be the latest available versions. For instance, at the time of writing, eksctl version 0.190.0 with no explicit Kubernetes version creates a cluster running Kubernetes 1.30 with the following add-on versions: coredns v1.11.1-eksbuild.8 (the latest version is v1.11.3-eksbuild.1), kube-proxy v1.30.0-eksbuild.3 (the latest version is v1.30.3-eksbuild.9), and vpc-cni v1.18.1-eksbuild.3 (the latest version is v1.18.3-eksbuild.3). Over time, as new add-on versions are released, your clusters drift further behind, potentially missing critical updates, performance improvements, and security patches.

While the eksctl get addon command can retrieve compatible add-on versions, you still have to run it once per cluster, and it doesn't automatically single out the latest compatible version when multiple versions are available. Manually checking each cluster and add-on across different regions is time-consuming and error-prone.

This script addresses these challenges by:

  • Automating the Verification Process: Scans all EKS clusters in every region (the default), or only in the regions listed in the COMMA_SEPARATED_LIST_OF_REGIONS environment variable, and lists the installed add-on versions.
  • Identifying Upgrade Options: Compares installed versions with the latest compatible versions and suggests upgrade commands.
  • Streamlining Cluster Management: Provides a consolidated view of all clusters, their statuses, and potential add-on upgrades, making it easier to maintain consistency and security across your Kubernetes environments.

How the Script Works

The script is designed to be efficient and user-friendly, providing clear outputs and actionable steps. Here's a high-level overview of its functionality:

1. Initialization and Environment Setup

The script begins by importing necessary libraries, including boto3 for AWS access and colorama for colored terminal output. It initializes color codes for better readability.

2. Parsing EKS Version Strings

A helper function, parse_eks_addon_version, converts EKS add-on version strings (e.g., v1.29.7-eksbuild.9) into a tuple of integers (major, minor, patch, eksbuild) so that versions can be compared numerically rather than as strings.
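To see why the parsing step matters, here is a minimal sketch that closely follows the script's helper. The list of version strings is made up for illustration. Comparing raw strings picks the wrong "latest" version as soon as a component reaches two digits, while comparing integer tuples does not.

```python
import re


def parse_eks_addon_version(s):
    """Turn 'v1.29.7-eksbuild.9' into (1, 29, 7, 9).

    Strings that do not match the expected shape become (0, 0, 0, 0).
    """
    match = re.match(r'^v?(\d+)\.(\d+)\.(\d+)(?:-eksbuild\.(\d+))?$', s)
    if not match:
        return (0, 0, 0, 0)
    major, minor, patch, eksbuild = match.groups()
    return (int(major), int(minor), int(patch), int(eksbuild or 0))


# Illustrative version strings (not taken from a real cluster)
versions = ["v1.11.3-eksbuild.1", "v1.11.10-eksbuild.2", "v1.11.1-eksbuild.8"]

latest_by_string = max(versions)                              # plain text comparison
latest_by_tuple = max(versions, key=parse_eks_addon_version)  # numeric comparison

print(latest_by_string)  # v1.11.3-eksbuild.1 (wrong: '3' sorts after '1')
print(latest_by_tuple)   # v1.11.10-eksbuild.2
```

Note that any version string that does not follow this pattern falls back to (0, 0, 0, 0), so it will never be reported as newer than a parseable installed version.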

3. Retrieving and Validating AWS Regions

The script retrieves a list of available AWS regions and validates the regions provided by the user through the optional (but recommended) COMMA_SEPARATED_LIST_OF_REGIONS environment variable.
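The validation step can be sketched as follows. This is a simplified illustration, not the script itself: the `available` list is a hypothetical stand-in for the regions returned by AWS, the input string stands in for the value of COMMA_SEPARATED_LIST_OF_REGIONS, and the regular expression is an illustrative choice.

```python
import re

# Region ID shape: two-letter prefix, one or more lowercase words, then a number
# (for example 'us-east-1'). The exact pattern here is an assumption.
REGION_PATTERN = re.compile(r'^[a-z]{2}(?:-[a-z]+)+-\d+$')


def split_regions(raw):
    """Split a comma-separated string into trimmed, non-empty region IDs."""
    return [part.strip() for part in raw.split(',') if part.strip()]


def validate_regions(requested, available):
    """Return (valid, invalid); invalid holds (region, reason) pairs."""
    valid, invalid = [], []
    for region in requested:
        if not REGION_PATTERN.match(region):
            invalid.append((region, "Invalid format"))
        elif region not in available:
            invalid.append((region, "Region does not exist"))
        else:
            valid.append(region)
    return valid, invalid


# Hypothetical stand-in for the region list retrieved from AWS
available = ["us-east-1", "eu-west-1", "ap-southeast-2"]

raw_value = " us-east-1, eu-west-1 ,,eu-north-9,US_EAST"
valid, invalid = validate_regions(split_regions(raw_value), available)

print(valid)    # ['us-east-1', 'eu-west-1']
print(invalid)  # [('eu-north-9', 'Region does not exist'), ('US_EAST', 'Invalid format')]
```

Whitespace and empty entries are dropped before validation, so a stray comma or space in the variable does not cause an error by itself.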

4. Scanning Clusters in Each Region

For each valid region, the script scans all EKS clusters:

  • Lists all clusters: Retrieves every cluster in the region. Unless the COMMA_SEPARATED_LIST_OF_REGIONS environment variable is set, this is done for all available regions.
  • Describes each cluster: Checks the cluster's status and Kubernetes version.
  • Lists installed add-ons: If the cluster's status is ACTIVE, its installed add-on versions are retrieved.
  • Compares add-on versions: Fetches the add-on versions compatible with the cluster's Kubernetes version and compares the latest one with the installed version.
  • Suggests Upgrade Commands: If a newer add-on version is available, it provides the exact eksctl command to upgrade the add-on.
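The listing steps above depend on following the nextToken value that the EKS API returns when results span several pages. Here is a minimal sketch of that loop. `collect_pages` is a hypothetical helper (the script inlines the loop instead), and `fake_list_clusters` is a stand-in for a real client call so the example runs without AWS credentials.

```python
def collect_pages(fetch_page, result_key):
    """Call fetch_page repeatedly, following nextToken, and gather all items."""
    items = []
    next_token = None
    while True:
        # Only pass nextToken once the API has handed one back
        kwargs = {"nextToken": next_token} if next_token else {}
        response = fetch_page(**kwargs)
        items.extend(response.get(result_key, []))
        next_token = response.get("nextToken")
        if not next_token:
            return items


# Hypothetical stand-in for eks_client.list_clusters: two pages of results
def fake_list_clusters(nextToken=None):
    if nextToken is None:
        return {"clusters": ["dev", "staging"], "nextToken": "page-2"}
    return {"clusters": ["prod"]}


clusters = collect_pages(fake_list_clusters, "clusters")
print(clusters)  # ['dev', 'staging', 'prod']
```

With a real boto3 client, the same helper could be called as collect_pages(eks_client.list_clusters, "clusters"). boto3 also offers built-in paginators through get_paginator, which may be a simpler option if you adapt the script.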

IMPORTANT NOTE: Upgrading add-ons is a disruptive operation. You should always check the documentation of each add-on: https://eksctl.io/usage/addons/#updating-addons
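With that caveat in mind, the comparison and suggestion steps can be sketched roughly like this. `suggest_upgrade` is a hypothetical helper written for this illustration, and the cluster name and region are made up. The coredns versions are the ones quoted earlier in this post.

```python
import re


def parse_eks_addon_version(s):
    """Turn 'v1.29.7-eksbuild.9' into (1, 29, 7, 9); (0, 0, 0, 0) if unparseable."""
    match = re.match(r'^v?(\d+)\.(\d+)\.(\d+)(?:-eksbuild\.(\d+))?$', s)
    if not match:
        return (0, 0, 0, 0)
    return tuple(int(part or 0) for part in match.groups())


def suggest_upgrade(addon, cluster, region, installed, available):
    """Return an eksctl upgrade command, or None when nothing newer is available."""
    if not available:
        return None
    latest = max(available, key=parse_eks_addon_version)
    if parse_eks_addon_version(latest) <= parse_eks_addon_version(installed):
        return None
    return (f"eksctl update addon --name {addon} --cluster {cluster} "
            f"--version {latest} --region {region}")


# Hypothetical cluster name and region
command = suggest_upgrade(
    addon="coredns",
    cluster="demo-cluster",
    region="eu-west-1",
    installed="v1.11.1-eksbuild.8",
    available=["v1.11.1-eksbuild.8", "v1.11.3-eksbuild.1"],
)
print(command)
```

The function only builds the command text. Running it stays a manual, deliberate step, which fits the advice to read each add-on's documentation first.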

5. Sample Output

When the script is run, it provides detailed output for each cluster and add-on, along with upgrade commands if newer versions are available.

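The script also prints a summary at the end. For each region that has clusters, it reports the total number of clusters, how many are in the ACTIVE state, and how many have upgradable add-ons.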
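The summary is built from a per-region dictionary of counters. The sketch below shows the idea without colors. `format_summary` is a hypothetical helper (the script prints the lines directly), and the counts are made up for illustration.

```python
def format_summary(summary):
    """Build plain-text summary lines for each region in the summary dictionary."""
    lines = []
    for region, data in summary.items():
        lines.append(f"Region: {region}")
        lines.append(f"Total number of EKS Clusters in the region: {data['total_clusters']}")
        lines.append(f"Clusters in ACTIVE state in the region: {data['active_clusters']}")
        # The non-ACTIVE line is only shown when there is something to report
        if data['non_active_clusters'] > 0:
            lines.append(f"Clusters NOT in ACTIVE state in the region: {data['non_active_clusters']}")
        lines.append(f"Clusters with Upgradable Add-ons: {data['clusters_with_upgradable_addons']}")
    return lines


# Made-up counts, for illustration only
summary = {
    "eu-west-1": {
        "total_clusters": 3,
        "active_clusters": 2,
        "non_active_clusters": 1,
        "clusters_with_upgradable_addons": 2,
    },
}

print("\n".join(format_summary(summary)))
```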

The script's Python code is shown in the Full Script section below. To run it, install the boto3 and colorama libraries with pip. I recommend setting up a Python virtual environment to run and test the script.


Conclusion

Managing EKS clusters across multiple regions doesn't have to be daunting, and it shouldn't be neglected. By automating the verification of add-on versions and identifying upgrade paths, this script simplifies cluster maintenance, improves security, and helps keep your Kubernetes environments consistent.

By incorporating this script into your regular maintenance routines, you can:

  • Reduce Manual Effort: Automate the tedious process of checking each cluster individually.
  • Enhance Security: Stay ahead with the latest security patches and updates.
  • Maintain Consistency: Ensure all clusters are running compatible and up-to-date add-on versions.

Next Steps:

  • Customize the Script: You can modify the script to suit your specific needs, such as filtering certain clusters or add-ons or sending the results to the desired destination.
  • Integrate with CI/CD Pipelines: Incorporate the script into your deployment pipelines for continuous monitoring.
  • Stay Informed: Regularly check AWS announcements for new add-on releases and Kubernetes version support.
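As one possible starting point for the customization and CI/CD ideas above, the sketch below turns the script's summary dictionary into JSON and derives an exit code that a pipeline could act on. `report` is a hypothetical helper that is not part of the script, and the counts are made up.

```python
import json


def report(summary):
    """Return (json_text, exit_code).

    exit_code is 1 when at least one cluster has upgradable add-ons, else 0.
    """
    upgradable = sum(
        data["clusters_with_upgradable_addons"] for data in summary.values()
    )
    json_text = json.dumps(summary, indent=2, sort_keys=True)
    return json_text, (1 if upgradable else 0)


# Made-up summary in the same shape the script builds
summary = {
    "us-east-1": {
        "total_clusters": 2,
        "active_clusters": 2,
        "non_active_clusters": 0,
        "clusters_with_upgradable_addons": 1,
    },
}

json_text, exit_code = report(summary)
print(json_text)
print("exit code:", exit_code)  # exit code: 1
```

In a pipeline, the script could end with sys.exit(exit_code) so that a job fails, or raises a warning, whenever an upgrade is pending. Whether a pending upgrade should fail a build is a policy decision for your team.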

Full Script

Below is the complete Python script for your reference:

#!/usr/bin/env python3

import boto3
import sys
import re
import os
from botocore.exceptions import ClientError, NoCredentialsError
from colorama import init, Fore, Style

# Initialize colorama
init(autoreset=True)

# Define color codes
RED = Fore.RED
GREEN = Fore.GREEN
YELLOW = Fore.YELLOW
BLUE = Fore.BLUE
MAGENTA = Fore.MAGENTA
LIGHTYELLOW_EX = Fore.LIGHTYELLOW_EX
NC = Style.RESET_ALL


def parse_eks_addon_version(s):
    """
    Parses EKS version strings like 'v1.29.7-eksbuild.9' and returns a tuple:
    (major, minor, patch, eksbuild)
    """
    s = s.lstrip('v')
    match = re.match(r'^(\d+)\.(\d+)\.(\d+)(?:-eksbuild\.(\d+))?$', s)
    if match:
        major, minor, patch, eksbuild = match.groups()
        major = int(major)
        minor = int(minor)
        patch = int(patch)
        eksbuild = int(eksbuild) if eksbuild else 0
        return (major, minor, patch, eksbuild)
    else:
        return (0, 0, 0, 0)  # Return a default tuple if parsing fails


def get_available_regions(session):
    """
    Retrieves a list of available AWS regions.
    """
    try:
        ec2_client = session.client('ec2')
        response = ec2_client.describe_regions()
        return [r['RegionName'] for r in response['Regions']]
    except (ClientError, NoCredentialsError) as e:
        print(f"{RED}Error: Unable to retrieve AWS regions.")
        print(f"{YELLOW}Details: {str(e)}")
        sys.exit(1)


def validate_regions(regions_list, available_regions):
    """
    Validates the list of regions provided by the user.
    """
    valid_regions = []
    invalid_regions = []
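    # Two-letter prefix, one or more lowercase words, then a number
    # (matches us-east-1 as well as multi-word IDs such as us-gov-west-1)
    region_pattern = r'^[a-z]{2}(-[a-z]+)+-\d+$'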
    for region in regions_list:
        if not re.match(region_pattern, region):
            invalid_regions.append((region, "Invalid format"))
            continue
        if region not in available_regions:
            invalid_regions.append((region, "Region does not exist"))
            continue
        valid_regions.append(region)
    return valid_regions, invalid_regions


def scan_clusters_in_region(session, region, summary):
    """
    Scans EKS clusters in the specified region for upgradable add-ons.
    """
    print("")
    print(f"{GREEN}Scanning EKS clusters in region '{region}'...")
    eks_client = session.client('eks', region_name=region)

    # List clusters with pagination handling
    clusters = []
    next_token = ''
    try:
        while True:
            if next_token:
                clusters_response = eks_client.list_clusters(nextToken=next_token)
            else:
                clusters_response = eks_client.list_clusters()
            clusters.extend(clusters_response.get('clusters', []))
            next_token = clusters_response.get('nextToken')
            if not next_token:
                break
    except (ClientError, NoCredentialsError) as e:
        print(f"{RED}Error: Unable to list EKS clusters in region '{region}'.")
        print(f"{YELLOW}Details: {str(e)}")
        return

    if not clusters:
        print(f"{YELLOW}No EKS clusters found in region '{region}'.")
        return

    # Initialize counters for the region
    total_clusters = len(clusters)
    active_clusters = 0
    non_active_clusters = 0
    clusters_with_upgradable_addons = 0

    for cluster_name in clusters:
        try:
            cluster_info = eks_client.describe_cluster(name=cluster_name)
            cluster_status = cluster_info['cluster']['status']
        except ClientError as e:
            print(f"{RED}Error: Unable to describe cluster '{cluster_name}' in region '{region}'.")
            print(f"{YELLOW}Details: {str(e)}")
            continue

        print(f"{GREEN}{'-'*150}")
        print(f"{GREEN}Cluster Name: {BLUE}{cluster_name}")
        print(f"{GREEN}Cluster Status: {YELLOW}{cluster_status}")

        if cluster_status != 'ACTIVE':
            print(f"{YELLOW}Cluster '{cluster_name}' is not in ACTIVE status. Skipping add-on checks.")
            non_active_clusters += 1
            continue
        else:
            active_clusters += 1

        k8s_version = cluster_info['cluster']['version']
        print(f"{GREEN}Cluster Kubernetes Version: {RED}{k8s_version}")
        print(f"{GREEN}Checking for NEWER versions of installed Add-ons on Cluster {BLUE}{cluster_name}{GREEN} in the {YELLOW}{region}{GREEN} region running Kubernetes version {RED}{k8s_version}")
        print(f"{GREEN}{'-'*150}")

        # List add-ons with pagination handling
        addons = []
        next_token = ''
        try:
            while True:
                if next_token:
                    addons_response = eks_client.list_addons(clusterName=cluster_name, nextToken=next_token)
                else:
                    addons_response = eks_client.list_addons(clusterName=cluster_name)
                addons.extend(addons_response.get('addons', []))
                next_token = addons_response.get('nextToken')
                if not next_token:
                    break
        except ClientError as e:
            print(f"{RED}Error: Unable to list add-ons for cluster '{cluster_name}' in region '{region}'.")
            print(f"{YELLOW}Details: {str(e)}")
            continue

        if not addons:
            print(f"{YELLOW}No add-ons installed on cluster '{cluster_name}'.")
            continue

        # Flag to check if any add-on is upgradable in this cluster
        cluster_has_upgradable_addons = False

        for addon in addons:
            print(f"{YELLOW}Add-on: {addon}")

            # Fetch available versions with pagination
            available_versions = []
            next_token_versions = ''
            try:
                while True:
                    if next_token_versions:
                        addon_versions_response = eks_client.describe_addon_versions(
                            addonName=addon,
                            kubernetesVersion=k8s_version,
                            nextToken=next_token_versions
                        )
                    else:
                        addon_versions_response = eks_client.describe_addon_versions(
                            addonName=addon,
                            kubernetesVersion=k8s_version
                        )
                    addons_info = addon_versions_response.get('addons', [])
                    if not addons_info:
                        print(f"{RED}Error: No available versions found for add-on '{addon}'.")
                        break
                    addon_versions = addons_info[0].get('addonVersions', [])
                    available_versions.extend([v['addonVersion'] for v in addon_versions])
                    next_token_versions = addon_versions_response.get('nextToken')
                    if not next_token_versions:
                        break
                if not available_versions:
                    latest_version = "No compatible versions found"
                else:
                    # Sort versions using the parsed tuples
                    available_versions.sort(key=parse_eks_addon_version, reverse=True)
                    latest_version = available_versions[0]
            except ClientError as e:
                print(f"{RED}Error: Unable to describe available versions for add-on '{addon}'.")
                print(f"{YELLOW}Details: {str(e)}")
                continue

            # Get the installed version of the addon
            try:
                installed_addon_info = eks_client.describe_addon(
                    clusterName=cluster_name,
                    addonName=addon
                )
                installed_version = installed_addon_info['addon']['addonVersion']
            except ClientError as e:
                print(f"{RED}Error: Unable to describe installed version for add-on '{addon}'.")
                print(f"{YELLOW}Details: {str(e)}")
                continue

            print(f"{NC}Latest Compatible Version: {GREEN}{latest_version}")
            print(f"{NC}Installed Version: {LIGHTYELLOW_EX}{installed_version}")

            # Parse versions before comparison
            installed_version_tuple = parse_eks_addon_version(installed_version)
            latest_version_tuple = parse_eks_addon_version(latest_version)

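            # Check if the latest version is newer than the installed version
            if not available_versions:
                # Nothing to compare against, so avoid claiming the add-on is up to date
                print(f"{YELLOW}NO COMPATIBLE VERSIONS FOUND; UNABLE TO CHECK FOR UPGRADES")
            elif latest_version_tuple > installed_version_tuple: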
                print(f"{BLUE}USE THIS COMMAND TO UPGRADE TO THE LATEST VERSION OF THE ADDON:")
                print(f"{MAGENTA}ALWAYS CHECK THE ADD-ON DOCS BEFORE: https://eksctl.io/usage/addons/#updating-addons")
                print(f"eksctl update addon --name {addon} --cluster {cluster_name} --version {latest_version} --region {region}")
                cluster_has_upgradable_addons = True
            else:
                print(f"{GREEN}ALREADY RUNNING THE LATEST VERSION")

            print(f"{GREEN}{'-'*50}")

        if cluster_has_upgradable_addons:
            clusters_with_upgradable_addons += 1

    # Add to summary if there are clusters in the region
    if total_clusters > 0:
        summary[region] = {
            'total_clusters': total_clusters,
            'active_clusters': active_clusters,
            'non_active_clusters': non_active_clusters,
            'clusters_with_upgradable_addons': clusters_with_upgradable_addons
        }


def main():
    # Read the COMMA_SEPARATED_LIST_OF_REGIONS environment variable
    regions_env = os.environ.get('COMMA_SEPARATED_LIST_OF_REGIONS', '')

    # Create a single session
    session = boto3.Session()

    # Get the list of available regions from AWS
    available_regions = get_available_regions(session)

    # If regions_env is empty, proceed to use all available regions
    if not regions_env.strip():
        print("")
        print(f"{RED}NOTE: COMMA_SEPARATED_LIST_OF_REGIONS environment variable is not set or is empty.")
        print(f"{MAGENTA}It is RECOMMENDED to set {YELLOW}COMMA_SEPARATED_LIST_OF_REGIONS{MAGENTA} to limit the regions being scanned.")
        print(f"{MAGENTA}ALL REGIONS WILL BE ANALYZED, which might take longer to complete.")
        print(f"{YELLOW}Example of COMMA_SEPARATED_LIST_OF_REGIONS: {NC}'us-east-1,eu-west-1,ap-southeast-2'")
        regions_list = available_regions
    else:
        # Split the regions and strip any whitespace
        regions_list = [region.strip() for region in regions_env.split(',') if region.strip()]

    if not regions_list:
        print(f"{RED}Error: No valid regions found in COMMA_SEPARATED_LIST_OF_REGIONS environment variable.")
        print(f"{YELLOW}Please ensure the variable contains at least one valid AWS region ID.")
        print(f"{YELLOW}Example: 'us-east-1,eu-west-1,ap-southeast-2'")
        sys.exit(1)

    # Validate regions
    valid_regions, invalid_regions = validate_regions(regions_list, available_regions)

    if invalid_regions:
        for region, reason in invalid_regions:
            print(f"{RED}Error: Provided region '{region}' is invalid. Reason: {reason}.")
        print(f"{YELLOW}Please provide valid AWS region IDs in COMMA_SEPARATED_LIST_OF_REGIONS environment variable.")
        print(f"{YELLOW}Example: 'us-east-1,eu-west-1,ap-southeast-2'")
        if not valid_regions:
            sys.exit(1)

    # Dictionaries to hold summary data
    summary = {}

    # Proceed with scanning clusters in valid regions
    for region in valid_regions:
        scan_clusters_in_region(session, region, summary)

    # Final summary
    if summary:
        print(f"{GREEN}{'='*80}")
        print(f"{YELLOW}REGIONS AND CLUSTERS SUMMARY:")
        for region, data in summary.items():
            total = data['total_clusters']
            active = data['active_clusters']
            non_active = data['non_active_clusters']
            upgradable = data['clusters_with_upgradable_addons']
            if total > 0:
                print(f"{GREEN}Region: {BLUE}{region}")
                print(f"{GREEN}Total number of EKS Clusters in the region: {YELLOW}{total}")
                print(f"{GREEN}Clusters in ACTIVE state in the region: {YELLOW}{active}")
                if non_active > 0:
                    print(f"{GREEN}Clusters NOT in ACTIVE state in the region: {YELLOW}{non_active}")
                    print(f"{YELLOW}{non_active} clusters are in non-ACTIVE state and might have upgradable add-ons.")
                print(f"{GREEN}Clusters with Upgradable Add-ons: {YELLOW}{upgradable}")
                print(f"{GREEN}{'-'*40}")
    else:
        print(f"{YELLOW}No EKS clusters found in the scanned regions.")


if __name__ == "__main__":
    main()

By leveraging this script, you can maintain optimal performance and security for your EKS clusters across all regions. Automating such routine checks ensures you can focus on building and scaling your applications rather than worrying about cluster maintenance.
