Managing Kubernetes Storage: Automating PV and PVC Cleanup

Kubernetes, the de facto orchestration system for containerized applications, manages durable storage through Persistent Volumes (PVs) and Persistent Volume Claims (PVCs). These resources ensure that storage persists beyond the lifecycle of individual pods, but cleaning up "Released" PVs and their PVCs becomes a chore as your cluster grows. Today, I'll walk you through an automated way to list and remove them.

Understanding PVs and PVCs

Before diving into the solution, let's briefly recap what PVs and PVCs are:

  • Persistent Volumes (PVs): Cluster-wide pieces of storage, either provisioned manually by an administrator or provisioned dynamically through Storage Classes.
  • Persistent Volume Claims (PVCs): Requests for storage by a user. They allow a user to request storage resources without specific knowledge of the underlying storage infrastructure.

When a PVC that is bound to a PV is deleted and the PV's reclaim policy is Retain, the PV moves to the "Released" state: it still holds the data but is no longer bound to a claim. Over time, these unclaimed volumes accumulate, and cleaning them up becomes necessary to free storage space or to comply with data retention policies.

The Challenge of Cleanup

Manually tracking and cleaning up these "Released" PVs and their associated PVCs is time-consuming and error-prone. Automating this process saves time, reduces the likelihood of human error, and can be integrated into your regular maintenance routines or CI/CD pipelines.

Automated Cleanup Solution

To address this, I've developed a Python script that automates the listing and deletion of "Released" PVs and their associated PVCs within a specified namespace. This script utilizes the Kubernetes Python client to interact with your cluster's API, providing a straightforward command-line interface for managing these resources.

Key Features

  • List "Released" PVs: Quickly view all "Released" PVs and their associated PVCs within a specified namespace.
  • Safe Deletion: Option to delete "Released" PVs and their associated PVCs, with added error handling to manage resources already deleted or not found, preventing script failure and providing clear feedback.
  • Namespace Specific: Operates within a specified namespace, allowing for targeted cleanup without affecting other areas of your cluster.


Prerequisites

  • Python 3.6 or newer.
  • Kubernetes Python client installed (pip install kubernetes).
  • kubectl configured for your cluster.


Usage

  1. Listing "Released" PVs by Namespace:
   python3 <script>.py <namespace>

This command lists all "Released" PVs and their associated PVCs within the specified namespace.

  2. Deleting "Released" PVs and PVCs by Namespace:
   python3 <script>.py <namespace> --delete

This command deletes all "Released" PVs and their associated PVCs within the specified namespace. Use with caution, as this action is irreversible.

Safety and Considerations

  • Data Backup: Always ensure data is backed up before deletion, especially if any PVs might contain critical information.
  • Testing: Test the script in a non-production environment to ensure it meets your needs and understand its effects fully.
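
One way to act on both points is to preview what a run would touch before passing --delete. The sketch below is a possible dry-run helper, not part of the script in this post: plan_cleanup and fake_pv are hypothetical names, and the helper only reads the status.phase, spec.claim_ref, and spec.persistent_volume_reclaim_policy fields that the Kubernetes Python client exposes on PV objects.

```python
from types import SimpleNamespace


def plan_cleanup(pvs, namespace):
    """Return one summary dict per Released PV whose claim lived in `namespace`.

    `pvs` is any iterable of objects shaped like V1PersistentVolume, for
    example client.CoreV1Api().list_persistent_volume().items.
    Nothing is deleted here; the function only reports.
    """
    plan = []
    for pv in pvs:
        claim = pv.spec.claim_ref
        if pv.status.phase != "Released" or claim is None:
            continue
        if claim.namespace != namespace:
            continue
        plan.append({
            "pv": pv.metadata.name,
            "claim": f"{claim.namespace}/{claim.name}",
            "reclaim_policy": pv.spec.persistent_volume_reclaim_policy,
        })
    return plan


def fake_pv(name, phase, claim_ns=None, claim_name=None, policy="Retain"):
    """Build a minimal stand-in PV so the helper can be tried without a cluster."""
    claim = SimpleNamespace(namespace=claim_ns, name=claim_name) if claim_ns else None
    return SimpleNamespace(
        metadata=SimpleNamespace(name=name),
        status=SimpleNamespace(phase=phase),
        spec=SimpleNamespace(claim_ref=claim, persistent_volume_reclaim_policy=policy),
    )


# Illustrative data only: one Released PV in "dev", three that must be skipped.
sample = [
    fake_pv("pv-a", "Released", "dev", "data-0"),
    fake_pv("pv-b", "Bound", "dev", "data-1"),
    fake_pv("pv-c", "Released", "prod", "data-0"),
    fake_pv("pv-d", "Available"),
]
for entry in plan_cleanup(sample, "dev"):
    print(entry)
```

Against a real cluster, the same function could be fed the items returned by list_persistent_volume(), which makes it easy to review the reclaim policy of each volume before anything is removed.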


Conclusion

Automating the cleanup of "Released" PVs and PVCs in Kubernetes simplifies cluster management and helps maintain a clean, efficient environment. By integrating such scripts into your operational routines, you can keep your Kubernetes storage well-organized and your resources effectively utilized.

Remember, with great power comes great responsibility. Automate wisely, and happy Kubernetes'ing!

Here's the script:

import argparse
from kubernetes import client, config
from kubernetes.client.rest import ApiException

class KubernetesPVCPVManager:
    def __init__(self):
        # Load the kubeconfig file from the default location.
        config.load_kube_config()
        self.v1 = client.CoreV1Api()

    def list_released_pvs_by_namespace(self, namespace_filter):
        try:
            # PVs are cluster-scoped, so list them all and filter on the claim's namespace.
            all_pvs = self.v1.list_persistent_volume()
            filtered_pvs = [
                pv for pv in all_pvs.items
                if pv.status.phase == "Released" and pv.spec.claim_ref and pv.spec.claim_ref.namespace == namespace_filter
            ]

            if filtered_pvs:
                print(f"Released PVs in {namespace_filter} namespace:")
                for pv in filtered_pvs:
                    claim_ref = pv.spec.claim_ref
                    print(f"- PV Name: {pv.metadata.name}, Capacity: {pv.spec.capacity['storage']}, StorageClass: {pv.spec.storage_class_name}, Claim: {claim_ref.namespace}/{claim_ref.name}")
            else:
                print(f"No released PVs found in {namespace_filter} namespace.")
        except ApiException as e:
            print(f"Exception when calling CoreV1Api->list_persistent_volume: {e}")

    def delete_released_pv_and_pvc(self, namespace_filter):
        try:
            all_pvs = self.v1.list_persistent_volume()
            for pv in all_pvs.items:
                if pv.status.phase == "Released" and pv.spec.claim_ref and pv.spec.claim_ref.namespace == namespace_filter:
                    pvc_name = pv.spec.claim_ref.name
                    # Delete the PVC first; a 404 means it is already gone.
                    try:
                        print(f"Attempting to delete PVC {pvc_name} in namespace {namespace_filter}")
                        self.v1.delete_namespaced_persistent_volume_claim(name=pvc_name, namespace=namespace_filter)
                        print(f"Successfully deleted PVC {pvc_name} in namespace {namespace_filter}")
                    except ApiException as e:
                        if e.status == 404:
                            print(f"PVC {pvc_name} in namespace {namespace_filter} not found. It might have already been deleted.")
                        else:
                            raise
                    # Then delete the PV itself.
                    try:
                        print(f"Attempting to delete PV {pv.metadata.name}")
                        self.v1.delete_persistent_volume(name=pv.metadata.name)
                        print(f"Successfully deleted PV {pv.metadata.name}")
                    except ApiException as e:
                        if e.status == 404:
                            print(f"PV {pv.metadata.name} not found. It might have already been deleted.")
                        else:
                            raise
        except ApiException as e:
            print(f"Exception when deleting PV or PVC: {e}")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Manage released PVs and PVCs by namespace.')
    parser.add_argument('namespace', type=str, help='The namespace of the PVCs to manage.')
    parser.add_argument('--delete', action='store_true', help='Delete the released PVs and their associated PVCs in the specified namespace.')

    args = parser.parse_args()
    namespace_to_filter = args.namespace
    delete_option = args.delete

    manager = KubernetesPVCPVManager()
    if delete_option:
        manager.delete_released_pv_and_pvc(namespace_to_filter)
    else:
        manager.list_released_pvs_by_namespace(namespace_to_filter)

Streamlining Kubernetes Log Retrieval with Python

Working with Kubernetes can often involve sifting through logs to troubleshoot or monitor applications. This can be cumbersome, especially when dealing with multiple pods across various namespaces. To simplify this process, I've developed a Python script that enhances the experience of fetching and managing Kubernetes pod logs.

The Challenge

Retrieving logs in Kubernetes typically requires using kubectl commands with specific flags for each pod and namespace. This manual process becomes inefficient and time-consuming, particularly when you need to access logs from multiple pods or wish to monitor logs in real time.

The Solution

To address these challenges, I've created a Python script that automates log retrieval, allowing for interactive selection of namespaces and pods, real-time log tailing, and the option to export logs with automatic naming and organization. Additionally, for users operating within Windows Subsystem for Linux (WSL2), the script includes functionality to open the file explorer directly to the log file's location, bridging the gap between the Linux command line and the Windows graphical interface.

Key Features

  • Interactive Namespace and Pod Selection: Choose which pod's logs to fetch without manually typing namespace and pod names.
  • Real-time Log Tailing: Option to tail logs for ongoing monitoring.
  • Automatic Log Export: Export logs to a timestamped file for easy organization and later review.
  • File Explorer Integration: For WSL2 users, the script can open Windows Explorer to the directory containing the exported log files, streamlining access to logs.
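
To make the export and File Explorer features more concrete, here is a small sketch of the two pieces of path handling they rely on: building a timestamped file name and turning a Linux path inside WSL2 into a \\wsl$ path that Windows Explorer understands. The function names are hypothetical, and the distribution name "Ubuntu" is an assumption that should match the name of your own WSL distribution.

```python
from datetime import datetime
from pathlib import PurePosixPath


def export_file_name(namespace, pod_name, now=None):
    """Build '<namespace>_<pod>_<timestamp>.log'; `now` can be injected for testing."""
    now = now or datetime.now()
    return f"{namespace}_{pod_name}_{now.strftime('%Y-%m-%d_%H-%M-%S')}.log"


def wsl_unc_path(linux_path, distro="Ubuntu"):
    """Translate an absolute WSL path into the UNC form used by Windows Explorer.

    The distro name is an assumption; it must match the installed WSL distribution.
    """
    parts = PurePosixPath(linux_path).parts[1:]  # drop the leading "/"
    return "\\\\wsl$\\" + distro + "\\" + "\\".join(parts)


print(export_file_name("dev", "web-0"))
print(wsl_unc_path("/home/user/K8s_Logs"))
```

Keeping these as small pure functions makes them easy to check without a cluster or a Windows host.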


Prerequisites

  • Python 3
  • Kubernetes Python client (kubernetes)
  • questionary for interactive prompts

Before running the script, ensure you have the necessary packages installed:

pip install kubernetes questionary

The Script

# Import necessary libraries
import os
import sys
import subprocess
from datetime import datetime
from kubernetes import client, config
from kubernetes.client.rest import ApiException
import questionary

# Function definitions for loading kube config, listing namespaces and pods,
# and the enhanced get_pod_logs function

def load_kube_config(custom_kubeconfig=None):
    # Load Kubernetes configuration, from a custom path if one was given
    if custom_kubeconfig:
        config.load_kube_config(config_file=custom_kubeconfig)
    else:
        config.load_kube_config()

def list_namespaces(api_instance):
    # Return a list of namespaces in the cluster
    try:
        namespaces = api_instance.list_namespace()
        return [namespace.metadata.name for namespace in namespaces.items]
    except ApiException as e:
        print(f"Failed to list namespaces: {e}")
        sys.exit(1)

def list_pods(api_instance, namespace):
    # Return a list of pods in a specified namespace
    try:
        pods = api_instance.list_namespaced_pod(namespace)
        return [pod.metadata.name for pod in pods.items]
    except ApiException as e:
        print(f"Failed to list pods in namespace '{namespace}': {e}")
        sys.exit(1)

def get_pod_logs(api_instance, namespace, pod_name, tail=False, export=False):
    # Fetch and print or export logs from a specific pod
    try:
        # _preload_content=False returns the raw response so it can be streamed
        logs = api_instance.read_namespaced_pod_log(name=pod_name, namespace=namespace, follow=tail, _preload_content=False)
        if export:
            # Define the directory to save logs
            log_directory = os.path.expanduser('~/K8s_Logs')
            os.makedirs(log_directory, exist_ok=True)
            # Generate a file name
            timestamp = datetime.now().strftime('%Y-%m-%d_%H-%M-%S')
            file_name = f"{namespace}_{pod_name}_{timestamp}.log"
            export_path = os.path.join(log_directory, file_name)
            with open(export_path, 'w') as file:
                for line in logs.stream():
                    file.write(line.decode('utf-8', errors='replace'))
            print(f"Logs have been exported to {export_path}")
            # Open file explorer to the log directory (Adjustment for WSL2 users included)
            if os.name == 'nt':  # Windows
                subprocess.run(['explorer', os.path.normpath(log_directory)])
            elif 'microsoft' in os.uname().release.lower():  # WSL2
                # Build the \\wsl$\<distro>\... path; the distro name is hard-coded to Ubuntu
                linux_path = log_directory.lstrip('/')
                wsl_path = '\\\\wsl$\\Ubuntu\\' + linux_path.replace('/', '\\')
                subprocess.run(['explorer.exe', wsl_path])
            else:
                try:
                    subprocess.run(['xdg-open', log_directory])
                except Exception:
                    print("Could not automatically open the file explorer.")
        elif tail:
            print(f"Tailing logs for pod {pod_name} in namespace {namespace}:")
            for line in logs.stream():
                print(line.decode('utf-8', errors='replace'), end='')
        else:
            print(f"Logs for pod {pod_name} in namespace {namespace}:\n{logs.read().decode('utf-8')}")

    except ApiException as e:
        print(f"Failed to retrieve logs: {e}")

# Main script execution for interactive selections and log management

if __name__ == "__main__":
    custom_kubeconfig = questionary.text("Enter custom kubeconfig path (leave blank for default):").ask()
    load_kube_config(custom_kubeconfig)
    v1 = client.CoreV1Api()
    namespaces = list_namespaces(v1)
    namespace = questionary.select("Select a namespace:", choices=namespaces).ask()
    pods = list_pods(v1, namespace)
    pod_name = questionary.select("Select a pod:", choices=pods).ask()
    tail_logs = questionary.confirm("Tail logs?").ask()
    export_logs = not tail_logs and questionary.confirm("Export logs to a file?").ask()
    get_pod_logs(v1, namespace, pod_name, tail=tail_logs, export=export_logs)


Conclusion

This Python script simplifies Kubernetes log management. By automating the process of fetching logs, it saves time and lets developers and system administrators focus on more critical tasks. Whether you're troubleshooting an application or simply need to keep an eye on your services, this tool provides a streamlined and user-friendly approach to handling Kubernetes logs.

Automating Helm Chart Updates and Downloads with Python

Managing Helm charts efficiently is crucial for Kubernetes administrators and DevOps engineers. Today, I'll guide you through automating the process of checking for the latest version of a Helm chart and downloading it using Python. Specifically, we'll focus on the aws-ebs-csi-driver Helm chart as an example, but the principles can be applied to any Helm chart.

Why Automate Helm Chart Management?

Automating Helm chart management, including checking for updates and downloading new versions, streamlines deployment processes, ensures consistency, and reduces manual errors. It's especially beneficial in environments where maintaining the latest versions of software is critical for security, performance, or feature availability.

Setting Up Your Environment

Before we begin, ensure you have Python and pip installed on your system. You'll also need the requests library, which can be installed via pip if you haven't already:

pip install requests

Step 1: Fetching the Latest Chart Version

First, let's write a function to check the latest version of the aws-ebs-csi-driver Helm chart from Artifact Hub. We'll use the Artifact Hub API to fetch the version information.

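import requests

def get_latest_chart_version_from_artifact_hub():
    # Set this to the Artifact Hub API endpoint for the chart.
    api_url = ""

    try:
        response = requests.get(api_url, timeout=30)
        response.raise_for_status()  # treat HTTP error codes as failures
        data = response.json()
        return data['version']
    except (requests.RequestException, KeyError, ValueError) as e:
        print(f"Failed to fetch the latest version: {e}")
        return None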

This function makes a GET request to the Artifact Hub API and parses the version from the JSON response.
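
Once a version string is available, deciding whether an update is actually needed comes down to comparing it with the version already in use. The following is a minimal sketch of that comparison, assuming plain MAJOR.MINOR.PATCH version strings; parse_version and is_newer are hypothetical helper names, and where the current version comes from is left to the caller.

```python
def parse_version(version):
    """Turn '2.30.0' or 'v2.30.0' into (2, 30, 0).

    Assumes purely numeric dotted versions; pre-release suffixes are not handled.
    """
    return tuple(int(part) for part in version.lstrip("v").split("."))


def is_newer(latest, current):
    """Return True when `latest` is a higher version than `current`."""
    return parse_version(latest) > parse_version(current)


print(is_newer("2.30.0", "2.9.1"))
print(is_newer("2.9.1", "2.9.1"))
```

Comparing tuples of integers avoids the trap of comparing the raw strings, where "2.9.1" would sort after "2.30.0".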

Step 2: Downloading the Chart

Next, we'll create a function to download the Helm chart using the version number obtained from step 1. The download URL pattern will depend on where the chart is hosted. For our example, we're assuming the chart is hosted on GitHub.

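def download_helm_chart(version):
    # Prefix this with the base URL of the release location hosting the chart.
    download_url = f"{version}/aws-ebs-csi-driver-{version}.tgz"

    try:
        with requests.get(download_url, stream=True, timeout=30) as r:
            r.raise_for_status()  # do not save an error page as the chart
            with open(f"aws-ebs-csi-driver-{version}.tgz", 'wb') as f:
                # Stream the archive to disk in chunks
                for chunk in r.iter_content(chunk_size=8192):
                    f.write(chunk)
        print(f"Successfully downloaded aws-ebs-csi-driver-{version}.tgz")
    except requests.RequestException as e:
        print(f"Error downloading the chart: {e}")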
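
Since the download URL depends on where the chart is hosted, a quick sanity check on the downloaded file can catch a wrong URL early. The sketch below is one possible check, not part of the original workflow: it assumes the usual packaged-chart layout in which the archive contains <chart-name>/Chart.yaml, and chart_metadata_present is a hypothetical helper name.

```python
import tarfile


def chart_metadata_present(archive_path, chart_name):
    """Return True if the .tgz can be opened and contains <chart_name>/Chart.yaml."""
    try:
        with tarfile.open(archive_path, "r:gz") as archive:
            return f"{chart_name}/Chart.yaml" in archive.getnames()
    except (tarfile.TarError, OSError, EOFError):
        # Not a gzip-compressed tar archive, truncated, or unreadable
        return False
```

A call such as chart_metadata_present("aws-ebs-csi-driver-<version>.tgz", "aws-ebs-csi-driver") after the download would flag an HTML error page or a truncated file that was saved under the chart's name.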

Step 3: Integrating the Functions

Finally, let's integrate these functions into a main workflow that checks for the latest version and prompts the user to confirm the download.

def main():
    latest_version = get_latest_chart_version_from_artifact_hub()
    if latest_version:
        print(f"Latest version of the aws-ebs-csi-driver Helm chart: {latest_version}")
        confirm = input("Do you want to download this version? (yes/no) ").strip().lower()
        if confirm == 'yes':
            download_helm_chart(latest_version)
        else:
            print("Download canceled.")
    else:
        print("Could not fetch the latest version.")

if __name__ == "__main__":
    main()

Conclusion

Automating Helm chart management saves time and helps keep your Kubernetes deployments up to date. With Python and a few lines of code, we've demonstrated how to streamline checking for updates and downloading Helm charts. This approach can be adapted to manage other charts and integrated into larger automation workflows to enhance deployment efficiency and reliability.

Remember, automation is key to modern DevOps practices, and managing Kubernetes resources efficiently plays a critical role in the successful operation of your infrastructure.

Streamlining Troubleshooting: How to Re-Run EC2 UserData for Effective Problem Solving

In the intricate web of cloud computing, efficiently managing and troubleshooting EC2 instances is a cornerstone of maintaining a robust AWS infrastructure. One of the lesser-known, yet powerful, capabilities is the ability to re-run UserData scripts on these instances. Originally intended to execute only during the initial launch, there are practical scenarios where re-executing UserData becomes not just beneficial but necessary. Inspired by Elliot DeNolf's insightful post on this very subject, let's delve deeper into the process and understand how to leverage this capability for troubleshooting and system updates.

The Why and How of Re-Running EC2 UserData

UserData scripts are the silent workers of the EC2 world, setting up your instance with the necessary configurations, software, and environments right from the get-go. But what happens when you need to adjust these configurations or troubleshoot issues that arise from the initial setup? Elliot DeNolf, in his guide on re-running EC2 UserData, provides a straightforward approach to this task, which we'll explore and expand upon.

Step 1: Secure Your Connection

First things first, establish a secure connection to your EC2 instance. This is done via SSH, a protocol that provides a secure channel over an unsecured network. Use the following command, substituting your certificate and instance details:

ssh -i "my-cert.pem" ec2-user@my.machine.ip

Understanding the ins and outs of SSH is fundamental, and AWS Documentation is a treasure trove of information for those looking to deepen their knowledge.

Step 2: Elevate Your Privileges

Once connected, it's time to switch to the root user. This step is crucial as it provides the permissions necessary to access and modify system files and settings:

sudo -i

Step 3: Fetch Your UserData

UserData scripts are accessible via a special URL and can be fetched directly within your EC2 instance. Use curl to redirect this data to a file, allowing for inspection and modification:

curl http://instance-data/latest/user-data >

Step 4: Review and Modify

Before re-execution, take a moment to review the UserData script. This can be done using simple text viewing commands like cat or vim. It's an opportunity to ensure that the script performs as expected or to make any necessary adjustments:

cat ./

Step 5: Execute

With the script ready and reviewed, modify its permissions to make it executable, then run it to apply your changes or updates:

chmod +x

Alternative Execution Methods

Elliot DeNolf highlights two alternative approaches for those seeking different levels of engagement with the script:

  • Direct Execution: Bypass script inspection and run it directly using:
  curl http://instance-data/latest/user-data | sh
  • Verbose Execution: Enhance transparency by having the shell print each command as it runs and stop at the first failure. Simply add set -ex to the top of your script (the trace produced by -x is written to STDERR).
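
The fetch, review, and execute steps can also be scripted. The sketch below is one possible Python version, not something from DeNolf's post: it assumes the instance metadata service is reached at its link-local address using IMDSv2 session tokens, that the UserData is a plain shell script, and it uses a hypothetical file name, user-data.sh, for the saved copy.

```python
import os
import stat
import subprocess
import urllib.request

# Link-local address of the EC2 instance metadata service
METADATA_BASE = "http://169.254.169.254/latest"


def fetch_user_data(timeout=2):
    """Fetch UserData with IMDSv2: request a session token, then read user-data."""
    token_request = urllib.request.Request(
        f"{METADATA_BASE}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_request, timeout=timeout) as response:
        token = response.read().decode()
    data_request = urllib.request.Request(
        f"{METADATA_BASE}/user-data",
        headers={"X-aws-ec2-metadata-token": token},
    )
    with urllib.request.urlopen(data_request, timeout=timeout) as response:
        return response.read()


def save_executable(content, path):
    """Write `content` (bytes) to `path` and add the owner-execute bit."""
    with open(path, "wb") as handle:
        handle.write(content)
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


def main(path="user-data.sh"):  # hypothetical file name
    """Fetch, show, and (after confirmation) run the script with command tracing."""
    save_executable(fetch_user_data(), path)
    with open(path) as handle:
        print(handle.read())
    if input("Run this script? (yes/no) ").strip().lower() == "yes":
        subprocess.run(["/bin/sh", "-ex", path], check=True)
```

Calling main() on the instance, as root, mirrors Steps 3 to 5; passing -ex to the shell gives the same verbose behavior as adding set -ex to the script itself.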

Leveraging UserData for Troubleshooting

The process outlined by DeNolf and explored further here underlines the versatility and power of EC2 UserData. By re-running UserData, administrators and DevOps engineers can swiftly address and rectify issues, update configurations, or simply ensure that their instances are in the desired state without the need for instance termination and recreation.

It's a testament to the cloud's flexibility and the importance of mastering such techniques for anyone tasked with managing cloud infrastructure. Elliot DeNolf's original post serves as a valuable resource for those looking to harness the full potential of EC2 UserData in their troubleshooting and configuration management toolkit.

This exploration into re-running EC2 UserData reaffirms the notion that with the right knowledge and tools, the cloud's complexity becomes its strength, offering unparalleled control and flexibility to its stewards.

Automating AWS Backup Checks with Python: A Simple Guide

In the vast expanse of AWS infrastructure, keeping track of which instances are marked for backup can be a daunting task. Whether you're managing a handful of instances or overseeing a sprawling cloud environment, the importance of a robust backup strategy cannot be overstated. Today, we're diving into how a simple Python script can streamline this process, ensuring your critical AWS EC2 instances are always backed up.

The Need for Backup Automation

Backup strategies are the safety nets of the digital world, protecting against data loss due to system failures, human errors, or malicious attacks. In AWS, tagging resources allows for organized management, including which instances require backups. However, manually checking these tags to confirm backup compliance is inefficient and prone to error, especially as your environment grows. This is where automation comes in.

Python and Boto3: A Powerful Duo

Python, with its simplicity and the powerful AWS SDK package Boto3, makes automating AWS tasks accessible and efficient. Boto3 allows Python scripts to interact with AWS services directly, leveraging the full range of AWS APIs. For our task, we'll use Boto3 to query EC2 instances, focusing on those tagged with backup=true.

Step-by-Step Guide to Listing Backup-Tagged Instances

Here's a concise guide to creating a Python script that lists all your EC2 instances marked for backup.

Step 1: Setting Up

Ensure you have Python and Boto3 installed. If Boto3 isn't installed yet, you can add it to your environment using pip:

pip install boto3

Also, make sure your AWS CLI is configured for access, especially if you're using AWS Single Sign-On (SSO).

Step 2: The Script

Our script initializes an EC2 client, filters instances by the backup=true tag, and prints out details for each matching instance.

import boto3

def list_backup_enabled_instances():
    ec2 = boto3.client('ec2')
    # Only return instances carrying the tag backup=true
    filters = [{'Name': 'tag:backup', 'Values': ['true']}]

    instances = ec2.describe_instances(Filters=filters)['Reservations']
    for reservation in instances:
        for instance in reservation['Instances']:
            print_details(instance)

def print_details(instance):
    instance_id = instance['InstanceId']
    # The Name tag is optional, so fall back to None when it is missing
    name_tag = next((tag['Value'] for tag in instance.get('Tags', []) if tag['Key'] == 'Name'), None)
    state = instance['State']['Name']
    instance_type = instance['InstanceType']
    launch_time = instance['LaunchTime'].strftime('%Y-%m-%d %H:%M:%S')

    print(f"Instance ID: {instance_id}")
    print(f"Name: {name_tag}")
    print(f"State: {state}")
    print(f"Instance Type: {instance_type}")
    print(f"Launch Time: {launch_time}")
    print("-" * 60)

if __name__ == "__main__":
    list_backup_enabled_instances()
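
For a compliance check, the instances that lack the tag are often more interesting than the ones that have it. The following sketch is an assumption-laden extension of the script above rather than part of it: split_by_backup_tag and fetch_all_instances are hypothetical names, and the paginator is used so that large accounts are covered across multiple result pages.

```python
def has_backup_tag(instance, key="backup", value="true"):
    """Check a describe_instances-style dict for the tag key=value."""
    tags = {tag["Key"]: tag["Value"] for tag in instance.get("Tags", [])}
    return tags.get(key) == value


def split_by_backup_tag(instances):
    """Return (tagged_ids, untagged_ids) for an iterable of instance dicts."""
    tagged, untagged = [], []
    for instance in instances:
        target = tagged if has_backup_tag(instance) else untagged
        target.append(instance["InstanceId"])
    return tagged, untagged


def fetch_all_instances():
    """Collect every instance dict across pages (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without AWS access

    paginator = boto3.client("ec2").get_paginator("describe_instances")
    return [
        instance
        for page in paginator.paginate()
        for reservation in page["Reservations"]
        for instance in reservation["Instances"]
    ]


# Illustrative data only; real input would come from fetch_all_instances().
sample = [
    {"InstanceId": "i-aaa", "Tags": [{"Key": "backup", "Value": "true"}]},
    {"InstanceId": "i-bbb", "Tags": [{"Key": "Name", "Value": "web"}]},
    {"InstanceId": "i-ccc"},
]
tagged, untagged = split_by_backup_tag(sample)
print("Marked for backup:", tagged)
print("Not marked for backup:", untagged)
```

The untagged list is the part that usually needs follow-up.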

Step 3: Running the Script

Save the script to a .py file and run it with python3.

Conclusion

This simple yet effective script shows how automating routine checks can save time, reduce errors, and ensure compliance with your backup strategy. As your AWS infrastructure grows, incorporating such scripts into your operational toolkit becomes increasingly beneficial, allowing you to focus on innovation rather than manual oversight.

Remember, while the script provides a snapshot of your backup compliance, it's part of a broader disaster recovery strategy. Always ensure your backups are tested regularly for integrity and restorability. Happy coding, and here's to robust, automated backups!

Applying Kant’s Ethical Principles in the DevOps World

In the rapidly evolving landscape of technology and DevOps, we often focus on the technical skills and tools that drive our industry forward. Yet, the ethical framework guiding our decisions can significantly impact our teams, products, and the broader community. Immanuel Kant, a luminary in the realm of philosophy, provides timeless principles that, though centuries old, offer valuable insights for today's DevOps professionals.

1. Universal Actions: The Golden Rule of Development and Operations

Imagine a world where every line of code, every deployment, and every interaction with team members followed a principle you'd want everyone else to adopt. Kant encourages us to act in ways that, if universally applied, would lead to a sustainable and functional system. In DevOps, this means advocating for practices that enhance collaboration, reliability, and ethical use of technology.

2. Respecting Individuals: Beyond Tools and Processes

In the pursuit of efficiency and innovation, it's crucial to remember that our colleagues and end-users are not mere means to an end. Kant's vision of treating humanity with inherent dignity translates into fostering a culture of respect, empathy, and inclusiveness within our teams and in how we design our systems for users.

3. Setting Standards: Crafting the Blueprint for Ethical Tech

Our actions and decisions in the tech sphere often set precedents. Kant's idea of acting as if your choices were to become a universal law urges us to consider the long-term implications of our work. Are we contributing to a digital world we'd be proud to live in?

4. Building a Community of Equals: The Ultimate DevOps Goal

Kant's kingdom of ends mirrors the ideal DevOps culture where collaboration, shared responsibility, and mutual respect are paramount. Striving for this environment means creating workflows and policies that empower everyone to contribute their best, regardless of their role.

5. Personal and Professional Growth: The Never-Ending DevOps Journey

The principle of developing one's talents is especially relevant in our fast-paced field. Continuous learning, sharing knowledge, and applying our skills for the greater good enrich not only our careers but also the communities we're a part of.

6. Guided by Reason: Making Thoughtful Decisions

In an era dominated by rapid decision-making and constant innovation, Kant's call to let reason guide us is a sobering reminder. Reflecting on the why behind our actions, considering the broader impact, and making informed choices can lead to more ethical and effective outcomes.

In conclusion, while Kant might not have been a DevOps engineer, his principles provide a moral compass for navigating the complex ethical landscapes we encounter in technology and development. By integrating these timeless values into our daily practices, we can strive for a tech world that is not only advanced but also just, inclusive, and human-centered.

Navigating the DevOps Landscape: A Comparative Analysis of Mend CLI and JFrog

The evolution of DevOps practices has given rise to a plethora of tools designed to streamline and enhance the software development lifecycle (SDLC). Among these tools, security and artifact management solutions like Mend CLI (formerly known as WhiteSource) and JFrog Artifactory have become indispensable for organizations aiming to bolster their software supply chain security and efficiency. This article provides a comprehensive comparison of Mend CLI and JFrog, highlighting their functionalities, benefits, and how they cater to different aspects of DevOps workflows.

Mend CLI: Security-Centric Approach

Mend CLI, part of the Mend suite, emphasizes vulnerability detection and remediation within open-source components. Its primary goal is to secure applications by identifying known security vulnerabilities in dependencies used within your project. Here are some of its key features:

  • Vulnerability Scanning: Mend CLI scans project dependencies against a comprehensive database of known vulnerabilities, providing timely alerts.
  • Automated Remediation: It suggests and can automate the update or replacement of vulnerable components with secure versions.
  • Policy Enforcement: Allows the configuration of policies to automatically enforce security standards across all development stages.
  • Integration: Easily integrates with CI/CD pipelines, enhancing the DevOps workflow without compromising speed.

JFrog Artifactory: Mastering Artifact Management

JFrog Artifactory, on the other hand, serves as a universal artifact repository manager. It is designed to store and manage binaries, containers, and software libraries across the entire SDLC. JFrog's key offerings include:

  • Universal Support: Compatible with a wide array of package formats and CI/CD tools, facilitating seamless integration into any DevOps ecosystem.
  • High Availability: Offers robust features such as replication, clustering, and cloud storage to ensure high availability and scalability.
  • Security and Access Control: Features include encrypted password storage, secure access with fine-grained permissions, and vulnerability scanning through integration with JFrog Xray.
  • Build Integration: Tracks artifact usage across different builds and environments, enhancing traceability and auditing.

Comparing Mend CLI and JFrog Artifactory

While both Mend CLI and JFrog Artifactory are pivotal in modern DevOps environments, their primary focus and functionality differ significantly:

  • Focus Area: Mend CLI is primarily focused on enhancing security through the detection and remediation of vulnerabilities in open-source dependencies. JFrog Artifactory, conversely, is centered around artifact management, providing a robust solution for storing, managing, and distributing software packages.
  • Integration and Compatibility: Mend CLI integrates directly into the development process, offering tools specifically designed for vulnerability scanning within the coding phase. JFrog, with its universal package management capabilities, integrates across the SDLC, supporting a broader range of programming languages and package formats.
  • User Experience: Users of Mend CLI benefit from its focus on automating the secure use of open-source software, making it a critical tool for developers and security teams. JFrog Artifactory is geared towards DevOps engineers and architects, focusing on optimizing artifact storage and flow throughout the development, testing, and deployment phases.


Conclusion

In the quest for more secure and efficient DevOps practices, both Mend CLI and JFrog Artifactory play crucial roles. Mend CLI addresses the critical need for security in the use of open-source components, while JFrog Artifactory excels in artifact management, ensuring that binaries, libraries, and containers are efficiently managed and integrated into the software development process. The choice between Mend CLI and JFrog Artifactory should be guided by the specific needs of an organization’s DevOps workflow, security requirements, and the complexity of their software supply chain. By leveraging the strengths of each tool, teams can achieve a balanced approach to secure, efficient, and effective software development and delivery.

Automating AWS EC2 Instance Management with Python

Managing AWS EC2 instances can often require repetitive tasks such as listing available instances, starting stopped ones, and securely logging into them. Automating these tasks not only saves time but also reduces the possibility of human error. In this blog post, we'll explore how to use Python and Boto3, AWS's SDK for Python, to create a simple yet powerful script that lists your EC2 instances, powers on any that are off, and then securely logs into them.

Getting Started

Before diving into the code, ensure you have Python and Boto3 installed in your environment. Boto3 allows you to directly interact with AWS services in Python. If you haven't installed Boto3 yet, you can do so by running pip install boto3.

The Script Explained

Our script performs several key functions to manage EC2 instances efficiently. Let's break down each part of the script:


Initialization

First, we initialize a Boto3 EC2 resource and a client. These objects allow us to interact with the EC2 instances and perform actions like starting or stopping them.

import boto3

ec2 = boto3.resource('ec2')
ec2_client = boto3.client('ec2')

Listing Instances

The list_all_instances function lists all EC2 instances based on specific filters. It can filter instances by name and whether they are running or stopped. This makes it easy to identify which instances are available for connection.
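
As a rough illustration of the filtering described here, the sketch below builds the filter list separately from the AWS call. It is an assumption about how such a function could look, not the original implementation: build_filters is a hypothetical helper, and the name filter is assumed to match substrings of the Name tag.

```python
def build_filters(name=None, include_stopped=False):
    """Build the Filters argument for boto3's EC2 instance queries."""
    states = ["running", "stopped"] if include_stopped else ["running"]
    filters = [{"Name": "instance-state-name", "Values": states}]
    if name:
        # Wildcards let the filter match any Name tag containing `name`
        filters.append({"Name": "tag:Name", "Values": [f"*{name}*"]})
    return filters


def list_all_instances(ec2, name=None, include_stopped=False):
    """Return matching instances; `ec2` is a boto3 EC2 resource such as boto3.resource('ec2')."""
    return list(ec2.instances.filter(Filters=build_filters(name, include_stopped)))


print(build_filters(name="web", include_stopped=True))
```

Passing the EC2 resource in as a parameter keeps the function easy to exercise with a stand-in object.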

Selecting an Instance

The select_instance function prompts the user to select an instance from the list of available ones. This interaction ensures that users have control over which instance they're managing.
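
The input handling behind such a prompt can be sketched as a small pure function. This is only an illustration under my own assumptions: parse_selection is a hypothetical name, and it returns an index rather than an instance so it can be tried without AWS.

```python
def parse_selection(raw, count):
    """Convert 1-based menu input to a 0-based index, or None if invalid.

    Hypothetical helper for illustration; not part of the original script.
    """
    text = raw.strip().lower()
    if text == 'exit':
        return None
    try:
        index = int(text) - 1
    except ValueError:
        return None
    # Reject 0 and negatives explicitly: list indexing would accept -1 silently.
    if 0 <= index < count:
        return index
    return None


for raw in ['2', 'exit', '0', 'abc', '9']:
    print(repr(raw), '->', parse_selection(raw, count=3))
```

The explicit range check matters because Python lists accept negative indexes, so an input of 0 would otherwise select the last instance in the list.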

Handling SSH Keys

To log into an instance, the correct SSH private key is required. The find_key_for_instance function reads the instance's key pair name (KeyName) and looks in ~/.ssh for a .pem file whose name starts with it. If no matching file is found, it returns None.
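
Here is a minimal sketch of that lookup, written so the directory is a parameter and the logic can be tried against a throwaway folder instead of the real ~/.ssh. The function name find_pem_key and the demo file names are assumptions made for this example.

```python
import os
import tempfile


def find_pem_key(key_name, keys_directory):
    """Return the path of the first .pem file whose name starts with key_name."""
    # An instance launched without a key pair has no key name at all.
    if not key_name or not os.path.isdir(keys_directory):
        return None
    for file_name in sorted(os.listdir(keys_directory)):
        if file_name.startswith(key_name) and file_name.endswith('.pem'):
            return os.path.join(keys_directory, file_name)
    return None


# Demonstration against a temporary directory.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ('prod-key.pem', 'notes.txt'):
        open(os.path.join(demo_dir, name), 'w').close()
    print(find_pem_key('prod-key', demo_dir))
    print(find_pem_key('missing', demo_dir))
```

Sorting the directory listing makes the result deterministic when several files share the same prefix.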

Starting Stopped Instances

One of the script's key features is its ability to start instances that are found to be stopped. This is handled in the ssh_into_instance function, which checks the instance's state before attempting to log in. If the instance is stopped, the script starts it and waits until it's in the 'running' state.
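
The start-and-wait step can be sketched on its own as follows. The helper name ensure_running and the two stand-in classes are assumptions for this example; the stand-ins only mimic the few attributes and methods of a Boto3 Instance resource and EC2 client that the helper touches, so no AWS call is made.

```python
def ensure_running(instance, ec2_client):
    """Start the instance if it is stopped and block until it is running.

    `instance` is expected to behave like a boto3 ec2.Instance resource and
    `ec2_client` like a boto3 EC2 client.
    """
    if instance.state['Name'] == 'stopped':
        ec2_client.start_instances(InstanceIds=[instance.id])
        instance.wait_until_running()  # polls until the state is 'running'
        instance.reload()              # refresh cached state and IP address
        return True
    return False


# Stand-in objects (assumptions for the demo, not real AWS calls).
class FakeInstance:
    id = 'i-0123456789abcdef0'

    def __init__(self):
        self.state = {'Name': 'stopped'}

    def wait_until_running(self):
        pass

    def reload(self):
        self.state = {'Name': 'running'}


class FakeClient:
    def __init__(self):
        self.started = []

    def start_instances(self, InstanceIds):
        self.started.extend(InstanceIds)


instance, client = FakeInstance(), FakeClient()
print(ensure_running(instance, client), instance.state['Name'])
```

Keep in mind that the 'running' state only describes the instance itself. The SSH daemon may need a little longer before it accepts connections, so a first login attempt right after startup can still fail.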

Securely Logging Into Instances

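Finally, once the instance is ready, the script logs into it using SSH. It builds the SSH command from the path to the SSH key, the remote user (ec2-user by default), and the instance's private IP address, and then executes it. Because the private address is used, the machine running the script needs network access to the instance's VPC, for example over a VPN or from a bastion host. The command also sets StrictHostKeyChecking=no, which skips host key verification for convenience; remove that option if you want SSH to verify the host's identity.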
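
One possible variation, shown here only as a sketch, is to build the command as an argument list instead of a single string, so that key paths containing spaces need no quoting. The function name build_ssh_command, its strict_host_checking parameter, and the example path and address are all assumptions for illustration.

```python
import shlex


def build_ssh_command(key_path, remote_user, host, strict_host_checking=True):
    """Return the ssh invocation as an argument list (no shell involved)."""
    command = ['ssh', '-i', key_path]
    if not strict_host_checking:
        # Skips host key verification; convenient but weaker.
        command += ['-o', 'StrictHostKeyChecking=no']
    command.append(f'{remote_user}@{host}')
    return command


cmd = build_ssh_command('/home/me/.ssh/my key.pem', 'ec2-user', '10.0.0.12')
print(shlex.join(cmd))  # shell-quoted form, for display only
```

A list like this can be handed to subprocess.run(cmd) directly, without shell=True.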

Running the Script

To run the script, simply execute it in your terminal. You'll be prompted to decide whether to include stopped instances in the list and to optionally filter instances by name. Then, select the instance you wish to log into from the provided list. If the selected instance is stopped, the script will start it and then automatically log you in.


Conclusion

This script is a starting point for automating EC2 instance management. By leveraging Python and Boto3, you can customize and extend this script to fit your specific needs, such as automating backups, deploying applications, or managing instance lifecycles. Automating these tasks not only saves time but also makes your operations more consistent and less error-prone.

Remember, automation is key to effective cloud management. By incorporating scripts like this into your workflow, you can focus on more important tasks while letting your code handle the routine work.

import boto3
import subprocess
import os
import time

# Initialize a boto3 EC2 resource
ec2 = boto3.resource('ec2')
ec2_client = boto3.client('ec2')  # Added for starting instances

def list_all_instances(include_stopped=False, search_term=None):
    """List EC2 instances, optionally including stopped ones and filtering by Name tag."""
    # Always filter by state so terminated or shutting-down instances are never listed.
    states = ['running', 'stopped'] if include_stopped else ['running']
    filters = [{'Name': 'instance-state-name', 'Values': states}]

    if search_term:
        # Wildcards turn this into a substring match on the Name tag.
        filters.append({'Name': 'tag:Name', 'Values': ['*' + search_term + '*']})

    instances = ec2.instances.filter(Filters=filters)
    return instances

def get_instance_name(instance):
    """Extract the name of the instance from its tags."""
    for tag in instance.tags or []:
        if tag['Key'] == 'Name':
            return tag['Value']
    return "No Name"

def select_instance(instances):
    """Allow the user to select an instance to log into."""
    print("Available instances:")
    if not instances:
        print("No matching instances found.")
        return None

    for i, instance in enumerate(instances, start=1):
        name = get_instance_name(instance)
        print(f"{i}) Name: {name}, Instance ID: {instance.id}, State: {instance.state['Name']}")

    selection = input("Enter the number of the instance you want to log into (or 'exit' to quit): ")
    if selection.lower() == 'exit':
        return None
    try:
        index = int(selection) - 1
        if index < 0:
            # Guard against 0 or negative input, which would index from the end.
            raise IndexError
        return list(instances)[index]
    except (ValueError, IndexError):
        print("Invalid selection.")
        return None

def find_key_for_instance(instance):
    """Find the SSH key for the instance based on its KeyName."""
    key_name = instance.key_name
    keys_directory = os.path.expanduser("~/.ssh")
    # Instances launched without a key pair have no KeyName, and ~/.ssh may not exist.
    if not key_name or not os.path.isdir(keys_directory):
        return None
    for key_file in os.listdir(keys_directory):
        if key_file.startswith(key_name) and key_file.endswith(".pem"):
            return os.path.join(keys_directory, key_file)
    return None

def ssh_into_instance(instance, remote_user="ec2-user"):
    """SSH into the selected instance, if any."""
    if instance is None:
        return

    # Check instance state and start if stopped
    if instance.state['Name'] == 'stopped':
        print(f"Instance {instance.id} is stopped. Starting instance...")
        ec2_client.start_instances(InstanceIds=[instance.id])
        print("Waiting for instance to be in 'running' state...")
        instance.wait_until_running()
        instance.reload()  # Refresh instance state and details

    ssh_key_path = find_key_for_instance(instance)
    if not ssh_key_path:
        print(f"No matching SSH key found for instance {instance.id} with KeyName {instance.key_name}")
        return

    print(f"Logging into {get_instance_name(instance)} ({instance.id})...")
    private_ip = instance.private_ip_address
    ssh_cmd = f'ssh -o StrictHostKeyChecking=no -i {ssh_key_path} {remote_user}@{private_ip}'
    subprocess.run(ssh_cmd, shell=True)

def main():
    """Main function to list instances and allow user selection for SSH login."""
    include_stopped = input("Include stopped instances? (yes/no): ").lower().startswith('y')
    search_term = input("Enter a search term to filter by instance name (leave empty for no filter): ").strip() or None
    instances = list(list_all_instances(include_stopped, search_term))

    selected_instance = select_instance(instances)
    ssh_into_instance(selected_instance)

if __name__ == "__main__":
    main()

Navigating the Waters of API Rate Limiting with Jenkins: A DevOps Tale

In the dynamic world of DevOps, managing API rate limits is akin to steering a ship through treacherous waters. The GitHub API, with its stringent usage quotas, poses a significant challenge for continuous integration and delivery pipelines, particularly those orchestrated by Jenkins. This article unfolds a real-world scenario faced by a DevOps Engineer named Alan, highlighting the delicate balance between automation efficiency and API usage constraints.

The Challenge at Hand

On a seemingly regular day, Alan initiates a connection to GitHub via Jenkins, utilizing a service account to fetch necessary data for ongoing development work. The clock ticks, and Jenkins reports a concerning status: the current quota for GitHub API usage is alarmingly close to the limit. With the next quota refresh minutes away, Jenkins opts for a strategic pause.

This pause is part of Jenkins' rate-limiting strategy, designed to prevent hitting the GitHub API rate limit—a scenario that could halt development workflows, delay deployments, and disrupt service operations. The imposed limiter is a testament to Jenkins' attempt at evenly distributing API requests over time, ensuring that the pipeline remains operational without exceeding GitHub's stringent rate limits.

The Jenkins Strategy

Jenkins' approach to handling the GitHub API rate limit is multifaceted. Initially, it spreads out API requests to stay within the limit, but when projections indicate an overshoot, it enforces a sleep period. This strategy allows ongoing tasks to queue up without bombarding the GitHub API, a method that reflects foresight and adaptation in automation practices.

However, this strategy is not without its drawbacks. With the quota now over budget, Jenkins decides on a longer sleep. This decision, while necessary to avoid exceeding the API limit, introduces delays into the development process, highlighting a critical balancing act between adhering to external constraints and maintaining internal efficiency.
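
To make the spread-then-sleep idea in this section concrete, here is a small sketch of how such a delay could be computed. It illustrates the general technique only and is not Jenkins' actual algorithm; the function name throttle_delay and the 5% buffer are assumptions. The inputs correspond to the X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset values GitHub returns with API responses, with the reset time already converted to seconds from now.

```python
def throttle_delay(limit, remaining, seconds_to_reset,
                   window=3600.0, buffer_fraction=0.05):
    """Seconds to sleep so that usage stays on an evenly spread budget.

    The "ideal" remaining quota falls linearly from `limit` to 0 across the
    rate-limit window. Hypothetical sketch, not the Jenkins implementation.
    """
    # Quota that should still be left right now if requests were spread evenly.
    ideal = limit * seconds_to_reset / window
    buffer = limit * buffer_fraction
    remaining = max(remaining, 0)
    if remaining >= ideal - buffer:
        return 0.0  # on or under budget: no pause needed
    # Over budget: sleep until the ideal line has dropped to what is left.
    return max(0.0, seconds_to_reset - remaining * window / limit)


# 5000 requests per hour, 1000 left, 30 minutes until the quota resets.
print(throttle_delay(limit=5000, remaining=1000, seconds_to_reset=1800))
```

In this example the budget line says 2500 requests should remain at the half-hour mark, so with only 1000 left the function asks for a pause of 1080 seconds, which is the point at which the line reaches 1000.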

A Call for Adaptive Solutions

This scenario underscores the need for adaptive solutions in managing API rate limits within CI/CD pipelines. Alan contemplates several improvements:

  1. Dynamic Rate Limiting: Implementing a more dynamic approach to rate limiting within Jenkins that adjusts in real time to actual usage and the remaining quota.
  2. Caching Responses: Caching GitHub API responses where possible to reduce the number of required requests.
  3. Prioritization of Tasks: Introducing a prioritization system for API requests to ensure critical tasks receive precedence over less urgent ones.
  4. Alternative Strategies: Exploring alternative rate-limiting strategies that could offer more nuanced control.
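
As an illustration of the caching idea from the list above, the sketch below revalidates cached responses with ETags. GitHub's REST API documentation describes conditional requests that answer 304 Not Modified when nothing has changed, and states that such responses to properly authorized requests do not count against the primary rate limit. The class name ETagCache, the fetch interface, and the fake_fetch stand-in with its placeholder URL are all assumptions made for this example.

```python
class ETagCache:
    """Cache responses by URL and revalidate them with If-None-Match."""

    def __init__(self, fetch):
        # fetch(url, headers) -> (status, response_headers, body); assumed interface
        self._fetch = fetch
        self._entries = {}  # url -> (etag, body)

    def get(self, url):
        headers = {}
        cached = self._entries.get(url)
        if cached:
            headers['If-None-Match'] = cached[0]
        status, response_headers, body = self._fetch(url, headers)
        if status == 304 and cached:
            return cached[1]  # unchanged on the server: reuse the stored body
        etag = response_headers.get('ETag')
        if status == 200 and etag:
            self._entries[url] = (etag, body)
        return body


calls = []


def fake_fetch(url, headers):
    """Stand-in for an HTTP client; answers 304 when the ETag matches."""
    calls.append(dict(headers))
    if headers.get('If-None-Match') == '"v1"':
        return 304, {}, None
    return 200, {'ETag': '"v1"'}, {'name': 'demo-repo'}


cache = ETagCache(fake_fetch)
first = cache.get('https://api.github.com/repos/example/demo-repo')
second = cache.get('https://api.github.com/repos/example/demo-repo')
print(first == second, calls)
```

In a real pipeline the fetch function would wrap an HTTP client, and the cache would need an eviction policy; both are left out here to keep the sketch short.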

Embracing the Future

The tale of Alan and the Jenkins-imposed API limiter is a microcosm of the broader challenges faced by DevOps engineers worldwide. As APIs become increasingly central to software development and operational processes, efficiently managing their usage limits will remain a paramount concern.

The experience shared by Alan not only highlights the complexities inherent in integrating external services into CI/CD pipelines but also serves as a catalyst for innovation. By continually refining strategies to navigate API rate limits, the DevOps community can ensure that automation tools like Jenkins remain powerful allies in the quest for efficient, uninterrupted software delivery.

In conclusion, as we venture further into the era of automation and continuous integration, stories like Alan's remind us of the importance of adaptability, strategic planning, and the ongoing pursuit of optimization in the face of external constraints. Through collaborative effort and innovative thinking, the challenges posed by API rate limits can be transformed into opportunities for improvement, driving the DevOps field toward ever-greater efficiencies and successes.

Navigating the Challenges of Jenkins: A DevOps Perspective

In the ever-evolving landscape of DevOps, Jenkins has long stood as a cornerstone tool for continuous integration and continuous delivery (CI/CD). Born in the early days of agile development, Jenkins provided an open-source platform that was revolutionary for its time, automating the build, test, and deployment phases of software development. However, as the demands of software delivery have grown in complexity and scale, many DevOps professionals, myself included, have encountered significant challenges with Jenkins. This article aims to explore some of these challenges, not to discredit Jenkins but to provide a balanced view for teams considering their CI/CD tooling options.

1. Configuration Complexity

One of the most cited frustrations with Jenkins is its configuration complexity. Jenkins' flexibility is a double-edged sword; while it allows for extensive customization, it also means that setting up and maintaining a Jenkins pipeline can be daunting. The reliance on Groovy scripts for pipeline definitions adds another layer of complexity, especially for teams without Groovy expertise. Furthermore, managing plugins and ensuring compatibility can be an arduous task, often requiring significant time and effort to troubleshoot.

2. UI and Usability Issues

Jenkins' user interface has been criticized for not keeping pace with modern design standards. The UI can be unintuitive and cumbersome, especially for new users. Navigating between projects, configuring jobs, and finding logs can be inefficient, impacting productivity. While there have been improvements in recent versions, the UI still lags behind more modern CI/CD platforms, which offer cleaner, more user-friendly interfaces.

3. Scalability Concerns

As projects grow in size and complexity, Jenkins can struggle to scale efficiently. Handling a large number of jobs or a high frequency of builds often requires significant infrastructure and configuration tuning. Jenkins can distribute builds across agents, but the controller remains a single instance: open-source Jenkins offers no built-in controller clustering or high availability, and provisioning agents elastically typically depends on additional plugins or complex setups. This limitation can lead to performance bottlenecks, especially for organizations with rapid development cycles and large-scale deployments.

4. Security Vulnerabilities

Jenkins has had its share of security vulnerabilities over the years. The extensible nature of Jenkins, primarily through its vast ecosystem of plugins, introduces potential security risks. Plugins can be outdated or poorly maintained, leading to vulnerabilities that can be exploited. While the Jenkins community actively works to address these issues, keeping a Jenkins instance secure requires continuous vigilance and regular updates, adding to the administrative burden.

5. The Rise of Alternatives

The landscape of CI/CD tools has expanded significantly since Jenkins' inception. Modern alternatives like GitHub Actions, GitLab CI/CD, and CircleCI offer more integrated, cloud-native solutions with lower maintenance overheads. These platforms provide more out-of-the-box functionality, better scalability, and improved usability compared to Jenkins. For many teams, especially those starting fresh or looking to simplify their DevOps practices, these alternatives may present a more appealing option.

Conclusion: A Place for Jenkins?

Despite these challenges, Jenkins remains a powerful and versatile tool in the DevOps toolkit. Its open-source nature, extensive plugin ecosystem, and robust community support make it a viable option for many scenarios, especially for organizations with specific needs that can only be met through extensive customization.

The key to successfully leveraging Jenkins lies in understanding its limitations and actively managing its complexities. For teams with existing Jenkins infrastructure, incremental improvements and optimizations can mitigate some of the challenges discussed. For others, evaluating the needs of your projects against the capabilities and costs of maintaining Jenkins versus adopting alternative solutions is crucial.

Ultimately, while Jenkins has its drawbacks, it also offers significant value under the right circumstances. The decision to use Jenkins should be based on a thorough assessment of your organization's specific requirements, resources, and long-term CI/CD strategy.