2. Programming / Scripting
A. Python Programming
Python is often considered the "Swiss Army Knife" of DevOps due to its readability and powerful libraries.
Basics
Loops & Conditionals: Essential for controlling the flow of automation scripts, such as iterating through a list of servers or checking if a service is running.
Functions: Used to write reusable blocks of code, reducing redundancy in large automation projects.
OOP (Object-Oriented Programming): Lets you model real-world infrastructure components (like "Server" or "Database") as objects, which keeps larger automation codebases organized and scalable.
Exception / File Handling: Crucial for making scripts "robust" by handling errors gracefully and interacting with configuration files or logs.
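To make the basics above concrete, here is a minimal sketch that combines a loop, a conditional, a function, a small class, and exception handling. The Server class, the inventory list, and the 8080 default port are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Hypothetical model of one host in an inventory."""
    name: str
    role: str
    healthy: bool = True


def unhealthy_names(servers):
    """Loop + conditional: collect the names of servers that need attention."""
    names = []
    for server in servers:
        if not server.healthy:
            names.append(server.name)
    return names


def parse_port(raw):
    """Exception handling: fall back to a default instead of crashing on bad input."""
    try:
        return int(raw)
    except (TypeError, ValueError):
        return 8080  # assumed default for this sketch


inventory = [
    Server("web-1", "web"),
    Server("db-1", "database", healthy=False),
]
print(unhealthy_names(inventory))
print(parse_port("not-a-number"))
```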
Intermediate
Boto3: This is the official AWS SDK for Python, used to automate AWS services like launching EC2 instances or managing S3 buckets programmatically.
Logging: Moving beyond simple "print" statements to track script behavior and troubleshoot issues in production environments.
Flask: A lightweight web framework used by DevOps engineers to build internal dashboards, REST APIs, or custom webhooks for automation triggers.
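As an illustration of the Logging point above, the following sketch configures a logger from the standard library instead of relying on print(). The logger name "deploy" and the message contents are assumptions made for the example.

```python
import logging
import sys


def build_logger(name, level=logging.INFO):
    """Return a logger that writes timestamped records to stderr."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


log = build_logger("deploy")  # "deploy" is a hypothetical script name
log.debug("hidden at INFO level")
log.info("starting rollout")
log.warning("disk usage above threshold on %s", "web-1")
```

Passing logging.DEBUG as the level while troubleshooting would surface the first message without touching the rest of the script.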
B. Shell Scripting
While Python is used for complex logic, Shell Scripting (Bash) is the native language of the Linux terminal and is essential for quick, system-level tasks.
Basics
Automating Backups: Writing scripts to automatically compress and move data to secure storage on a schedule.
Copying, Moving, and Transferring: Mastering commands to manage files across local and remote directories efficiently.
User Management / Automation: Automating the creation of users, setting permissions, and managing SSH keys across multiple servers.
Intermediate
Integration with AWS CLI: Using Shell scripts to wrap around AWS Command Line Interface commands for rapid cloud resource management.
Makefiles: Originally designed for compiling code, Makefiles are now used by DevOps engineers as a "command runner" to simplify complex multi-step build or deployment processes.
Integration with Other Tools: Using Shell to "glue" different tools together, such as piping the output of a security scanner into a messaging app notification.
This is Section 2: Programming and Scripting. In a mid-to-senior SRE/DevOps role, you are expected to move beyond simple "automation scripts" and into the realm of Software Engineering for Infrastructure.
You aren't just writing scripts; you are building internal tools, CLI utilities, and automated recovery systems that must be as robust as the production application code.
🔹 1. Improved Notes: Engineering the "Glue"
In DevOps, Bash is for system-level operations and "one-liners," while Python (or Go) is for logic-heavy automation and interacting with APIs.
Bash: The System Interface
Idempotency: A senior engineer writes scripts that can be run 10 times and produce the same result. Using mkdir -p or checking if [ ! -f /file ] is the bare minimum.
Shell Options (set commands): Essential for predictable behavior.
set -e: Exit immediately if a command fails.
set -o pipefail: Prevents a pipeline from returning a "success" exit code if a middle command failed.
Streams & Redirection: Mastering stderr (2) vs stdout (1). In production, you must redirect logs correctly so they can be captured by logging agents like Fluent Bit.
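The idempotency idea above is not specific to Bash. The sketch below shows the same "check state, then act" pattern in Python; ensure_dir, ensure_line, and the settings.conf file name are hypothetical helpers written for this example.

```python
import os
import tempfile


def ensure_dir(path):
    """Create the directory if needed; succeed silently if it already exists."""
    os.makedirs(path, exist_ok=True)  # same idea as `mkdir -p`


def ensure_line(path, line):
    """Append `line` to the file only if it is not already present.

    Returns True when the file was changed, False when nothing was needed.
    """
    existing = []
    if os.path.exists(path):
        with open(path) as handle:
            existing = handle.read().splitlines()
    if line in existing:
        return False
    with open(path, "a") as handle:
        handle.write(line + "\n")
    return True


workdir = os.path.join(tempfile.mkdtemp(), "app", "conf")
ensure_dir(workdir)
ensure_dir(workdir)  # second run is a no-op, not an error
conf = os.path.join(workdir, "settings.conf")  # hypothetical file name
print(ensure_line(conf, "max_connections=100"))
print(ensure_line(conf, "max_connections=100"))
```

Returning a changed/unchanged flag is a design choice borrowed from configuration management tools: it lets the caller log or count real changes separately from no-ops.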
Python: The Automation Powerhouse
SDKs vs. APIs: Understanding how to use Boto3 (AWS), Google Cloud Client Libraries, or Kubernetes Python Client.
Data Structures for SRE: Using Dictionaries and Sets for fast lookups when comparing infrastructure states (e.g., comparing "desired tags" vs "current tags").
Concurrency: Using threading or asyncio to perform tasks in parallel (e.g., rotating passwords for 100 databases simultaneously).
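The "desired tags" vs "current tags" comparison mentioned above can be sketched with plain dictionaries and sets. The tag_drift function and the sample tag values are assumptions for illustration, not part of any SDK.

```python
def tag_drift(desired, current):
    """Compare two tag dictionaries and report what must change."""
    desired_keys = set(desired)
    current_keys = set(current)
    return {
        "missing": sorted(desired_keys - current_keys),
        "unexpected": sorted(current_keys - desired_keys),
        "changed": sorted(
            key for key in desired_keys & current_keys
            if desired[key] != current[key]
        ),
    }


desired_tags = {"env": "prod", "team": "sre", "owner": "platform"}
current_tags = {"env": "staging", "team": "sre", "temp": "true"}
print(tag_drift(desired_tags, current_tags))
```

Set difference and intersection do the key comparison in roughly linear time, which matters once the comparison runs across thousands of resources.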
🔹 2. Interview View (Q&A)
Q1: Why is set -e or set -o pipefail important in a CI/CD pipeline?
Answer: Without pipefail, if a command like cat config.json | jq .key fails at the cat stage but jq succeeds in receiving an empty string, the exit code is 0. The pipeline continues with a broken config. pipefail ensures the entire line fails, stopping the deployment of a corrupted state.
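The behaviour described in this answer can be observed directly. The sketch below drives bash from Python (to keep the examples in one language) and compares exit codes with and without pipefail; it assumes bash is installed, and uses false | cat as a stand-in for the cat/jq pipeline.

```python
import shutil
import subprocess


def pipeline_exit_code(script):
    """Run a snippet under bash and return its exit code."""
    return subprocess.run(["bash", "-c", script]).returncode


if shutil.which("bash"):
    # `false` fails, `cat` succeeds: without pipefail only `cat` counts.
    print(pipeline_exit_code("false | cat"))
    # With pipefail, the failing stage decides the pipeline's status.
    print(pipeline_exit_code("set -o pipefail; false | cat"))
```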
Q2: How do you handle secrets in a Python script meant for production?
Answer: Never hardcode. I use environment variables or, better yet, a secret manager SDK (AWS Secrets Manager / HashiCorp Vault). I also ensure the script handles the "Secret Not Found" exception gracefully to avoid leaking metadata in logs.
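A minimal sketch of the environment-variable approach from this answer follows. It assumes the secret is injected into the process environment by the platform; MissingSecretError, get_secret, and the DEMO_ variable names are hypothetical. A secret manager SDK would replace the os.environ lookup but the error-handling shape stays the same.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not configured."""


def get_secret(name):
    """Read a secret from the environment without ever logging its value."""
    value = os.environ.get(name)
    if not value:
        # Report only the variable name, never the value or surrounding config.
        raise MissingSecretError(f"required secret {name!r} is not set")
    return value


os.environ["DEMO_DB_PASSWORD"] = "example-only"  # set here purely for the demo
password = get_secret("DEMO_DB_PASSWORD")
print("secret loaded:", bool(password))

try:
    get_secret("DEMO_MISSING_SECRET")
except MissingSecretError as exc:
    print(exc)
```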
Q3: When would you choose Python over Bash?
Answer: I use Bash for simple wrappers around CLI tools or quick system tasks. I switch to Python when:
The logic requires complex nested loops/conditionals.
I need to parse complex JSON/YAML (Bash with jq becomes unreadable quickly).
I need to interact with multiple Cloud APIs or Databases.
I need unit tests for the automation logic.
🔹 3. Architecture & Design: Scripting in SRE
The "Sidecar" and "Init-Container" Pattern:
In Kubernetes, your scripts often live as "Init-containers."
Design: A Bash script waits for a Database to be ready before the main app starts.
Trade-off: If the script is too aggressive (no exponential backoff), it can DDoS your own database during a cluster-wide restart.
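The trade-off above is usually addressed with capped exponential backoff plus jitter. The notes describe a Bash wait script; the sketch below shows the same logic in Python under stated assumptions: wait_for_port and backoff_delays are hypothetical helpers, and the 1-second base and 30-second cap are illustrative values, not recommendations.

```python
import random
import socket
import time


def backoff_delays(attempts, base=1.0, cap=30.0):
    """Yield capped exponential delays: base, 2*base, 4*base, ... up to cap."""
    for attempt in range(attempts):
        yield min(cap, base * (2 ** attempt))


def wait_for_port(host, port, attempts=6, base=1.0, cap=30.0, sleep=time.sleep):
    """Return True once a TCP connection succeeds, False if all attempts fail."""
    for delay in backoff_delays(attempts, base, cap):
        try:
            with socket.create_connection((host, port), timeout=2):
                return True
        except OSError:
            # Full jitter spreads restarts out so pods do not retry in lockstep.
            sleep(random.uniform(0, delay))
    return False


print(list(backoff_delays(6)))
```

The randomized sleep is the part that protects the database during a cluster-wide restart: without it, every pod would retry at the same instants.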
Scalability Concerns:
A script that works for 10 servers might fail for 1,000 due to:
Rate Limiting: Cloud APIs will throttle your script if it makes too many requests.
Memory Exhaustion: Loading a 5GB log file into a Python list will crash the pod. Use Generators or Iterators to process data line-by-line.
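The generator advice above can be sketched as follows. The error_lines function and the "ERROR" marker are assumptions for the example, and a four-line temporary file stands in for the large log.

```python
import os
import tempfile


def error_lines(path):
    """Yield matching lines one at a time instead of loading the whole file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:  # file objects iterate lazily, line by line
            if "ERROR" in line:
                yield line.rstrip("\n")


# Small stand-in for a large log file.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("INFO boot\nERROR disk full\nINFO retry\nERROR timeout\n")

count = sum(1 for _ in error_lines(tmp.name))
print(count)
os.remove(tmp.name)
```

Memory use stays roughly constant regardless of file size because only one line is held at a time, whereas handle.readlines() would materialize the entire file.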
🔹 4. Commands & Configs (The "Scripting Standard")
The "Professional" Bash Header
Every production script should start with a shebang followed by the strict shell options covered earlier (set -e and set -o pipefail) to ensure reliability.
Python: Robust API Interaction (Retries)
In production, networks flake. Use the tenacity library or a custom retry decorator.
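As one possible shape for the "custom retry decorator" option, here is a standard-library sketch. It is not the tenacity API; the retry decorator, its parameter names, and the flaky_fetch function are all hypothetical, and the simulated failure exists only to exercise the retry path.

```python
import functools
import time


def retry(attempts=3, base_delay=0.5, exceptions=(Exception,), sleep=time.sleep):
    """Retry the wrapped function, doubling the delay after each failure."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise  # out of attempts: surface the last error
                    sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator


calls = {"count": 0}


@retry(attempts=4, base_delay=0.01, exceptions=(ConnectionError,))
def flaky_fetch():
    """Hypothetical API call that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated network flake")
    return {"status": "ok"}


print(flaky_fetch(), calls["count"])
```

Restricting the exceptions tuple to transient errors is deliberate: retrying on every exception would also retry genuine bugs such as a bad request payload.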
🔹 5. Troubleshooting & Debugging
Common Failure Mode: "It works on my machine" (Environment Mismatch)
The Fix: Always use #!/usr/bin/env python3 instead of #!/usr/bin/python. Use Virtual Environments (venv) or Dockerize your script to bundle dependencies.
Debugging Bash:
Run with bash -x script.sh to see every command as it executes with variables expanded.
Use shellcheck (linter) to catch common bugs like missing quotes around variables (which causes issues with filenames containing spaces).
Debugging Python:
Use try...except...finally blocks to ensure resources (like DB connections) are closed even if the script crashes.
Utilize the logging module instead of print() to allow for different log levels (DEBUG vs INFO) in production.
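The two Python debugging habits above fit together naturally. In this sketch an in-memory SQLite database stands in for a real DB connection; run_query and the "maintenance" logger name are hypothetical.

```python
import logging
import sqlite3

log = logging.getLogger("maintenance")  # hypothetical logger name


def run_query(conn, sql):
    """Run one statement and close the connection no matter what happens."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error as exc:
        log.error("query failed: %s", exc)
        return None
    finally:
        conn.close()  # runs on success, on handled errors, and on crashes


print(run_query(sqlite3.connect(":memory:"), "SELECT 1"))
print(run_query(sqlite3.connect(":memory:"), "SELECT * FROM missing_table"))
```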
🔹 6. Production Best Practices
Fail Fast: Validate all input arguments and environment variables at the very beginning of the script.
Atomic Operations: If a script is renaming files, try to make it atomic. If it fails halfway, it shouldn't leave the system in a "half-migrated" state.
Signal Handling: In SRE, scripts should handle SIGTERM. If a Kubernetes pod is terminating, your script should catch the signal and finish its current task or exit cleanly.
No "Golden Images": Don't rely on a script being pre-installed on a server. Use Configuration Management (Ansible) to deploy the script and its environment.
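For the Atomic Operations point in the list above, a common technique is to write to a temporary file and then rename it over the target. The sketch below assumes the temporary file and the target live on the same filesystem; atomic_write and the app.conf path are hypothetical.

```python
import os
import tempfile


def atomic_write(path, text):
    """Write `text` to `path` so readers see the old or new file, never half."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())  # push data to disk before the rename
        os.replace(tmp_path, path)  # atomic when both paths share a filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # leave no partial file behind
        raise


target = os.path.join(tempfile.mkdtemp(), "app.conf")  # hypothetical path
atomic_write(target, "mode=blue\n")
atomic_write(target, "mode=green\n")
with open(target) as handle:
    print(handle.read().strip())
```

If the script is killed mid-write, the target still holds the previous complete version, which is exactly the "no half-migrated state" property described above.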
🔹 Cheat Sheet / Quick Revision
Bash Logic: [[ $a == $b ]] (Modern comparison), $? (Last exit code), 2>&1 (Merge stderr into stdout).
Python Essentials: os.environ.get() (Safe env access), subprocess.run(check=True) (Running shell commands safely), json.loads() (Parsing API responses).
SRE Logic: Is it Idempotent? Does it have Retries? Does it Log properly? Does it handle Secrets safely?
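The three Python essentials from the cheat sheet can be seen working together in a short sketch. A child Python process stands in for a CLI tool that prints JSON, and DEMO_REGION is a hypothetical environment variable with an assumed default.

```python
import json
import os
import subprocess
import sys

region = os.environ.get("DEMO_REGION", "us-east-1")  # hypothetical variable

# A child Python process stands in for a CLI tool that prints JSON.
child_code = "import json, sys; print(json.dumps({'region': sys.argv[1], 'ok': True}))"
result = subprocess.run(
    [sys.executable, "-c", child_code, region],  # list form: no shell involved
    check=True,           # raise CalledProcessError on a non-zero exit
    capture_output=True,
    text=True,
)
payload = json.loads(result.stdout)
print(payload["ok"])

try:
    subprocess.run([sys.executable, "-c", "raise SystemExit(3)"], check=True)
except subprocess.CalledProcessError as exc:
    print("command failed with exit code", exc.returncode)
```

Passing the command as a list rather than a single string keeps user-controlled values such as region from being interpreted by a shell.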
This section transitions from "what DevOps is" to the actual tooling and automation that makes it work. For an SRE, scripting is about moving from manual tasks to repeatable, reliable systems.
🟢 Easy: Scripting Basics
Focus: Syntax and fundamental operations.
What is a "Shebang" (#!) and why is it important in a script?
Context: Explain how the kernel uses it to identify the interpreter (e.g., #!/bin/bash vs #!/usr/bin/env python3).
How do you check the exit status of the last executed command in Bash?
Context: Mention $? and what a 0 vs. non-zero value represents.
In Python, what is the difference between a List and a Dictionary?
Context: Focus on when you would use each (e.g., a list for a sequence of server names; a dictionary for server metadata like IP and Role).
How do you pass arguments to a script in Bash?
Context: Mention positional parameters like $1, $2, and $@ for all arguments.
🟡 Medium: Logic & System Interaction
Focus: Error handling and data manipulation.
Explain the difference between soft and hard errors in a script.
Context: How do you ensure a script stops immediately if a critical command fails? (Mention set -e in Bash or try...except in Python).
How would you search for a specific string in a file and replace it using only Bash?
Context: The interviewer is looking for sed or awk. Bonus if you explain "in-place" editing (sed -i).
Why would you use the subprocess module in Python instead of os.system()?
Context: Discuss security (shell injection), capturing output (stdout/stderr), and better control over execution.
What are "Environment Variables," and how do you access them in both Bash and Python?
Context: Bash: $VAR_NAME; Python: os.environ.get('VAR_NAME'). Why is it better to use environment variables than hardcoding secrets?
🔴 Hard: Production Engineering & Robustness
Focus: Automation at scale and "Defensive Scripting."
How do you implement "Idempotency" in a script? Give a real-world example.
Context: A script to create a directory or user should not fail if the directory/user already exists. It should check the state first.
Explain the concept of "Exponential Backoff." How would you script this to handle a flaky API?
Context: Instead of retrying every 1 second, you wait 1s, 2s, 4s, 8s... to avoid overwhelming a service that is already struggling.
Why is set -o pipefail critical for CI/CD pipelines?
Context: In a pipeline like command1 | command2, without pipefail, the exit code is determined only by the last command. If command1 fails, the pipeline might still report success.
Scenario: You need to rotate logs on 100 servers. Would you use a Bash loop with SSH or a Python script using an SDK? Explain your choice.
Context: This tests your understanding of Parallelism vs. Serial execution. Mention that Python (or Go) handles concurrency and complex error reporting across many nodes much better than raw Bash.
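For the log-rotation scenario, the parallelism argument can be sketched with a thread pool from the standard library (concurrent.futures is built on threading). The rotate_logs function is a hypothetical stand-in for the real SSH or SDK call, and the simulated failure on one host exists only to show per-host error reporting.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def rotate_logs(host):
    """Hypothetical per-host task; a real version would call SSH or an SDK."""
    if host.endswith("-13"):
        raise RuntimeError("simulated connection failure")
    return f"{host}: rotated"


def run_fleet(hosts, worker, max_workers=10):
    """Run `worker` on every host in parallel and collect successes and failures."""
    succeeded, failed = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, host): host for host in hosts}
        for future in as_completed(futures):
            host = futures[future]
            try:
                succeeded[host] = future.result()
            except Exception as exc:  # one bad host must not stop the rest
                failed[host] = str(exc)
    return succeeded, failed


hosts = [f"server-{n}" for n in range(1, 101)]
ok, bad = run_fleet(hosts, rotate_logs)
print(len(ok), sorted(bad))
```

Capping max_workers also acts as a crude rate limit, which ties back to the throttling concern raised earlier for large fleets.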
💡 Pro-Tip for your Interview
When discussing scripting, always mention Testing.
Example: "I don't just write scripts; I use ShellCheck for my Bash scripts to catch common bugs, and I write basic Unit Tests in Python to ensure my automation logic handles edge cases."