Azure hard questions

Here are 100 hard interview questions and answers on the Azure platform, designed for experienced DevOps and SRE professionals. The questions focus on design, troubleshooting, security, and deep technical trade-offs.


## Azure Architecture & Design

### 1. Design a highly available, multi-region architecture for a three-tier web application with a stateful SQL database backend, ensuring automated failover.

The design must handle a full regional outage.

  1. Global Traffic Routing: Use Azure Front Door or Traffic Manager in a priority routing configuration. This will direct all traffic to the primary active region and automatically fail over to the secondary region if the primary becomes unhealthy.

  2. Web and App Tiers: Deploy the web and application tiers as App Services or Virtual Machine Scale Sets (VMSS) in two separate regions. The infrastructure in each region should be identical, managed by Bicep or Terraform.

  3. Database Tier: This is the most critical part for a stateful app. Use Azure SQL Database with an Auto-Failover Group. This feature automatically manages the geo-replication of your database and provides a single read/write listener endpoint. In the event of a primary region outage, it automatically promotes the secondary database to be the new primary and updates the DNS records, allowing your application in the secondary region to connect seamlessly. The application's connection string points to the failover group listener, not the individual database servers.

### 2. What are Azure Landing Zones, and what problem do they solve for large enterprises?

An Azure Landing Zone is a conceptual architecture that provides a pre-configured, scalable, and secure foundation for deploying workloads in Azure. It's not a single service but a combination of infrastructure, governance, and security configurations.

  • Problem Solved: Large enterprises often struggle with inconsistent environments, security gaps, and governance challenges when teams start using Azure independently. A Landing Zone solves this by providing a centrally managed "paved road" that includes:

    • Identity & Access Management: Pre-configured Azure AD integration and RBAC roles.

    • Network Topology: A defined hub-spoke network with centralized firewalls and connectivity.

    • Governance: A hierarchy of Management Groups with Azure Policies enforced for compliance and security.

    • Shared Services: Centralized logging, monitoring, and security services.

### 3. You are designing a solution for a company that needs to process terabytes of data daily. Compare and contrast Azure Synapse Analytics, Azure Databricks, and Azure Data Factory.

  • Azure Data Factory (ADF): An ETL/ELT orchestrator. Its primary job is to move and transform data. You use it to build data pipelines that ingest data from various sources and land it in a destination. It's not a data processing engine itself but can trigger other services to do the work.

  • Azure Databricks: A fast, easy, and collaborative Apache Spark-based analytics platform. It's a powerful engine for large-scale data engineering and machine learning. You would use Databricks for complex data transformations, stream processing, and advanced analytics that ADF can't handle alone.

  • Azure Synapse Analytics: An integrated analytics service that brings together data warehousing, big data analytics, and data integration into a single unified experience. It combines the capabilities of a SQL Data Warehouse with Spark pools (similar to Databricks) and data integration pipelines (similar to ADF).

  • Summary: Use ADF to orchestrate, Databricks for heavy-duty Spark processing and ML, and Synapse when you need a single, unified platform for both data warehousing and big data analytics.

### 4. How would you design a secure, multi-tenant solution on AKS where tenant workloads must be strictly isolated from each other at the network and compute level?

  1. Compute Isolation: Use multiple node pools. Create a dedicated node pool for each tenant. Apply taints to the nodes in each tenant's pool (with matching tolerations on that tenant's pods) to keep other tenants' workloads off the pool, and add a nodeSelector or node affinity so the tenant's pods are scheduled only onto their dedicated nodes.

  2. Network Isolation: Use Network Policies. Create a default deny-all network policy in each tenant's namespace. Then, create specific policies that only allow traffic within the same namespace and control any necessary cross-namespace or egress traffic (a minimal sketch follows this list).

  3. RBAC and Namespace Isolation: Deploy each tenant into their own dedicated Kubernetes Namespace. Use Kubernetes RBAC to create roles and role bindings that restrict each tenant's users and service accounts to only be able to manage resources within their own namespace.

  4. Resource Quotas: Apply ResourceQuotas and LimitRanges to each namespace to prevent one tenant from consuming all the cluster's resources (CPU, memory, storage).
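
A minimal sketch of points 2 and 4 for a single tenant, assuming a namespace named tenant-a; the name and quota values are illustrative:

```yaml
# Default deny-all for the tenant namespace, then allow only intra-namespace traffic.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: tenant-a
spec:
  podSelector: {}                 # selects every pod in the namespace
  policyTypes: ["Ingress", "Egress"]
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: tenant-a
spec:
  podSelector: {}
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - podSelector: {}         # only pods in tenant-a may reach pods in tenant-a
# (in practice you would also add an egress rule allowing DNS to CoreDNS)
---
# Cap the tenant's total resource consumption.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-a-quota
  namespace: tenant-a
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
    persistentvolumeclaims: "10"
```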

### 5. Explain the hub-spoke network topology in Azure and the benefits of using it.

A hub-spoke topology is a networking model for organizing and managing Azure infrastructure.

  • Hub VNet: Acts as a central point of connectivity. It contains shared services like Azure Firewall, VPN/ExpressRoute gateways, domain controllers, and monitoring tools.

  • Spoke VNets: These are individual virtual networks that are peered with the hub. They are used to isolate workloads (e.g., one spoke for the web tier, another for the data tier, or one spoke per business unit).

  • Benefits:

    • Cost Savings & Centralization: Shared services are deployed once in the hub instead of in every spoke.

    • Improved Security: All traffic between spokes and to/from the internet can be forced to route through the central firewall in the hub for inspection.

    • Isolation: Workloads in different spokes are isolated from each other by default.

    • Scalability: It's easy to add new spokes without reconfiguring the entire network.

### 6. A company needs to connect their on-premises datacenter to Azure with high bandwidth and low latency for a mission-critical application. Compare and contrast VPN Gateway and ExpressRoute for this scenario.

  • VPN Gateway: Uses the public internet to create an encrypted site-to-site (S2S) tunnel between your on-prem network and an Azure VNet.

    • Pros: Relatively cheap, quick to set up.

    • Cons: Performance and reliability are subject to the public internet, which can be unpredictable. Bandwidth is limited.

  • ExpressRoute: A dedicated, private connection between your on-prem network and the Microsoft network, facilitated by a connectivity provider.

    • Pros: Highly reliable with a financially-backed SLA, provides very high bandwidth (up to 100 Gbps), and offers low, consistent latency. The connection is private and does not traverse the public internet.

    • Cons: Much more expensive and takes longer to provision (weeks or months).

  • Conclusion: For a mission-critical application with high bandwidth and low latency requirements, ExpressRoute is the only suitable choice.


## Advanced Networking & Security

### 7. Explain the technical differences between VNet Service Endpoints and Private Endpoints. When is one definitively better than the other?

  • VNet Service Endpoint:

    • How it works: Extends your VNet's private address space to a specific Azure PaaS service. The PaaS service still has a public IP, but you create a firewall rule on the service to only allow traffic from your VNet's subnet.

    • Routing: Traffic from your VNet to the service takes a more direct route over the Azure backbone.

  • Private Endpoint:

    • How it works: Creates a network interface (NIC) with a private IP address inside your VNet that maps to the PaaS service.

    • Routing: All traffic to the service is directed to this private IP. The service's public endpoint is effectively bypassed.

  • When is a Private Endpoint better? A Private Endpoint is almost always the superior and more secure choice. It is definitively better when:

    1. You need to access the PaaS service from an on-premises network connected via VPN or ExpressRoute. (Service Endpoints don't work from on-prem).

    2. You need to access a PaaS service located in a different Azure region.

    3. You need to enforce Network Security Group (NSG) rules on the traffic to the PaaS service (NSGs work with Private Endpoints, not Service Endpoints).

    4. You need to completely eliminate any public internet exposure for the PaaS service.

### 8. An application in an AKS pod cannot resolve a private DNS name for a database hosted in a peered VNet. Describe your step-by-step troubleshooting process.

  1. Check Pod DNS: exec into the pod (kubectl exec -it <pod-name> -- /bin/sh) and use nslookup db.private.contoso.com. Does it resolve? If not, check the pod's /etc/resolv.conf. It should point to the cluster's CoreDNS service IP.

  2. Check CoreDNS: Check the CoreDNS pod logs (kubectl logs -n kube-system <coredns-pod-name>). See if it's receiving the query and if it's forwarding it or failing.

  3. Check Private DNS Zone Linking: This is the most likely culprit. The Azure Private DNS Zone that hosts private.contoso.com must be linked to both the VNet hosting the database and the VNet hosting the AKS cluster. Go to the Private DNS Zone in the Azure Portal and check the "Virtual network links."

  4. Check NSGs/Firewalls: Ensure that NSGs or an Azure Firewall are not blocking DNS traffic (UDP/TCP port 53) between the AKS VNet and the Azure DNS private IP addresses.

  5. Check VNet Peering: Confirm that the VNet peering is established and connected. Ensure that "Allow forwarded traffic" is enabled on the peering if there's a Network Virtual Appliance (NVA) involved.

### 9. What is the "forced tunneling" concept in Azure networking? How would you implement it?

Forced tunneling is the practice of redirecting all internet-bound traffic from your Azure VNets back to your on-premises datacenter through a site-to-site VPN or ExpressRoute connection. This allows you to inspect and audit all outbound traffic using your existing on-prem security appliances.

  • Implementation:

    1. Establish a VPN or ExpressRoute connection between your on-prem network and a hub VNet in Azure.

    2. Create a Route Table in Azure.

    3. In the Route Table, create a User-Defined Route (UDR) for the address prefix 0.0.0.0/0.

    4. Set the "next hop type" for this route to VirtualNetworkGateway.

    5. Associate this Route Table with all the subnets from which you want to force tunnel traffic.

### 10. Explain the security implications of different outbound types (LoadBalancer vs. UserDefinedRouting) in AKS.

  • LoadBalancer (Default): AKS creates a public Azure Load Balancer for egress. All outbound traffic from the nodes is NAT-ed behind the public IP(s) of this load balancer.

    • Security Implication: This creates a set of predictable, but potentially shared, public IP addresses for your egress traffic. It's simple but offers less control and visibility.

  • UserDefinedRouting (UDR): No public IP is assigned by AKS. The cluster is configured to not create its own egress paths. You are responsible for providing the egress route.

    • Security Implication: This is the most secure and flexible option. It allows you to route all egress traffic through a central Azure Firewall or a third-party Network Virtual Appliance (NVA). This gives you a single point to apply fine-grained firewall rules, perform threat intelligence filtering, and log all outbound connections, providing a much stronger security posture.

### 11. You need to design a solution that provides a single, unified WAF and routing layer for applications hosted across multiple regions and even on-premises. Which Azure service would you use and why?

You would use Azure Front Door.

  • Why it's the right choice:

    1. Global Presence: Front Door is a global service that operates at the edge of Microsoft's network. It can route traffic to any public-facing endpoint, whether it's in Azure, another cloud, or on-premises.

    2. Centralized WAF: It has a built-in, centralized Web Application Firewall (WAF) that protects all of your backends, regardless of where they are hosted. You can manage a single WAF policy that applies globally.

    3. Performance: It uses anycast to route users to the nearest Front Door POP (Point of Presence) and provides performance acceleration features like SSL offloading and TCP connection reuse.

    4. Advanced Routing: It provides sophisticated path-based and host-based routing to direct traffic to the correct backend application.


## Advanced AKS & Containers

### 12. What is the difference between docker build and az acr build? What are the security benefits of the latter?

  • docker build: A local command that uses the Docker daemon on your machine to build an image. The build context (all your source code) is sent to the local daemon.

  • az acr build: A remote command. You send your build context to Azure Container Registry (ACR), and the image is built in the cloud by the ACR Tasks service.

  • Security Benefits of az acr build:

    1. Reduced Attack Surface: You don't need to run a privileged Docker daemon on your CI agents.

    2. Credential Management: The build process can securely access other ACR registries or Azure resources using its managed identity, without you having to manage Docker credentials on the CI agent.

    3. Supply Chain Security: It provides a trusted, auditable build environment in Azure, which is a key component of a secure software supply chain.

### 13. A stateful application in AKS is experiencing I/O performance issues. The pods are using PersistentVolumeClaims backed by Premium SSDs. What are the first things you would investigate?

  1. VM Size and Disk Throttling: The most common issue. The maximum IOPS and throughput of an Azure Disk are constrained by both the disk SKU itself and the VM size of the node it's attached to. A powerful P30 disk attached to a small B-series VM will be throttled by the VM's limits. I would check the VM size of the node pool and compare its disk performance limits with the capabilities of the Premium SSD.

  2. Disk Caching: Check if disk caching is enabled and configured correctly for the workload. For databases, a ReadOnly cache is often recommended for data disks to avoid data corruption, but a ReadWrite cache might be appropriate for other workloads (see the StorageClass sketch after this list).

  3. Workload Profile: Use monitoring tools (iostat, dstat inside the pod, or Azure Monitor for Containers) to analyze the actual I/O pattern. Is the application doing large sequential writes or small random reads? This will determine if the current disk SKU is appropriate.

  4. Shared Resources: Is the node running other I/O-heavy pods that are competing for the node's total disk bandwidth?
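
To illustrate point 2, a sketch of a StorageClass that fixes the host-caching mode through the Azure Disk CSI driver; the class name, reclaim policy, and caching choice are illustrative and should be matched to the workload:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: premium-ssd-ro-cache       # illustrative name
provisioner: disk.csi.azure.com
parameters:
  skuName: Premium_LRS             # Premium SSD
  cachingMode: ReadOnly            # host caching applied to the attached data disk
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```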

### 14. What is the containerd-shim process, and what role does it play in the Kubernetes container lifecycle?

The containerd-shim is a crucial but often overlooked process. When containerd (the high-level runtime) starts a container, it doesn't directly parent the container's runc process. Instead, it starts a lightweight containerd-shim process for each container. The shim then becomes the parent of the container's process.

  • Role:

    1. Decoupling: This allows the main containerd daemon to be restarted or upgraded without affecting any running containers.

    2. Keeps STDIO open: The shim holds the container's STDIN/STDOUT/STDERR open, which is what allows you to attach or exec into the container later.

    3. Reporting Exit Status: When the container exits, the shim is the process that collects its exit code and reports it back to containerd.

### 15. Explain how you would implement a custom Horizontal Pod Autoscaler (HPA) in AKS based on a custom metric from an Azure service (e.g., the number of messages in an Azure Service Bus queue).

This requires an external metrics adapter.

  1. Deploy KEDA: The easiest and most common way is to deploy KEDA (Kubernetes Event-driven Autoscaling) into your AKS cluster. KEDA is a CNCF project that acts as a Kubernetes metrics adapter.

  2. Create ScaledObject: Instead of creating a standard HPA, you create a ScaledObject custom resource. In this ScaledObject, you specify:

    • The deployment you want to scale.

    • The trigger type (e.g., azure-servicebus).

    • The metadata for the trigger, including the queue name and the connection string (stored securely in a secret).

    • The threshold (e.g., messageCount: "5", meaning scale out when the queue holds 5 or more messages per replica).

  3. How it Works: KEDA will connect to Azure Service Bus, query the queue length, and then expose this as a custom metric that the standard Kubernetes HPA can understand and act upon. KEDA can also scale a deployment down to zero replicas, which a standard HPA cannot.
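
A minimal ScaledObject sketch for this setup; the namespace, deployment, queue, and TriggerAuthentication names are illustrative, and the connection string would live in the secret referenced by that TriggerAuthentication:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: orders-processor-scaler
  namespace: orders
spec:
  scaleTargetRef:
    name: orders-processor            # the Deployment to scale
  minReplicaCount: 0                  # KEDA can scale the workload to zero
  maxReplicaCount: 30
  triggers:
    - type: azure-servicebus
      metadata:
        queueName: orders
        messageCount: "5"             # scale out when there are 5+ messages per replica
      authenticationRef:
        name: servicebus-trigger-auth # TriggerAuthentication holding the connection string
```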

### 16. What are Pod Security Policies (PSPs) and why have they been deprecated? What is the recommended alternative?

Pod Security Policies (PSPs) were a built-in Kubernetes admission controller used to define a set of conditions that a pod must run with in order to be accepted into the cluster (e.g., not allowing privileged containers).

  • Why they were deprecated: They were notoriously difficult to use correctly. The authorization model was complex, making it hard to roll out a policy without accidentally breaking workloads. It was very easy to lock yourself out of your own cluster.

  • Recommended Alternative: Kubernetes' built-in replacement is the Pod Security Admission controller, which enforces the Pod Security Standards (privileged, baseline, restricted) via namespace labels. For richer, custom policies, use an external policy admission controller such as OPA Gatekeeper or Kyverno; these tools are more powerful, more flexible, and far easier to author and test. Azure also offers Azure Policy for Kubernetes, which is built on Gatekeeper.
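
As a sketch of the built-in route, Pod Security Admission is switched on per namespace with labels (the namespace name is illustrative):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: payments
  labels:
    pod-security.kubernetes.io/enforce: restricted   # reject pods that violate the Restricted standard
    pod-security.kubernetes.io/audit: restricted     # also record violations in the audit log
```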



## GitOps & Progressive Delivery

### 67. How would you design a GitOps promotion workflow for an application across Dev, Staging, and Production environments?

This requires separate environment configurations managed in Git.

  1. Repository Structure: Use a dedicated Git repository for your infrastructure and application configurations (the "GitOps repo"). Inside this repo, have separate branches or directories for each environment (e.g., dev, staging, prod).

  2. CI Pipeline (App Repo): When a developer merges code to the main branch of the application repository, the CI pipeline runs tests and builds a new versioned Docker image.

  3. Promotion via Pull Request: After a successful build, the CI pipeline automatically opens a Pull Request against the GitOps repo. This PR proposes a change to the dev environment's configuration, updating the image tag for the application.

  4. GitOps Tool (Argo CD/Flux): The GitOps agent (Argo CD or Flux) running in the Kubernetes cluster is configured to watch the dev branch/directory. Once the PR is approved and merged, the agent detects the change and automatically deploys the new image to the Dev environment.

  5. Staging and Prod Promotion: The promotion from Dev to Staging, and Staging to Prod, is done by opening a new PR that "promotes" the tested and verified image tag from the dev config to the staging config, and later from staging to prod. This creates a fully auditable, Git-based promotion trail.
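
A sketch of one Argo CD Application per environment, each watching its own directory in the GitOps repo; the repository URL, paths, and names below are illustrative:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp-dev
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://example.com/org/gitops-repo.git   # hypothetical GitOps repo
    targetRevision: main
    path: envs/dev                 # staging/prod Applications point at envs/staging, envs/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp-dev
  syncPolicy:
    automated: {}                  # deploy automatically once the promotion PR is merged
```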

### 68. What is "drift" in a GitOps context, and how do tools like Argo CD and Flux detect and handle it?

Drift is when the live state of your cluster no longer matches the desired state defined in your Git repository. This can happen due to manual changes (kubectl edit deployment...), out-of-band updates, or misconfigurations.

  • Detection: GitOps tools continuously compare the live state of the resources in the cluster with the manifests in the Git repository. Argo CD's UI will visually flag an application as "OutOfSync" if it detects any drift.

  • Handling:

    • Automated Reconciliation (Self-healing): You can configure the GitOps tool to automatically revert any detected drift. If someone manually scales a deployment, the tool will see the discrepancy and automatically scale it back to the number defined in Git.

    • Manual Sync: Alternatively, you can configure it to only report the drift and require a manual "Sync" action from an operator in the UI or CLI. This is often preferred for critical production environments.
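
In Argo CD, this behaviour is driven by the Application's syncPolicy; a sketch (a fragment of an Application spec) of the two options:

```yaml
# Automated reconciliation (self-healing): drift is reverted without human action.
syncPolicy:
  automated:
    prune: true      # delete live resources that were removed from Git
    selfHeal: true   # revert out-of-band changes (e.g., a manual `kubectl edit`)

# Manual handling: omit `automated` entirely; Argo CD only marks the app OutOfSync
# and waits for an operator to trigger a Sync.
```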

### 69. How does Flagger or a similar progressive delivery tool measure the success of a canary deployment before promoting it?

Flagger integrates with a metrics provider (like Prometheus) to make data-driven decisions.

  1. Metric Templates: You define a set of metric templates that specify the key SLIs (Service Level Indicators) for your application, such as request success rate, request duration (latency), and error rates. These are typically Prometheus queries.

  2. Analysis Configuration: In your Canary custom resource, you define an analysis block. This block references your metric templates and sets specific thresholds (SLOs). For example: "the request success rate must be greater than 99%," and "the 99th percentile latency must be less than 500ms."

  3. Iterative Checks: During the canary rollout, Flagger runs these queries against the metrics from the new (canary) version. It performs these checks at set intervals. If all the metrics stay within the defined thresholds for a specified number of consecutive checks, the canary is considered successful, and Flagger proceeds to promote it. If any check fails, it triggers an immediate rollback.
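
A sketch of the analysis block in a Flagger Canary resource, using Flagger's built-in Prometheus-backed metrics; the names and thresholds are illustrative:

```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: myapp
  namespace: prod
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  service:
    port: 80
  analysis:
    interval: 1m          # how often the checks run
    threshold: 5          # failed checks tolerated before rollback
    maxWeight: 50         # stop shifting traffic at 50%
    stepWeight: 10        # shift 10% more traffic per successful iteration
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99         # success rate must stay above 99%
        interval: 1m
      - name: request-duration
        thresholdRange:
          max: 500        # latency must stay under 500 ms
        interval: 1m
```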

### 70. What are the security advantages of the GitOps pull model over the traditional push model?

  1. No Cluster Credentials in CI: This is the biggest advantage. In a push model, the CI system (like Jenkins) needs highly privileged, long-lived credentials to connect to the Kubernetes API server. This makes the CI system a huge security target. In the pull model, the GitOps agent inside the cluster initiates all connections, so no external system needs cluster admin credentials.

  2. Least Privilege: The GitOps agent can be configured with a tightly scoped service account that only has the permissions it needs within its designated namespaces.

  3. Auditability and Immutability: Because Git is the single source of truth, you have a complete, immutable audit trail of every single change made to the cluster, including who approved it and when. This makes security audits and incident response much simpler.


## Advanced Azure DevOps & CI/CD Patterns (Continued)

### 71. You need to create a complex pipeline template that has optional stages. How would you implement this in Azure DevOps YAML?

You would use parameters and conditional insertion (if expressions).

  1. Define Parameters: At the top of your template YAML file, define boolean parameters for the optional stages.

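A sketch of the parameter block (the template file name and parameter name are illustrative):

```yaml
# stages-template.yml (illustrative name)
parameters:
  - name: runPerformanceTests
    type: boolean
    default: false
```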

  2. Conditional Stage: In the stages block, use an if expression that checks the value of the parameter. The stage will only be inserted into the pipeline's execution plan if the condition is true.

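A sketch of the conditionally inserted stage:

```yaml
stages:
  - stage: Build
    jobs:
      - job: Build
        steps:
          - script: echo "build and unit tests"

  # Inserted into the execution plan only when the parameter is true.
  - ${{ if eq(parameters.runPerformanceTests, true) }}:
    - stage: PerformanceTests
      dependsOn: Build
      jobs:
        - job: PerfTests
          steps:
            - script: echo "run performance tests"
```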

When a developer uses this template, they can pass runPerformanceTests: true to include the optional stage.

### 72. How would you design a CI pipeline for a monorepo that only builds and tests the projects that have actually changed?

This requires path-based triggers and dynamic job selection.

  1. Path-Based Triggers: In the trigger and pr sections of your azure-pipelines.yml, use paths filters. This ensures the pipeline only runs if files within specific project directories are changed.

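A sketch of path filters for one project (the directory layout is illustrative):

```yaml
trigger:
  branches:
    include:
      - main
  paths:
    include:
      - src/project-a/*
pr:
  branches:
    include:
      - main
  paths:
    include:
      - src/project-a/*
```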

  2. Dynamic Job Selection (More Advanced): For more complex logic, you can have a single "dispatcher" job that runs first. This job runs a script (git diff --name-only ...) to identify the changed project directories. It then uses a special logging command (##vso[task.setvariable variable=runProjectA;isOutput=true]true) to set output variables. Subsequent jobs can then use a condition based on these variables to decide whether to run or not.

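A sketch of the dispatcher pattern; the job names, project paths, and variable names are illustrative, and the target branch is assumed to be main:

```yaml
jobs:
  - job: Dispatch
    steps:
      - bash: |
          # Flag each project whose files changed relative to the target branch
          CHANGED=$(git diff --name-only origin/main...HEAD)
          if echo "$CHANGED" | grep -q '^src/project-a/'; then
            echo "##vso[task.setvariable variable=runProjectA;isOutput=true]true"
          fi
        name: detect                 # the step name is needed to read its output variables

  - job: BuildProjectA
    dependsOn: Dispatch
    condition: eq(dependencies.Dispatch.outputs['detect.runProjectA'], 'true')
    steps:
      - script: echo "build and test project A"
```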

### 73. Explain how you can secure a CI/CD pipeline from a "dependency confusion" attack.

A dependency confusion attack tricks your build system into pulling a malicious package from a public repository (like npmjs or PyPI) instead of your intended internal package with the same name.

  • Mitigation Strategy: Use a private artifact repository (like Azure Artifacts or JFrog Artifactory) that acts as a secure proxy.

    1. Single Source of Truth: Configure your build agents to only trust your private repository. The build tool's configuration (.npmrc, .pypirc, settings.xml) must point exclusively to this internal source.

    2. Scoped Packages: For public repositories that support it (like npm), use scopes (e.g., @mycompany/internal-package). You can then configure your build tool to resolve packages under your scope only from your private feed.

    3. Proxy and Caching: The private repository acts as a proxy to public feeds. It caches approved public packages and serves your private packages. This gives you a single, controlled entry point for all dependencies.

### 74. What is a "service hook" in Azure DevOps, and how would you use it to integrate with an external system?

A service hook is a mechanism in Azure DevOps for sending notifications about events to external services. When an event occurs (e.g., "Build completed," "Work item updated," "Pull request created"), the service hook sends an HTTP POST payload with details of the event to a configured URL.

  • Example Use Case: You could create a service hook for the "Code pushed" event and point it to a Slack incoming webhook URL. This would automatically post a message to a Slack channel every time a developer pushes code to your repository, providing real-time visibility.


## SRE Culture & Process (Continued)

### 75. How do you differentiate between an SLO (Service Level Objective) and a KPI (Key Performance Indicator)?

  • A KPI is a business metric that measures how well the business is performing. Examples include Daily Active Users (DAU), conversion rate, or revenue per user.

  • An SLO is a technical reliability target for a specific service, agreed upon with stakeholders. It's a user-centric measure of service health. Examples include availability (uptime), latency (speed), and quality (error rate).

  • Relationship: SLOs are often the underlying technical drivers of KPIs. For example, if your latency SLO is breached (the site is slow), your conversion rate KPI will likely drop. As an SRE, you are directly responsible for the SLOs; the product manager is responsible for the KPIs.

### 76. What is a "blameless culture," and why is it a prerequisite for effective Site Reliability Engineering?

A blameless culture is an environment where incidents and outages are treated as opportunities to learn about systemic weaknesses, not as opportunities to blame individuals.

  • Prerequisite for SRE: SRE is built on the principle of learning from failure. If engineers are afraid they will be punished for mistakes, they will be less likely to:

    • Report near-misses and small issues before they become major outages.

    • Be honest and transparent during postmortems, hiding crucial details.

    • Experiment and innovate, for fear of breaking something.

      Without psychological safety and a blameless culture, you cannot have the open and honest feedback loops required to build and maintain a truly reliable system.

### 77. Explain the concept of "Toil" and describe a process for creating a "Toil Budget."

Toil is the manual, repetitive, automatable, tactical work that scales with the size of the service.

A Toil Budget is a formal process for managing this.

  1. Track Everything: For a period (e.g., one quarter), the team meticulously logs all operational tasks and categorizes them as either engineering work or toil.

  2. Set a Target: The SRE leadership sets a target for the maximum percentage of time the team should spend on toil (e.g., 50%).

  3. Prioritize Automation: The team reviews the logged toil and prioritizes the most time-consuming or frequent tasks for automation. This automation work is treated as first-class engineering work.

  4. Review and Iterate: At the end of the quarter, the team reviews their progress. If they are over their toil budget, it's a clear signal that they must dedicate more time to automation and engineering projects in the next quarter.

### 78. A developer asks you to approve a new service launch, but it has no monitoring or dashboards. How do you respond, following SRE principles?

Following SRE principles, you would not approve the launch. The SRE "Production Readiness Review" (PRR) process dictates that a service is not ready for production unless it meets a minimum bar for reliability and operability.

  • Response: "This service can't be launched yet because it doesn't meet our production readiness criteria. Specifically, it needs to be properly instrumented before we can support it. Let's work together to:

    1. Define the key SLIs (Service Level Indicators) for this service.

    2. Implement the necessary instrumentation (metrics, logs, traces) to measure those SLIs.

    3. Build a basic dashboard in Grafana/Azure Monitor to visualize the health of the service.

    4. Configure alerts based on SLOs so we know when it's misbehaving.

      Once these are in place, we can proceed with the launch."


## Kubernetes Internals & Extensibility (Continued)

### 79. What is a Kubernetes finalizer, and how does it prevent the accidental deletion of critical resources?

A finalizer is a key in a resource's metadata that tells the Kubernetes API server that there is a controller that needs to perform cleanup actions before the resource can be fully deleted.

  • How it works:

    1. When you issue a kubectl delete command, the API server sees the finalizers list on the object.

    2. Instead of deleting the object immediately, it sets a deletionTimestamp on the object and puts it in a "terminating" state.

    3. The controller responsible for that finalizer sees the deletionTimestamp and begins its cleanup logic (e.g., deleting associated cloud resources like a load balancer or a storage volume).

    4. Once the cleanup is complete, the controller removes its finalizer key from the object's metadata.

    5. The API server sees that the finalizers list is now empty and finally deletes the object.

      This prevents scenarios where Kubernetes deletes a PersistentVolumeClaim object, but the underlying cloud disk is left behind, becoming an orphaned (and costly) resource.
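
As a concrete illustration, the built-in PVC protection finalizer sits in the object's metadata and is removed by its controller only once no pod is still using the claim (the claim name and size are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-myapp-0
  finalizers:
    - kubernetes.io/pvc-protection   # removed by its controller when the claim is no longer in use
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
```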

### 80. Explain the core concept of the Kubernetes "reconciliation loop" that controllers use.

The reconciliation loop is the fundamental operating principle of all Kubernetes controllers. It's a continuous loop that works to drive the current state of the cluster towards the desired state.

  • The Loop:

    1. Observe: The controller watches the API server for the current state of the resources it manages (e.g., a Deployment controller watches Deployment and Pod objects).

    2. Diff: It compares this current state with the desired state specified in the resource's manifest (e.g., "the Deployment spec says there should be 3 replicas, but I only observe 2 running pods").

    3. Act: If there is a difference, the controller takes action to correct it by making calls to the Kubernetes API (e.g., "create a new Pod to bring the count to 3").

      This loop runs continuously, which is why Kubernetes is so resilient. If a pod dies, the reconciliation loop of the ReplicaSet controller will simply see the difference and create a new one to match the desired state.

### 81. How does the kube-proxy component use iptables (or IPVS) to implement Kubernetes Services?

The kube-proxy is a daemon that runs on every node and is responsible for implementing the virtual IP mechanism for Services.

  • Using iptables (older, more common):

    1. The kube-proxy watches the API server for new Services and Endpoints (the list of IPs for the pods backing a service).

    2. For each Service, it installs a set of iptables rules on the node.

    3. When a packet destined for a Service's virtual ClusterIP arrives at the node, these iptables rules intercept it.

    4. The rules then perform Destination NAT (DNAT), rewriting the destination IP address to be the IP address of one of the healthy backend pods, chosen at random. The packet is then forwarded to the actual pod.

  • Using IPVS (newer, more performant): IPVS (IP Virtual Server) is built on top of the Netfilter framework and acts as an in-kernel load balancer. It's more efficient than iptables at scale because it uses a hash table for lookups instead of a long, sequential list of rules. The principle is the same: kube-proxy programs the IPVS rules to forward traffic from the Service VIP to the backend pod IPs.

