<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Iresh's Blog]]></title><description><![CDATA[Explore expert tutorials, tech guides, and insights on DevOps, cloud computing, Kubernetes, and more. Learn, build, and grow your skills with easy-to-follow content.]]></description><link>https://blog.iresh.xyz</link><generator>RSS for Node</generator><lastBuildDate>Fri, 17 Apr 2026 12:48:08 GMT</lastBuildDate><atom:link href="https://blog.iresh.xyz/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Azure Authentication Methods Explained: Managed Identity vs Federated Identity vs Certificate, Client Secret]]></title><description><![CDATA[When working with Azure, one of the first (and most confusing) topics you’ll face is authentication.
You’ll often see this priority order mentioned:




PriorityMethodAzure stance



🟢 1Managed IdentityStrongly recommended

🟢 2Federated IdentityMod...]]></description><link>https://blog.iresh.xyz/azure-authentication-methods-explained-managed-identity-vs-federated-identity-vs-certificate-client-secret</link><guid isPermaLink="true">https://blog.iresh.xyz/azure-authentication-methods-explained-managed-identity-vs-federated-identity-vs-certificate-client-secret</guid><category><![CDATA[Client Secret]]></category><category><![CDATA[ManagedIdentity ]]></category><category><![CDATA[Federated Identity]]></category><category><![CDATA[certificate]]></category><dc:creator><![CDATA[Iresh Ekanayaka]]></dc:creator><pubDate>Mon, 05 Jan 2026 11:36:51 GMT</pubDate><content:encoded><![CDATA[<p>When working with Azure, one of the first (and most confusing) topics you’ll face is <strong>authentication</strong>.</p>
<p>You’ll often see this priority order mentioned:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Priority</td><td>Method</td><td>Azure stance</td></tr>
</thead>
<tbody>
<tr>
<td>🟢 1</td><td>Managed Identity</td><td>Strongly recommended</td></tr>
<tr>
<td>🟢 2</td><td>Federated Identity</td><td>Modern &amp; preferred</td></tr>
<tr>
<td>🟡 3</td><td>Certificates</td><td>Acceptable</td></tr>
<tr>
<td>🔴 4</td><td>Client Secrets</td><td>Legacy / fallback</td></tr>
</tbody>
</table>
</div><p>But <strong>what do these actually mean?</strong><br />Why does Azure still support secrets if they’re considered bad?</p>
<p>This article explains each method <strong>clearly and practically</strong>, so anyone can understand and choose the right one.</p>
<hr />
<h2 id="heading-why-authentication-matters-in-azure">Why Authentication Matters in Azure</h2>
<p>Authentication answers one simple question:</p>
<blockquote>
<p><strong>“Who are you, and are you allowed to access this resource?”</strong></p>
</blockquote>
<p>In Azure, authentication is commonly used when:</p>
<ul>
<li><p>An app accesses <strong>Key Vault</strong></p>
</li>
<li><p>A Function App accesses <strong>Storage</strong></p>
</li>
<li><p>A CI/CD pipeline deploys infrastructure</p>
</li>
<li><p>A Kubernetes workload calls an API</p>
</li>
</ul>
<p>Azure supports multiple authentication methods because <strong>not all systems are equal</strong> — some are modern, some are legacy.</p>
<hr />
<h2 id="heading-1-managed-identity-strongly-recommended">🟢 1. Managed Identity (Strongly Recommended)</h2>
<h3 id="heading-what-is-managed-identity">What is Managed Identity?</h3>
<p>Managed Identity is an <strong>identity automatically managed by Azure</strong> for Azure resources.</p>
<p>No secrets.<br />No certificates.<br />No credentials to store.</p>
<p>Azure handles everything.</p>
<h3 id="heading-how-it-works-simple">How it works (simple)</h3>
<ol>
<li><p>You enable Managed Identity on an Azure resource (VM, Function App, App Service)</p>
</li>
<li><p>The resource asks Azure for a token</p>
</li>
<li><p>Azure verifies the resource and issues a token</p>
</li>
<li><p>The resource accesses another Azure service</p>
</li>
</ol>
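<p>As a quick illustration, the flow above can be wired up with the Azure CLI — the resource names here (<code>my-func</code>, <code>my-rg</code>, <code>my-vault</code>) are placeholders for your own:</p>
<pre><code class="lang-plaintext"># Enable a system-assigned managed identity on a Function App
az functionapp identity assign --name my-func --resource-group my-rg

# Grant that identity read access to Key Vault secrets (RBAC model)
az role assignment create \
  --assignee &lt;principal-id-from-previous-output&gt; \
  --role "Key Vault Secrets User" \
  --scope $(az keyvault show --name my-vault --query id -o tsv)
</code></pre>
<p>After this, code running inside the Function App can request tokens with no stored credentials at all.</p>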
<h3 id="heading-key-benefits">Key benefits</h3>
<ul>
<li><p>No credentials to leak</p>
</li>
<li><p>Automatic rotation</p>
</li>
<li><p>Zero configuration secrets</p>
</li>
<li><p>Deep Azure integration</p>
</li>
</ul>
<h3 id="heading-when-to-use-it">When to use it</h3>
<p>✅ Azure Function → Key Vault<br />✅ VM → Storage Account<br />✅ App Service → SQL Database</p>
<h3 id="heading-limitation">Limitation</h3>
<p>❌ Works <strong>only inside Azure</strong></p>
<blockquote>
<p><strong>If both the caller and target are in Azure, Managed Identity should be your first choice.</strong></p>
</blockquote>
<hr />
<h2 id="heading-2-federated-identity-modern-amp-preferred">🟢 2. Federated Identity (Modern &amp; Preferred)</h2>
<h3 id="heading-what-is-federated-identity">What is Federated Identity?</h3>
<p>Federated Identity allows Azure to <strong>trust an external identity provider</strong> using OIDC (OpenID Connect).</p>
<p>Still <strong>no secrets</strong>.</p>
<h3 id="heading-how-it-works">How it works</h3>
<ol>
<li><p>An external platform (GitHub Actions, AKS, Azure DevOps) authenticates itself</p>
</li>
<li><p>It receives a short-lived OIDC token</p>
</li>
<li><p>Azure validates the token (issuer, subject, audience)</p>
</li>
<li><p>Azure issues an access token</p>
</li>
</ol>
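<p>For example, a GitHub Actions workflow can log in to Azure with no stored secret — only the app's client ID and tenant ID (which are identifiers, not credentials) are configured, and the short-lived OIDC token does the rest. A minimal sketch (the variable names are assumptions):</p>
<pre><code class="lang-plaintext">permissions:
  id-token: write   # allow the job to request an OIDC token
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
</code></pre>
<p>Note there is no <code>client-secret</code> input — the federated credential on the App Registration trusts the token GitHub issues for this repository.</p>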
<h3 id="heading-common-use-cases">Common use cases</h3>
<ul>
<li><p>GitHub Actions deploying to Azure</p>
</li>
<li><p>Kubernetes workloads accessing Azure services</p>
</li>
<li><p>Cross-cloud automation</p>
</li>
</ul>
<h3 id="heading-key-benefits-1">Key benefits</h3>
<ul>
<li><p>Secretless authentication</p>
</li>
<li><p>Short-lived tokens</p>
</li>
<li><p>Ideal for CI/CD and automation</p>
</li>
</ul>
<h3 id="heading-limitation-1">Limitation</h3>
<p>❌ Requires an OIDC-capable platform</p>
<blockquote>
<p><strong>If your workload runs outside Azure but supports OIDC, use Federated Identity.</strong></p>
</blockquote>
<hr />
<h2 id="heading-3-certificate-based-authentication-acceptable">🟡 3. Certificate-Based Authentication (Acceptable)</h2>
<h3 id="heading-what-is-certificate-authentication">What is Certificate Authentication?</h3>
<p>Instead of a password (secret), the app authenticates using a <strong>private key and certificate</strong>.</p>
<p>This still uses OAuth 2.0, but the app proves its identity with a signed assertion (a JWT signed by its private key) instead of sending a shared secret — making it stronger than client secrets.</p>
<h3 id="heading-how-it-works-1">How it works</h3>
<ol>
<li><p>A certificate is uploaded to an Azure App Registration</p>
</li>
<li><p>The app signs a request using the private key</p>
</li>
<li><p>Azure verifies the certificate</p>
</li>
<li><p>Azure issues a token</p>
</li>
</ol>
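<p>In code, this looks almost identical to secret-based auth — the difference is that the credential is a local certificate file rather than a string. A sketch using the Python <code>azure-identity</code> library (the IDs and path are placeholders):</p>
<pre><code class="lang-plaintext">from azure.identity import CertificateCredential

credential = CertificateCredential(
    tenant_id="&lt;tenant-id&gt;",
    client_id="&lt;app-client-id&gt;",
    certificate_path="/path/to/cert.pem",  # PEM with cert + private key
)

# Request a token, e.g. for Azure Resource Manager
token = credential.get_token("https://management.azure.com/.default")
</code></pre>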
<h3 id="heading-why-it-still-exists">Why it still exists</h3>
<ul>
<li><p>Many enterprise systems can’t use OIDC</p>
</li>
<li><p>Certificates are more secure than secrets</p>
</li>
<li><p>Works well for long-running services</p>
</li>
</ul>
<h3 id="heading-downsides">Downsides</h3>
<ul>
<li><p>Certificates expire</p>
</li>
<li><p>Manual lifecycle management</p>
</li>
<li><p>Private key must be protected</p>
</li>
</ul>
<blockquote>
<p><strong>Certificates are a solid fallback when federation is not possible.</strong></p>
</blockquote>
<hr />
<h2 id="heading-4-client-secrets-legacy-fallback">🔴 4. Client Secrets (Legacy / Fallback)</h2>
<h3 id="heading-what-is-a-client-secret">What is a Client Secret?</h3>
<p>A client secret is essentially a <strong>password for an application</strong>.</p>
<h3 id="heading-how-it-works-2">How it works</h3>
<p>The app sends:</p>
<ul>
<li><p>client_id</p>
</li>
<li><p>client_secret</p>
</li>
<li><p>tenant_id</p>
</li>
</ul>
<p>Azure validates the secret and issues a token.</p>
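<p>Under the hood this is a plain OAuth 2.0 client-credentials request to the Microsoft Entra token endpoint — which is exactly why a leaked secret is so dangerous: anyone holding it can run the same request.</p>
<pre><code class="lang-plaintext">curl -X POST \
  "https://login.microsoftonline.com/&lt;tenant-id&gt;/oauth2/v2.0/token" \
  -d "grant_type=client_credentials" \
  -d "client_id=&lt;client-id&gt;" \
  -d "client_secret=&lt;client-secret&gt;" \
  -d "scope=https://graph.microsoft.com/.default"
</code></pre>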
<h3 id="heading-why-secrets-are-bad">Why secrets are bad</h3>
<ul>
<li><p>Easy to leak</p>
</li>
<li><p>Often hard-coded</p>
</li>
<li><p>Require manual rotation</p>
</li>
<li><p>High breach risk</p>
</li>
</ul>
<h3 id="heading-why-azure-still-supports-them">Why Azure still supports them</h3>
<ul>
<li><p>Legacy systems</p>
</li>
<li><p>Backward compatibility</p>
</li>
<li><p>Simple demos and temporary setups</p>
</li>
</ul>
<blockquote>
<p><strong>Secrets exist because the real world still has old systems — not because they’re recommended.</strong></p>
</blockquote>
<hr />
<h2 id="heading-security-comparison-at-a-glance">Security Comparison at a Glance</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Method</td><td>Stored credentials</td><td>Rotation</td><td>Risk level</td></tr>
</thead>
<tbody>
<tr>
<td>Managed Identity</td><td>None</td><td>Automatic</td><td>🟢 Very Low</td></tr>
<tr>
<td>Federated Identity</td><td>None</td><td>Automatic</td><td>🟢 Very Low</td></tr>
<tr>
<td>Certificate</td><td>Private key</td><td>Manual</td><td>🟡 Medium</td></tr>
<tr>
<td>Client Secret</td><td>Static secret</td><td>Manual</td><td>🔴 High</td></tr>
</tbody>
</table>
</div><hr />
<h2 id="heading-how-to-choose-the-right-method">How to Choose the Right Method</h2>
<p>Ask yourself two questions:</p>
<h3 id="heading-1-is-my-workload-running-inside-azure">1. Is my workload running inside Azure?</h3>
<ul>
<li><p><strong>Yes</strong> → Use <strong>Managed Identity</strong></p>
</li>
<li><p><strong>No</strong> → Go to question 2</p>
</li>
</ul>
<h3 id="heading-2-does-it-support-oidc">2. Does it support OIDC?</h3>
<ul>
<li><p><strong>Yes</strong> → Use <strong>Federated Identity</strong></p>
</li>
<li><p><strong>No</strong> → Use <strong>Certificate</strong></p>
</li>
<li><p><strong>Only legacy available</strong> → Use <strong>Client Secret (temporary)</strong></p>
</li>
</ul>
<hr />
<h2 id="heading-azures-real-direction">Azure’s Real Direction</h2>
<p>Azure’s message is clear:</p>
<blockquote>
<p><strong>Secrets are supported, but discouraged</strong></p>
</blockquote>
<p>You can see this in:</p>
<ul>
<li><p>Short secret expiry times</p>
</li>
<li><p>Security warnings</p>
</li>
<li><p>Defender recommendations</p>
</li>
<li><p>Strong push toward Managed &amp; Federated Identity</p>
</li>
</ul>
<hr />
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p>If you remember just one thing, remember this:</p>
<blockquote>
<p><strong>The closer the identity is to the platform, the safer it is</strong></p>
</blockquote>
<ul>
<li><p>Azure-native → Managed Identity</p>
</li>
<li><p>Modern external → Federated Identity</p>
</li>
<li><p>Enterprise legacy → Certificates</p>
</li>
<li><p>Last resort → Client Secrets</p>
</li>
</ul>
<hr />
<p>If you’re migrating from <strong>client secrets to federated identity</strong> or want help applying this to <strong>Terraform, GitHub Actions, AKS, or Azure Functions</strong>, feel free to reach out or comment.</p>
<p>Happy building 🚀</p>
]]></content:encoded></item><item><title><![CDATA[🚀 Deploying Multiple Schedulers in Kubernetes (with Leader Election Explained)]]></title><description><![CDATA[Kubernetes ships with a default scheduler (default-scheduler) that is responsible for placing pods onto the most suitable nodes. It does this by considering resource availability, taints and tolerations, affinities, and more.
But what if your applica...]]></description><link>https://blog.iresh.xyz/deploying-multiple-schedulers-in-kubernetes-with-leader-election-explained</link><guid isPermaLink="true">https://blog.iresh.xyz/deploying-multiple-schedulers-in-kubernetes-with-leader-election-explained</guid><dc:creator><![CDATA[Iresh Ekanayaka]]></dc:creator><pubDate>Mon, 05 Jan 2026 11:33:30 GMT</pubDate><content:encoded><![CDATA[<p>Kubernetes ships with a <strong>default scheduler</strong> (<code>default-scheduler</code>) that is responsible for placing pods onto the most suitable nodes. It does this by considering resource availability, taints and tolerations, affinities, and more.</p>
<p>But what if your application has <strong>special placement requirements</strong> that the default scheduler cannot handle?</p>
<p>👉 That’s where <strong>custom schedulers</strong> come in. Kubernetes allows you to run multiple schedulers within the same cluster and choose which scheduler should manage which pods.</p>
<p>In this blog, we’ll cover:</p>
<ul>
<li><p>Why multiple schedulers are useful</p>
</li>
<li><p>How schedulers are named and configured</p>
</li>
<li><p>Deploying custom schedulers (binary, pod, and deployment methods)</p>
</li>
<li><p>The <strong>leader election</strong> option for HA setups</p>
</li>
<li><p>How to use your custom scheduler in pods</p>
</li>
<li><p>How to verify scheduling decisions</p>
</li>
</ul>
<hr />
<h2 id="heading-why-multiple-schedulers">🔹 Why Multiple Schedulers?</h2>
<p>By default, every pod goes through <code>default-scheduler</code>. However:</p>
<ul>
<li><p>You may want an application to run <strong>only on GPU nodes</strong> with additional custom checks.</p>
</li>
<li><p>You may implement a <strong>domain-specific algorithm</strong> for data locality or cost optimization.</p>
</li>
<li><p>You may test experimental scheduling strategies without impacting the default scheduler.</p>
</li>
</ul>
<p>With multiple schedulers:</p>
<ul>
<li><p>Normal workloads → use the <strong>default scheduler</strong>.</p>
</li>
<li><p>Special workloads → use your <strong>custom scheduler</strong>.</p>
</li>
</ul>
<hr />
<h2 id="heading-scheduler-names">🔹 Scheduler Names</h2>
<p>Each scheduler must have a <strong>unique name</strong>.</p>
<ul>
<li><p>Default scheduler → <code>default-scheduler</code></p>
</li>
<li><p>Custom schedulers → you define names like <code>my-scheduler</code>, <code>gpu-scheduler</code>, etc.</p>
</li>
</ul>
<p>This name is set in the <strong>scheduler configuration file</strong>:</p>
<pre><code class="lang-plaintext">apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
leaderElection:
  leaderElect: false
profiles:
  - schedulerName: my-scheduler
</code></pre>
<p>If no name is specified, Kubernetes defaults to <code>default-scheduler</code>.</p>
<hr />
<h2 id="heading-methods-to-deploy-a-custom-scheduler">🔹 Methods to Deploy a Custom Scheduler</h2>
<h3 id="heading-1-running-scheduler-as-a-binary">1. <strong>Running Scheduler as a Binary</strong></h3>
<p>You can download the <code>kube-scheduler</code> binary and run it manually:</p>
<pre><code class="lang-plaintext">kube-scheduler \
  --config=/etc/kubernetes/my-scheduler-config.yaml \
  --kubeconfig=/etc/kubernetes/scheduler.kubeconfig
</code></pre>
<p>⚠️ Rarely used in modern kubeadm-based clusters since schedulers usually run as pods.</p>
<hr />
<h3 id="heading-2-scheduler-as-a-pod">2. <strong>Scheduler as a Pod</strong></h3>
<p>You can run the scheduler as a pod inside your cluster.</p>
<pre><code class="lang-plaintext">apiVersion: v1
kind: Pod
metadata:
  name: my-scheduler
  namespace: kube-system
spec:
  containers:
    - name: kube-scheduler
      image: registry.k8s.io/kube-scheduler:v1.28.0
      command:
        - kube-scheduler
        - --config=/etc/kubernetes/my-scheduler-config.yaml
      volumeMounts:
        - name: config
          mountPath: /etc/kubernetes/
  volumes:
    - name: config
      configMap:
        name: my-scheduler-config
</code></pre>
<p>Here, the <strong>ConfigMap</strong> contains your custom scheduler config.</p>
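<p>The ConfigMap itself can be created from the configuration file shown earlier — the file and ConfigMap names here are assumptions matching the pod spec above:</p>
<pre><code class="lang-plaintext">kubectl create configmap my-scheduler-config \
  --from-file=my-scheduler-config.yaml \
  -n kube-system
</code></pre>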
<hr />
<h3 id="heading-3-scheduler-as-a-deployment">3. <strong>Scheduler as a Deployment</strong></h3>
<p>A more scalable and recommended approach is to run the scheduler as a <strong>deployment</strong>.</p>
<pre><code class="lang-plaintext">apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-scheduler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      component: my-scheduler
  template:
    metadata:
      labels:
        component: my-scheduler
    spec:
      serviceAccountName: scheduler-sa
      containers:
        - name: kube-scheduler
          image: registry.k8s.io/kube-scheduler:v1.28.0
          command:
            - kube-scheduler
            - --config=/etc/kubernetes/my-scheduler-config.yaml
          volumeMounts:
            - name: config
              mountPath: /etc/kubernetes/
      volumes:
        - name: config
          configMap:
            name: my-scheduler-config
</code></pre>
<p>This way, Kubernetes automatically restarts the scheduler if it crashes, and you can scale to multiple replicas for HA — provided leader election is enabled.</p>
<hr />
<h2 id="heading-the-leaderelect-option-explained">🔹 The <code>leaderElect</code> Option Explained</h2>
<p>When running schedulers in <strong>HA setups</strong> (multiple control-plane nodes):</p>
<ul>
<li><p>Multiple copies of the same scheduler might be running.</p>
</li>
<li><p>Only <strong>one scheduler instance should be active at a time</strong>.</p>
</li>
<li><p>The <code>leaderElect: true</code> setting ensures that a leader is elected among them.</p>
</li>
</ul>
<p>Example:</p>
<pre><code class="lang-plaintext">leaderElection:
  leaderElect: true
  resourceName: my-scheduler-lock
</code></pre>
<p>Here:</p>
<ul>
<li><p>If you run <strong>3 replicas of</strong> <code>my-scheduler</code>, only <strong>one</strong> will be leader.</p>
</li>
<li><p>Others will stay passive until the leader fails.</p>
</li>
</ul>
<p>👉 Always enable <code>leaderElect</code> in production HA clusters.</p>
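<p>You can observe the election in action: leader locks are stored as <code>Lease</code> objects (by default in <code>kube-system</code>), so inspecting the lease shows which replica currently holds leadership:</p>
<pre><code class="lang-plaintext">kubectl get lease my-scheduler-lock -n kube-system -o yaml
</code></pre>
<p>The <code>holderIdentity</code> field in the spec identifies the active instance; it changes when the leader fails over.</p>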
<hr />
<h2 id="heading-using-the-custom-scheduler-in-a-pod">🔹 Using the Custom Scheduler in a Pod</h2>
<p>Once deployed, tell Kubernetes which scheduler to use by adding <code>schedulerName</code> to your pod spec.</p>
<pre><code class="lang-plaintext">apiVersion: v1
kind: Pod
metadata:
  name: my-custom-pod
spec:
  schedulerName: my-scheduler
  containers:
    - name: busybox
      image: busybox
      command: ["sleep", "3600"]
</code></pre>
<p>Now, this pod will bypass the default scheduler and use <code>my-scheduler</code>.</p>
<hr />
<h2 id="heading-verifying-which-scheduler-picked-the-pod">🔹 Verifying Which Scheduler Picked the Pod</h2>
<p>To confirm scheduling:</p>
<ol>
<li><strong>Check pod events</strong>:</li>
</ol>
<pre><code class="lang-plaintext">kubectl get events --sort-by=.metadata.creationTimestamp -o wide
</code></pre>
<p>You’ll see events like:</p>
<pre><code class="lang-plaintext">Successfully assigned default/my-custom-pod to node1
Source: my-scheduler
</code></pre>
<ol start="2">
<li><strong>Check scheduler logs</strong>:</li>
</ol>
<pre><code class="lang-plaintext">kubectl logs -n kube-system &lt;my-scheduler-pod-name&gt;
</code></pre>
<p>If your pod stays in <code>Pending</code>, the scheduler is likely misconfigured or not running — check its logs for errors.</p>
<hr />
<h2 id="heading-summary">✅ Summary</h2>
<ul>
<li><p>Kubernetes supports <strong>multiple schedulers</strong>.</p>
</li>
<li><p>Each scheduler must have a <strong>unique name</strong>.</p>
</li>
<li><p>You can deploy schedulers as a <strong>binary, pod, or deployment</strong>.</p>
</li>
<li><p>Use <code>leaderElect: true</code> in HA setups.</p>
</li>
<li><p>Pods can be scheduled by a custom scheduler using <code>schedulerName</code>.</p>
</li>
<li><p>Verify scheduling decisions with <strong>events and logs</strong>.</p>
</li>
</ul>
<p>By using multiple schedulers, you can extend Kubernetes to meet specialized workload placement needs while keeping the default scheduler for general workloads.</p>
]]></content:encoded></item><item><title><![CDATA[Understanding Static Pods in Kubernetes]]></title><description><![CDATA[In this blog, we’ll explore Static Pods in Kubernetes, their use cases, and how to configure and inspect them.

What Are Static Pods?
Normally, the kubelet relies on the kube-apiserver to get instructions about which pods to run on a node. The API se...]]></description><link>https://blog.iresh.xyz/understanding-static-pods-in-kubernetes</link><guid isPermaLink="true">https://blog.iresh.xyz/understanding-static-pods-in-kubernetes</guid><dc:creator><![CDATA[Iresh Ekanayaka]]></dc:creator><pubDate>Sun, 21 Sep 2025 08:18:57 GMT</pubDate><content:encoded><![CDATA[<p>In this blog, we’ll explore <strong>Static Pods</strong> in Kubernetes, their use cases, and how to configure and inspect them.</p>
<hr />
<h2 id="heading-what-are-static-pods">What Are Static Pods?</h2>
<p>Normally, the <strong>kubelet</strong> relies on the <strong>kube-apiserver</strong> to get instructions about which pods to run on a node. The API server receives scheduling decisions from the <strong>kube-scheduler</strong>, and stores the cluster state in <strong>ETCD</strong>.</p>
<p>But what happens if there’s <strong>no API server, no scheduler, no controllers, and no ETCD</strong>—in other words, no Kubernetes master at all?</p>
<p>Even in this scenario, the kubelet can still manage pods <strong>independently</strong>. These pods, created and managed by the kubelet <strong>without any control plane components</strong>, are called <strong>Static Pods</strong>.</p>
<hr />
<h2 id="heading-how-static-pods-work">How Static Pods Work</h2>
<ul>
<li><p>The kubelet periodically checks a <strong>designated directory</strong> on the node for pod definition files (manifests).</p>
</li>
<li><p>When it finds a manifest, it creates the pod on the node.</p>
</li>
<li><p>If the pod crashes, the kubelet automatically restarts it.</p>
</li>
<li><p>If the manifest file changes, the kubelet recreates the pod to apply updates.</p>
</li>
<li><p>If the manifest is removed, the pod is automatically deleted.</p>
</li>
</ul>
<blockquote>
<p><strong>Important:</strong> Static pods only exist at the pod level. You cannot create <strong>ReplicaSets</strong>, <strong>Deployments</strong>, or <strong>Services</strong> using static pod manifests. These objects require other control plane components.</p>
</blockquote>
<hr />
<h2 id="heading-where-are-static-pods-stored">Where Are Static Pods Stored?</h2>
<p>The kubelet needs to know the path to the directory containing static pod manifests. There are <strong>two ways</strong> to locate this:</p>
<h3 id="heading-1-check-in-the-kubeletservice-file">1. Check in the <code>kubelet.service</code> file</h3>
<p>The static pod path may be specified directly as the <code>--pod-manifest-path</code> option. To check:</p>
<pre><code class="lang-plaintext">systemctl cat kubelet | grep pod-manifest-path
</code></pre>
<p>Example output:</p>
<pre><code class="lang-plaintext">--pod-manifest-path=/etc/kubernetes/manifests
</code></pre>
<p>This directory is where you place your pod manifest files.</p>
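<p>For example, dropping a manifest like this into that directory is all it takes — the kubelet notices the file and starts the pod on its own, no <code>kubectl</code> involved:</p>
<pre><code class="lang-plaintext"># /etc/kubernetes/manifests/static-web.yaml
apiVersion: v1
kind: Pod
metadata:
  name: static-web
spec:
  containers:
    - name: web
      image: nginx
      ports:
        - containerPort: 80
</code></pre>
<p>Deleting the file later removes the pod just as automatically.</p>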
<hr />
<h3 id="heading-2-check-in-the-kubelet-config-file-configyaml-or-kubeconfigyaml">2. Check in the kubelet config file (<code>config.yaml</code> or <code>kubeconfig.yaml</code>)</h3>
<p>Sometimes the kubelet is configured to use a config file via the <code>--config</code> option. In this file, the path is specified as:</p>
<pre><code class="lang-plaintext">staticPodPath: "/etc/kubernetes/manifests"
</code></pre>
<ul>
<li><p>Either method will give you the <strong>correct directory</strong> for placing static pod manifests.</p>
</li>
<li><p>If both are set, the <code>--pod-manifest-path</code> flag takes precedence, since kubelet command-line flags override config-file values.</p>
</li>
</ul>
<hr />
<h2 id="heading-viewing-static-pods">Viewing Static Pods</h2>
<p>Once static pods are created, <strong>you cannot use</strong> <code>kubectl</code> to view them if the kube-apiserver is not running. This is because:</p>
<ul>
<li><p><code>kubectl</code> communicates with the <strong>kube-apiserver</strong>.</p>
</li>
<li><p>If the cluster has no API server (e.g., in a standalone kubelet scenario), <code>kubectl</code> has no source of truth.</p>
</li>
</ul>
<p>Instead, inspect the running containers directly through the container runtime — <code>docker ps</code> on Docker-based nodes, or <code>crictl ps</code> on containerd/CRI-O-based nodes:</p>
<pre><code class="lang-plaintext">docker ps     # Docker runtime
crictl ps     # containerd / CRI-O runtimes
</code></pre>
<p>If the node is part of a cluster with a running kube-apiserver, static pods are mirrored as <strong>read-only pods</strong>, and you can view them via:</p>
<pre><code class="lang-plaintext">kubectl get pods -n kube-system
</code></pre>
<blockquote>
<p>You <strong>cannot edit or delete</strong> static pods via <code>kubectl</code>; changes must be made in the manifest files.</p>
</blockquote>
<hr />
<h2 id="heading-static-pods-vs-daemonsets">Static Pods vs DaemonSets</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Feature</td><td>Static Pods</td><td>DaemonSets</td></tr>
</thead>
<tbody>
<tr>
<td>Managed by</td><td>Kubelet directly</td><td>DaemonSet controller via API server</td></tr>
<tr>
<td>Cluster scheduler</td><td>Ignored</td><td>Ignored</td></tr>
<tr>
<td>Use case</td><td>Deploy control plane components</td><td>Run a copy of a pod on all nodes</td></tr>
<tr>
<td>Editing</td><td>Modify manifest files only</td><td>Can edit via <code>kubectl</code></td></tr>
</tbody>
</table>
</div><blockquote>
<p>Both static pods and DaemonSet pods are <strong>ignored by the kube-scheduler</strong>.</p>
</blockquote>
<hr />
<h2 id="heading-use-cases-for-static-pods">Use Cases for Static Pods</h2>
<p>Static pods are ideal for:</p>
<ul>
<li><p><strong>Bootstrapping the Kubernetes control plane</strong> itself as pods (e.g., API server, Controller Manager, ETCD).</p>
</li>
<li><p>Running critical node-level services that must exist independently of the control plane.</p>
</li>
</ul>
<p>When using tools like <strong>kubeadm</strong>, static pods are used to deploy the cluster's control plane. The kubelet monitors these pods, automatically restarting them if they crash.</p>
<hr />
<h2 id="heading-summary">Summary</h2>
<ul>
<li><p><strong>Static Pods</strong> are created and managed directly by the kubelet.</p>
</li>
<li><p>They are <strong>independent of the Kubernetes control plane</strong>.</p>
</li>
<li><p>You can find the <strong>static pod manifest directory</strong> in either the <code>kubelet.service</code> file (<code>--pod-manifest-path</code>) or the kubelet config file (<code>staticPodPath</code>).</p>
</li>
<li><p>Use Docker to inspect static pods if no API server is present.</p>
</li>
<li><p>Use static pods to deploy the Kubernetes control plane itself or critical node-level services.</p>
</li>
</ul>
<p>Static pods provide a simple, reliable way to run essential pods even when the full Kubernetes control plane is not available.</p>
]]></content:encoded></item><item><title><![CDATA[Understanding DaemonSets in Kubernetes: A Beginner-Friendly Guide]]></title><description><![CDATA[When learning Kubernetes, you often work with Deployments and ReplicaSets to ensure your applications run reliably across your cluster. But what if you need one pod per node instead of multiple replicas? That’s where DaemonSets come in.
Let’s break i...]]></description><link>https://blog.iresh.xyz/understanding-daemonsets-in-kubernetes-a-beginner-friendly-guide</link><guid isPermaLink="true">https://blog.iresh.xyz/understanding-daemonsets-in-kubernetes-a-beginner-friendly-guide</guid><dc:creator><![CDATA[Iresh Ekanayaka]]></dc:creator><pubDate>Sat, 20 Sep 2025 10:53:31 GMT</pubDate><content:encoded><![CDATA[<p>When learning Kubernetes, you often work with <strong>Deployments</strong> and <strong>ReplicaSets</strong> to ensure your applications run reliably across your cluster. But what if you need <strong>one pod per node</strong> instead of multiple replicas? That’s where <strong>DaemonSets</strong> come in.</p>
<p>Let’s break it down.</p>
<h2 id="heading-what-is-a-daemonset">What is a DaemonSet?</h2>
<p>A <strong>DaemonSet</strong> is a Kubernetes object that ensures <strong>exactly one copy of a pod runs on every node</strong> in your cluster (or on every node matching its node selectors, if you restrict it).</p>
<ul>
<li><p>When a new node joins the cluster, the DaemonSet automatically deploys a pod on it.</p>
</li>
<li><p>When a node is removed, the pod running on it is also deleted.</p>
</li>
</ul>
<p>Think of it as a <strong>“one pod per node manager.”</strong></p>
<p><strong>Key difference from ReplicaSet:</strong></p>
<ul>
<li><p>ReplicaSets focus on <strong>running a specified number of pod replicas</strong> across the cluster.</p>
</li>
<li><p>DaemonSets focus on <strong>ensuring every node has one copy</strong> of a pod.</p>
</li>
</ul>
<hr />
<h2 id="heading-why-use-a-daemonset">Why Use a DaemonSet?</h2>
<p>DaemonSets are perfect for workloads that need to <strong>run on every node</strong>, such as:</p>
<ol>
<li><p><strong>Monitoring agents</strong></p>
<ul>
<li><p>Example: You want to deploy a pod that collects logs or metrics from every node.</p>
</li>
<li><p>DaemonSet ensures each node automatically gets the monitoring pod.</p>
</li>
</ul>
</li>
<li><p><strong>Cluster networking components</strong></p>
<ul>
<li>Example: Solutions like <strong>Weave Net</strong> require an agent pod on every node to manage network traffic.</li>
</ul>
</li>
<li><p><strong>Node-level system components</strong></p>
<ul>
<li>Example: The <strong>kube-proxy</strong> component, which handles network rules, can run as a DaemonSet on all nodes.</li>
</ul>
</li>
</ol>
<blockquote>
<p>✅ Tip: DaemonSets save you the trouble of manually adding or removing pods as nodes are added or removed.</p>
</blockquote>
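<p>You can see real DaemonSets at work in almost any cluster — kube-proxy and most CNI agents run this way:</p>
<pre><code class="lang-plaintext">kubectl get daemonset -n kube-system
</code></pre>
<p>For each DaemonSet listed, the <code>DESIRED</code> and <code>CURRENT</code> counts should match the number of eligible nodes.</p>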
<hr />
<h2 id="heading-how-does-a-daemonset-work">How Does a DaemonSet Work?</h2>
<h3 id="heading-before-kubernetes-v112">Before Kubernetes v1.12</h3>
<ul>
<li><p>Pods were scheduled manually on nodes by setting the <code>nodeName</code> property in the pod specification.</p>
</li>
<li><p>Each pod was “pinned” to a specific node.</p>
</li>
</ul>
<h3 id="heading-from-kubernetes-v112-onwards">From Kubernetes v1.12 Onwards</h3>
<ul>
<li><p>DaemonSets use the <strong>default scheduler</strong> along with <strong>node affinity rules</strong>.</p>
</li>
<li><p>The scheduler automatically decides which pod goes to which node.</p>
</li>
<li><p>You no longer need to manually specify nodes; Kubernetes takes care of it.</p>
</li>
</ul>
<hr />
<h2 id="heading-creating-a-daemonset">Creating a DaemonSet</h2>
<p>Creating a DaemonSet is very similar to creating a ReplicaSet. The main difference is the <code>kind</code>:</p>
<pre><code class="lang-plaintext">apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: monitoring-daemon
spec:
  selector:
    matchLabels:
      app: monitoring
  template:
    metadata:
      labels:
        app: monitoring
    spec:
      containers:
      - name: monitoring-agent
        image: monitoring-agent:latest
</code></pre>
<p><strong>Steps to create and manage DaemonSets:</strong></p>
<ol>
<li><strong>Create the DaemonSet:</strong></li>
</ol>
<pre><code class="lang-plaintext">kubectl create -f monitoring-daemon.yaml
</code></pre>
<ol start="2">
<li><strong>View all DaemonSets:</strong></li>
</ol>
<pre><code class="lang-plaintext">kubectl get daemonset
</code></pre>
<ol start="3">
<li><strong>View detailed info about a DaemonSet:</strong></li>
</ol>
<pre><code class="lang-plaintext">kubectl describe daemonset monitoring-daemon
</code></pre>
<blockquote>
<p>⚠️ Tip: Ensure the labels in the <code>selector</code> match the labels in the pod template. Otherwise, the DaemonSet won’t manage the pods properly.</p>
</blockquote>
<h2 id="heading-summary-why-daemonsets-matter">Summary: Why DaemonSets Matter</h2>
<ul>
<li><p><strong>Automatic deployment:</strong> Ensures one pod per node without manual intervention.</p>
</li>
<li><p><strong>Perfect for node-level tasks:</strong> Monitoring, logging, networking, or system agents.</p>
</li>
<li><p><strong>Integrates with Kubernetes scheduler:</strong> Modern DaemonSets use affinity rules to schedule pods efficiently.</p>
</li>
</ul>
<hr />
<h2 id="heading-key-takeaways-for-learners">Key Takeaways for Learners</h2>
<ol>
<li><p><strong>DaemonSets = One pod per node</strong></p>
</li>
<li><p><strong>Use cases:</strong> kube-proxy, monitoring agents, network agents</p>
</li>
<li><p><strong>Modern scheduling:</strong> Uses default scheduler + node affinity</p>
</li>
<li><p><strong>Management commands:</strong> <code>kubectl create</code>, <code>kubectl get daemonset</code>, <code>kubectl describe daemonset</code></p>
</li>
</ol>
<hr />
<p>DaemonSets may seem like just another Kubernetes object, but they are <strong>crucial for maintaining cluster-wide consistency</strong> for essential services. Understanding how to deploy and manage them is a key step in mastering Kubernetes.</p>
]]></content:encoded></item><item><title><![CDATA[⚖️ Kubernetes Resource Requests and Limits Explained (with Best Practices)]]></title><description><![CDATA[When running workloads in Kubernetes, one of the most important things to configure is how much CPU and memory a pod can use. Without proper settings, a single greedy pod can starve others, or your nodes may crash under heavy load.
This is where requ...]]></description><link>https://blog.iresh.xyz/kubernetes-resource-requests-and-limits-explained-with-best-practices</link><guid isPermaLink="true">https://blog.iresh.xyz/kubernetes-resource-requests-and-limits-explained-with-best-practices</guid><dc:creator><![CDATA[Iresh Ekanayaka]]></dc:creator><pubDate>Sat, 20 Sep 2025 10:06:49 GMT</pubDate><content:encoded><![CDATA[<p>When running workloads in Kubernetes, one of the most important things to configure is <strong>how much CPU and memory a pod can use</strong>. Without proper settings, a single greedy pod can starve others, or your nodes may crash under heavy load.</p>
<p>This is where <strong>requests</strong> and <strong>limits</strong> come in. Let’s break it down step by step.</p>
<hr />
<h2 id="heading-what-are-requests-and-limits">🟢 What Are Requests and Limits?</h2>
<h3 id="heading-resource-requests">Resource Requests</h3>
<ul>
<li><p>The <strong>minimum amount of CPU and memory</strong> a container is guaranteed.</p>
</li>
<li><p>The <strong>scheduler uses these values</strong> to decide on which node to place the pod.</p>
</li>
<li><p>Example:</p>
<pre><code class="lang-plaintext">  resources:
    requests:
      cpu: "500m"
      memory: "512Mi"
</code></pre>
<p>  ➝ Pod gets <strong>at least 0.5 CPU and 512Mi RAM</strong>.</p>
</li>
</ul>
<hr />
<h3 id="heading-resource-limits">Resource Limits</h3>
<ul>
<li><p>The <strong>maximum resources</strong> a container can consume.</p>
</li>
<li><p>Prevents one pod from hogging all resources.</p>
</li>
<li><p>Example:</p>
<pre><code class="lang-plaintext">  resources:
    limits:
      cpu: "1"
      memory: "1Gi"
</code></pre>
<p>  ➝ Pod can’t use more than <strong>1 CPU and 1Gi RAM</strong>.</p>
</li>
</ul>
<hr />
<h2 id="heading-cpu-vs-memory">🖥️ CPU vs 💾 Memory</h2>
<h3 id="heading-cpu">CPU</h3>
<ul>
<li><p><code>1 CPU</code> = 1 AWS vCPU, 1 GCP core, 1 Azure vCore, or 1 hyperthread on a bare-metal processor.</p>
</li>
<li><p>Can specify fractions:</p>
<ul>
<li><code>100m</code> = 0.1 CPU.</li>
</ul>
</li>
<li><p>If pod exceeds its <strong>CPU limit</strong>, it gets <strong>throttled</strong> (slowed down).</p>
</li>
</ul>
<h3 id="heading-memory">Memory</h3>
<ul>
<li><p>Units: <code>Mi</code> (Mebibytes), <code>Gi</code> (Gibibytes).</p>
</li>
<li><p>If pod exceeds its <strong>memory limit</strong>, it is <strong>killed (OOMKilled)</strong>.</p>
</li>
<li><p>Memory cannot be throttled like CPU.</p>
</li>
</ul>
<hr />
<h2 id="heading-scenarios-to-know">⚖️ Scenarios to Know</h2>
<ol>
<li><p><strong>No requests, no limits</strong></p>
<ul>
<li>Pod can take everything → others may starve. ❌ Not safe.</li>
</ul>
</li>
<li><p><strong>Limits only (no requests)</strong></p>
<ul>
<li><p>Kubernetes treats request = limit.</p>
</li>
<li><p>Pod is guaranteed exactly that much.</p>
</li>
<li><p>Safe, but not flexible.</p>
</li>
</ul>
</li>
<li><p><strong>Requests + Limits set</strong></p>
<ul>
<li><p>Pod always gets its request.</p>
</li>
<li><p>Can burst up to the limit.</p>
</li>
<li><p>Balanced, but unused limits may waste resources.</p>
</li>
</ul>
</li>
<li><p><strong>Requests only (no limits)</strong> ✅</p>
<ul>
<li><p>Pod guaranteed its request.</p>
</li>
<li><p>Can use more if node has free capacity.</p>
</li>
<li><p>Best for most cases, but requires discipline (all pods should set requests).</p>
</li>
</ul>
</li>
</ol>
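<p>Scenario 4 can be sketched as a minimal pod spec (names and values are illustrative):</p>
<pre><code class="lang-plaintext">apiVersion: v1
kind: Pod
metadata:
  name: burstable-pod
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: "250m"
        memory: "256Mi"
</code></pre>
<p>The pod is guaranteed 0.25 CPU and 256Mi, and can burst beyond that whenever the node has spare capacity.</p>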
<p>    <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1758360872558/54a0d5c3-9c42-4d85-83e0-15e8ce07a0a0.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-memory-scenarios-to-know">⚖️ Memory Scenarios to Know</h2>
<h3 id="heading-1-no-requests-no-limits">1. No requests, no limits</h3>
<ul>
<li>Pod can take <strong>all memory on the node</strong> if available → may cause <strong>other pods or system processes to OOM</strong>. ❌ Not safe.</li>
</ul>
<h3 id="heading-2-limits-only-no-requests">2. Limits only (no requests)</h3>
<ul>
<li><p>Kubernetes treats <strong>request = limit</strong>.</p>
</li>
<li><p>Pod is <strong>guaranteed exactly that much memory</strong>.</p>
</li>
<li><p><strong>Safe</strong>, but can waste memory if pod doesn’t need the full limit.</p>
</li>
<li><p>If pod exceeds limit → <strong>killed (OOMKilled)</strong>.</p>
</li>
</ul>
<h3 id="heading-3-requests-limits-set">3. Requests + Limits set</h3>
<ul>
<li><p>Pod <strong>guaranteed its request</strong>.</p>
</li>
<li><p>Can use memory <strong>up to the limit</strong>.</p>
</li>
<li><p>Balanced, but if pod uses more than limit → <strong>killed</strong>.</p>
</li>
<li><p>Best practice for workloads with <strong>predictable memory usage</strong>.</p>
</li>
</ul>
<h3 id="heading-4-requests-only-no-limits">4. Requests only (no limits) ✅</h3>
<ul>
<li><p>Pod guaranteed <strong>its request</strong>.</p>
</li>
<li><p>Can use <strong>more memory if node has free capacity</strong>, but <strong>if it grows too much → node might run out of memory</strong>, and pod or others may be OOMKilled.</p>
</li>
<li><p>Safer than no requests, but <strong>requires careful monitoring and discipline</strong>.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1758361052723/de38a887-1b3a-4568-a9bc-4b534dd09f9e.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-cpu-1">✅ CPU</h3>
<ul>
<li><p>Pod is <strong>guaranteed its request</strong> (minimum CPU it needs).</p>
</li>
<li><p>Can use <strong>more CPU if node has spare capacity</strong>.</p>
</li>
<li><p>CPU is <strong>throttled</strong>, so even if a pod uses more, it won’t crash others.</p>
</li>
<li><p><strong>Pros:</strong> Flexible, efficient, lets pods burst when resources are available.</p>
</li>
<li><p><strong>Cons:</strong> If a pod consumes too much CPU, it may slow down other pods sharing the same node.</p>
</li>
</ul>
<hr />
<h3 id="heading-memory-1">⚠️ Memory</h3>
<ul>
<li><p>Pod is <strong>guaranteed its request</strong> (minimum memory).</p>
</li>
<li><p>Can use <strong>more memory if node has free capacity</strong>, but <strong>no limit means it can potentially use all memory on the node</strong>.</p>
</li>
<li><p>Memory <strong>cannot be throttled</strong>, so if it grows too much → <strong>pod or other pods may be OOMKilled</strong>.</p>
</li>
<li><p><strong>Pros:</strong> Flexible if memory usage is predictable and you trust all pods to behave.</p>
</li>
<li><p><strong>Cons:</strong> Risky if some pods may have memory leaks or high spikes.</p>
</li>
</ul>
<hr />
<h3 id="heading-best-practice">💡 Best Practice</h3>
<ol>
<li><p><strong>Always set requests</strong> — ensures pods get guaranteed resources.</p>
</li>
<li><p><strong>Set limits for memory if workload can spike</strong> — prevents a single pod from crashing the node.</p>
</li>
<li><p><strong>CPU limits are optional</strong> — only needed if you want to prevent noisy neighbors or enforce strict resource isolation.</p>
</li>
</ol>
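<p>Putting the three rules together, a container spec following this best practice might look like (values are illustrative):</p>
<pre><code class="lang-plaintext">resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    memory: "1Gi"   # memory capped; CPU limit intentionally omitted
</code></pre>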
<hr />
<p>In short:</p>
<ul>
<li><p>✅ CPU: requests only is safe and flexible. ⚠️ <strong>Set limits only when necessary</strong> → e.g., in multi-tenant clusters or public labs.</p>
</li>
<li><p>⚠️ Memory: requests only is flexible but can be risky → consider setting a reasonable limit.</p>
</li>
</ul>
<hr />
<ul>
<li>✅ Use <strong>LimitRange</strong> at the namespace level to enforce defaults:</li>
</ul>
<pre><code class="lang-plaintext">apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-mem-defaults
  namespace: dev
spec:
  limits:
  - default:
      cpu: "1"
      memory: "512Mi"
    defaultRequest:
      cpu: "0.5"
      memory: "256Mi"
    type: Container
</code></pre>
<p>This ensures every pod has a baseline, even if developers forget to specify resources.</p>
<hr />
<h2 id="heading-visual-flow-how-scheduling-works">📊 Visual Flow: How Scheduling Works</h2>
<ol>
<li><p>Pod created → Scheduler checks <strong>requests</strong>.</p>
</li>
<li><p>Scheduler finds a node with enough free resources.</p>
</li>
<li><p>Pod is scheduled onto that node.</p>
</li>
<li><p>At runtime:</p>
<ul>
<li><p>If pod exceeds <strong>CPU limit</strong> → throttled.</p>
</li>
<li><p>If pod exceeds <strong>memory limit</strong> → killed.</p>
</li>
<li><p>If pod stays within requests → always guaranteed that much.</p>
</li>
</ul>
</li>
</ol>
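<p>You can see how much of a node's capacity is already claimed by requests and limits (substitute your own node name):</p>
<pre><code class="lang-plaintext">kubectl describe node node01
</code></pre>
<p>The <code>Allocated resources</code> section of the output shows the totals the scheduler compares against when placing new pods.</p>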
<hr />
<h2 id="heading-conclusion">🚀 Conclusion</h2>
<ul>
<li><p><strong>Requests = Minimum guarantee</strong></p>
</li>
<li><p><strong>Limits = Maximum cap</strong></p>
</li>
<li><p><strong>Always set requests</strong> to prevent resource starvation.</p>
</li>
<li><p><strong>Use limits carefully</strong>, only when you need strict isolation.</p>
</li>
</ul>
<p>With the right balance, you’ll keep your Kubernetes cluster <strong>fair, efficient, and stable</strong>.</p>
<hr />
<h2 id="heading-limitrange-amp-resourcequota"><strong><mark>LimitRange &amp; ResourceQuota</mark></strong></h2>
<h3 id="heading-limitrange-cpu-amp-memory"><strong>LimitRange (CPU &amp; Memory)</strong></h3>
<p>A <strong>LimitRange</strong> is a <strong>namespace-level policy</strong> that defines <strong>default resource requests and limits</strong> for pods and containers <strong>if they are not explicitly set</strong>. It ensures that no pod in the namespace runs without some resource constraints.</p>
<h3 id="heading-key-points">Key points:</h3>
<ul>
<li><p><strong>Applies at namespace level</strong>.</p>
</li>
<li><p>Can define <strong>default requests and limits</strong> for CPU and memory.</p>
</li>
<li><p>Can define <strong>minimum and maximum allowed values</strong> for requests and limits.</p>
</li>
<li><p>Only affects <strong>new pods</strong> created <strong>after the LimitRange is applied</strong>. Existing pods are not affected.</p>
</li>
</ul>
<h3 id="heading-example">Example:</h3>
<pre><code class="lang-plaintext">apiVersion: v1
kind: LimitRange
metadata:
  name: example-limitrange
  namespace: my-namespace
spec:
  limits:
  - type: Container
    default:
      cpu: 500m
      memory: 512Mi
    defaultRequest:
      cpu: 250m
      memory: 256Mi
    max:
      cpu: 1
      memory: 1Gi
    min:
      cpu: 100m
      memory: 128Mi
</code></pre>
<ul>
<li><p><strong>defaultRequest</strong> → the request applied if the pod doesn’t specify one (what the scheduler guarantees).</p>
</li>
<li><p><strong>default</strong> → the limit applied at runtime if the pod doesn’t specify one.</p>
</li>
<li><p><strong>max</strong> → maximum allowed request/limit (ceiling).</p>
</li>
<li><p><strong>min</strong> → minimum allowed request/limit (floor).</p>
</li>
</ul>
<p><strong>Explanation:</strong></p>
<ul>
<li><p>If a pod does <strong>not specify requests/limits</strong>, it gets <code>defaultRequest</code> and <code>default</code>.</p>
</li>
<li><p>Pods <strong>cannot request more than</strong> <code>max</code> or less than <code>min</code>.</p>
</li>
</ul>
<p>✅ Ensures <strong>fair resource usage</strong> and prevents runaway pods.</p>
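<p>Assuming the manifest above is saved as <code>limitrange.yaml</code>, you can apply it and verify the defaults it enforces:</p>
<pre><code class="lang-plaintext">kubectl apply -f limitrange.yaml
kubectl describe limitrange example-limitrange -n my-namespace
</code></pre>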
<hr />
<h3 id="heading-resourcequota"><strong>ResourceQuota</strong></h3>
<p>A <strong>ResourceQuota</strong> is a <strong>namespace-level limit on the total resources</strong> that <strong>all pods/containers together</strong> can consume.</p>
<h3 id="heading-key-points-1">Key points:</h3>
<ul>
<li><p>Limits the <strong>sum of resources</strong> used by all pods in the namespace.</p>
</li>
<li><p>Can limit <strong>CPU, memory, number of pods, services, persistent volumes</strong>, etc.</p>
</li>
<li><p>Prevents a single namespace from consuming <strong>all cluster resources</strong>.</p>
</li>
</ul>
<h3 id="heading-example-1">Example:</h3>
<pre><code class="lang-plaintext">apiVersion: v1
kind: ResourceQuota
metadata:
  name: example-quota
  namespace: my-namespace
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 4Gi
    limits.cpu: "10"
    limits.memory: 10Gi
    pods: "10"
</code></pre>
<p><strong>Explanation:</strong></p>
<ul>
<li><p>The <strong>sum of CPU requests</strong> across all pods ≤ 4 CPU.</p>
</li>
<li><p>The <strong>sum of memory requests</strong> across all pods ≤ 4 GiB.</p>
</li>
<li><p>The <strong>sum of limits</strong> across all pods ≤ 10 CPU and 10 GiB memory.</p>
</li>
<li><p>Max 10 pods in this namespace.</p>
</li>
</ul>
<p>✅ Ensures <strong>overall resource governance</strong> at the namespace level.</p>
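<p>After applying the quota, you can check current usage against the hard caps:</p>
<pre><code class="lang-plaintext">kubectl describe resourcequota example-quota -n my-namespace
</code></pre>
<p>The output lists each resource with its <code>Used</code> and <code>Hard</code> values, so you can see how close the namespace is to its limits.</p>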
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1758362878483/8f4ab90e-f270-44d2-be4d-bcec1435f00d.png" alt class="image--center mx-auto" /></p>
<hr />
<h3 id="heading-summary">⚖️ Summary</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Object</td><td>Scope</td><td>What it controls</td><td>Notes</td></tr>
</thead>
<tbody>
<tr>
<td><strong>LimitRange</strong></td><td>Namespace</td><td>Default requests &amp; limits, min/max</td><td>Applied to <strong>new pods</strong> only</td></tr>
<tr>
<td><strong>ResourceQuota</strong></td><td>Namespace</td><td>Total resource usage by <strong>all pods</strong></td><td>Controls aggregate CPU, memory, pod count, etc.</td></tr>
</tbody>
</table>
</div><hr />
<p>💡 <strong>Tip:</strong></p>
<ul>
<li><p>Use <strong>LimitRange</strong> to give every pod sensible defaults and to prevent very small or very large pods.</p>
</li>
<li><p>Use <strong>ResourceQuota</strong> to prevent a namespace from consuming all cluster resources.</p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Kubernetes Scheduling: A Complete Guide for Learners]]></title><description><![CDATA[Scheduling in Kubernetes determines which node a Pod will run on. Proper scheduling ensures workloads are efficiently distributed, critical Pods run on dedicated nodes, and cluster resources are optimally used.
Kubernetes provides multiple mechanisms...]]></description><link>https://blog.iresh.xyz/kubernetes-scheduling-a-complete-guide-for-learners</link><guid isPermaLink="true">https://blog.iresh.xyz/kubernetes-scheduling-a-complete-guide-for-learners</guid><dc:creator><![CDATA[Iresh Ekanayaka]]></dc:creator><pubDate>Sat, 20 Sep 2025 07:27:32 GMT</pubDate><content:encoded><![CDATA[<p><strong>Scheduling in Kubernetes</strong> determines which node a Pod will run on. Proper scheduling ensures workloads are efficiently distributed, critical Pods run on dedicated nodes, and cluster resources are optimally used.</p>
<p>Kubernetes provides multiple mechanisms to control Pod placement, from <strong>direct assignment</strong> to <strong>advanced affinity rules and taints</strong>. In this guide, we’ll cover everything you need to understand scheduling and how to dedicate nodes for specific Pods.</p>
<hr />
<h2 id="heading-1-nodename-direct-scheduling">1. <mark>nodeName (Direct Scheduling)</mark></h2>
<p>The simplest way to assign a Pod to a node is using the <code>nodeName</code> field.</p>
<ul>
<li><p><strong>If nodeName is set:</strong> Kubernetes bypasses the scheduler and assigns the Pod directly.</p>
</li>
<li><p><strong>If nodeName is not set:</strong> The Pod goes to the scheduler for placement.</p>
</li>
</ul>
<p><strong>Example:</strong></p>
<pre><code class="lang-plaintext">apiVersion: v1
kind: Pod
metadata:
  name: fixed-node-pod
spec:
  nodeName: node01
  containers:
  - name: nginx
    image: nginx
</code></pre>
<blockquote>
<p>⚠️ If <code>node01</code> is unavailable, the Pod stays in <strong>Pending</strong> state.</p>
</blockquote>
<p><strong>Limitations of nodeName</strong></p>
<ul>
<li><p><strong>Hard binding</strong>: Pod is tied to a single node.</p>
</li>
<li><p><strong>No flexibility</strong>: If <code>node01</code> is down or out of resources, Pod stays <strong>Pending</strong>.</p>
</li>
<li><p><strong>Not scalable</strong>: In dynamic clusters (nodes added/removed), <code>nodeName</code> creates operational headaches.</p>
</li>
</ul>
<p>⚠️ Because of these drawbacks, <code>nodeName</code> is <strong>rarely used in production</strong> (only for testing/debugging).<br />Instead, we use <strong>labels + selectors</strong> for flexible scheduling.</p>
<hr />
<h2 id="heading-2-scheduler-amp-binding">2. <mark>Scheduler &amp; Binding</mark></h2>
<p>When <code>nodeName</code> is <strong>not set</strong>, the Kubernetes scheduler:</p>
<ol>
<li><p>Checks available nodes.</p>
</li>
<li><p>Considers resources, labels, taints, and affinity rules.</p>
</li>
<li><p>Creates a <strong>Binding object</strong> linking the Pod to a node.</p>
</li>
</ol>
<p>You can inspect the binding:</p>
<pre><code class="lang-plaintext">kubectl get pod fixed-node-pod -o yaml
</code></pre>
<blockquote>
<p>The <code>spec.nodeName</code> field will appear after scheduling, even if not set manually.</p>
</blockquote>
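<p>Under the hood, the scheduler posts a <strong>Binding</strong> object against the Pod. A sketch of that object, using the pod and node names from the example above, looks like this:</p>
<pre><code class="lang-plaintext">apiVersion: v1
kind: Binding
metadata:
  name: fixed-node-pod
target:
  apiVersion: v1
  kind: Node
  name: node01
</code></pre>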
<hr />
<h2 id="heading-3-labels-amp-selectors">3. <mark>Labels &amp; Selectors</mark></h2>
<ul>
<li><p><strong>Labels:</strong> Key-value metadata attached to Kubernetes objects (Pods, Nodes).</p>
</li>
<li><p><strong>Selectors:</strong> Queries that match labels.</p>
</li>
</ul>
<p>Used by the scheduler to decide Pod placement.</p>
<p><strong>Example: Node Label</strong></p>
<pre><code class="lang-plaintext">kubectl label nodes node01 disktype=ssd
</code></pre>
<p><strong>Example: Pod with nodeSelector</strong></p>
<pre><code class="lang-plaintext">spec:
  nodeSelector:
    disktype: ssd
  containers:
  - name: nginx
    image: nginx
</code></pre>
<blockquote>
<p>Pod will schedule <strong>only on nodes labeled</strong> <code>disktype=ssd</code>.</p>
</blockquote>
<hr />
<h2 id="heading-4-annotations">4. <mark>Annotations</mark></h2>
<ul>
<li><p>Key-value metadata that <strong>do not affect scheduling</strong>.</p>
</li>
<li><p>Useful for storing additional information like URLs, team info, or build data.</p>
</li>
</ul>
<p><strong>Example:</strong></p>
<pre><code class="lang-plaintext">metadata:
  annotations:
    team: devops
    description: "This Pod is for testing purposes"
</code></pre>
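<p>Annotations can also be added to a live object without editing YAML (the pod name is taken from the earlier example; the key/value pair is illustrative):</p>
<pre><code class="lang-plaintext">kubectl annotate pod fixed-node-pod owner=platform-team
</code></pre>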
<hr />
<h2 id="heading-5-nodeselector-scheduling-with-labels"><mark>5. nodeSelector (Scheduling with Labels)</mark></h2>
<p><code>nodeSelector</code> is the simplest way to schedule Pods based on node labels.</p>
<ul>
<li><p>You <strong>label nodes</strong> with key-value pairs.</p>
</li>
<li><p>Pods with <code>nodeSelector</code> can run only on nodes with matching labels.</p>
</li>
</ul>
<h3 id="heading-example-label-a-node">Example: Label a node</h3>
<pre><code class="lang-plaintext">kubectl label nodes node01 disktype=ssd
</code></pre>
<h3 id="heading-example-pod-with-nodeselector">Example: Pod with nodeSelector</h3>
<pre><code class="lang-plaintext">apiVersion: v1
kind: Pod
metadata:
  name: ssd-pod
spec:
  nodeSelector:
    disktype: ssd
  containers:
  - name: nginx
    image: nginx
</code></pre>
<p>✅ This Pod will only run on nodes labeled <code>disktype=ssd</code>.</p>
<hr />
<h3 id="heading-limitations-of-nodeselector">Limitations of nodeSelector</h3>
<ul>
<li><p>Only supports <strong>exact matches</strong> (<code>key=value</code>).</p>
</li>
<li><p>Cannot use advanced operators (<code>In</code>, <code>NotIn</code>, <code>Exists</code>).</p>
</li>
<li><p>No way to express <strong>preferences</strong> (e.g., “prefer SSD nodes but allow others”).</p>
</li>
<li><p>Too basic for complex production use cases.</p>
</li>
</ul>
<p>👉 That’s why Kubernetes introduced <strong>Node Affinity</strong>, which builds on <code>nodeSelector</code> and adds more powerful scheduling rules.</p>
<hr />
<h2 id="heading-6-node-affinity-amp-anti-affinity"><mark>6. Node Affinity &amp; Anti-Affinity</mark></h2>
<p>Node Affinity is an <strong>advanced version of Node Selector</strong>. It allows:</p>
<ul>
<li><p>Logical operators (<code>In</code>, <code>NotIn</code>, <code>Exists</code>)</p>
</li>
<li><p>Soft (preferred) or hard (required) scheduling rules</p>
</li>
<li><p>Anti-affinity to avoid scheduling Pods on nodes already running certain workloads</p>
</li>
</ul>
<h3 id="heading-node-affinity-rules">Node Affinity Rules</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Rule</td><td>Type</td><td>Scheduling Behavior</td><td>Execution Behavior</td></tr>
</thead>
<tbody>
<tr>
<td>RequiredDuringSchedulingIgnoredDuringExecution</td><td>Hard</td><td>Pod must schedule on matching node</td><td>Ignores node changes after scheduling</td></tr>
<tr>
<td>PreferredDuringSchedulingIgnoredDuringExecution</td><td>Soft</td><td>Pod prefers matching node</td><td>Ignores node changes after scheduling</td></tr>
<tr>
<td>RequiredDuringSchedulingRequiredDuringExecution</td><td>Hard</td><td>Pod must schedule on matching node</td><td>Pod is evicted if node stops matching</td></tr>
<tr>
<td>PreferredDuringSchedulingRequiredDuringExecution</td><td>Soft</td><td>Pod prefers matching node</td><td>Pod is evicted if node stops matching</td></tr>
</tbody>
</table>
</div><hr />
<h3 id="heading-example-requiredduringschedulingignoredduringexecution">Example: RequiredDuringSchedulingIgnoredDuringExecution</h3>
<pre><code class="lang-plaintext">affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: disktype
          operator: In
          values:
          - ssd
</code></pre>
<blockquote>
<p>Pod must schedule on nodes with <code>disktype=ssd</code>.</p>
</blockquote>
<hr />
<h3 id="heading-example-preferredduringschedulingignoredduringexecution">Example: PreferredDuringSchedulingIgnoredDuringExecution</h3>
<pre><code class="lang-plaintext">affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 1
      preference:
        matchExpressions:
        - key: disktype
          operator: In
          values:
          - ssd
</code></pre>
<blockquote>
<p>Pod prefers SSD nodes but can schedule elsewhere if no SSD node is available.</p>
</blockquote>
<hr />
<h3 id="heading-example-requiredduringschedulingrequiredduringexecution">Example: RequiredDuringSchedulingRequiredDuringExecution</h3>
<ul>
<li><p>Hard rule: Pod must be on a matching node.</p>
</li>
<li><p>Evicted if node stops matching.</p>
</li>
<li><p>⚠️ Note: the <code>RequiredDuringExecution</code> variants are planned but not yet available in Kubernetes — today only the <code>IgnoredDuringExecution</code> forms can actually be used.</p>
</li>
</ul>
<pre><code class="lang-plaintext">affinity:
  nodeAffinity:
    requiredDuringSchedulingRequiredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: dedicated
          operator: In
          values:
          - "true"
</code></pre>
<hr />
<h3 id="heading-example-preferredduringschedulingrequiredduringexecution">Example: PreferredDuringSchedulingRequiredDuringExecution</h3>
<ul>
<li>Soft rule during scheduling, but Pod is evicted if the node stops matching later.</li>
</ul>
<pre><code class="lang-plaintext">affinity:
  nodeAffinity:
    preferredDuringSchedulingRequiredDuringExecution:
    - weight: 1
      preference:
        matchExpressions:
        - key: dedicated
          operator: In
          values:
          - "true"
</code></pre>
<hr />
<h3 id="heading-pod-anti-affinity">Pod Anti-Affinity</h3>
<p>Avoid scheduling Pods on nodes already running certain Pods.</p>
<pre><code class="lang-plaintext">affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - frontend
      topologyKey: "kubernetes.io/hostname"
</code></pre>
<blockquote>
<p>Ensures Pods are <strong>spread across nodes</strong>, avoiding co-location.</p>
</blockquote>
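<p>The mirror image is <strong>podAffinity</strong>, which attracts Pods to nodes already running certain workloads — for example, co-locating an app with its cache (the <code>app=cache</code> label is assumed):</p>
<pre><code class="lang-plaintext">affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - cache
      topologyKey: "kubernetes.io/hostname"
</code></pre>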
<hr />
<h2 id="heading-7-taints-amp-tolerations"><mark>7. Taints &amp; Tolerations</mark></h2>
<ul>
<li><p><strong>Taints</strong> on nodes restrict which Pods can run there.</p>
</li>
<li><p><strong>Tolerations</strong> on Pods allow them to ignore taints.</p>
</li>
</ul>
<p><strong>Taint Effects</strong></p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Effect</td><td>New non-tolerating Pods</td><td>Existing non-tolerating Pods</td><td>Example Use</td></tr>
</thead>
<tbody>
<tr>
<td>NoSchedule</td><td>❌ Never scheduled</td><td>✅ Can keep running</td><td>Protect nodes</td></tr>
<tr>
<td>PreferNoSchedule</td><td>⚠️ Avoid scheduling</td><td>✅ Can keep running</td><td>Prefer other nodes</td></tr>
<tr>
<td>NoExecute</td><td>❌ Never scheduled</td><td>❌ Evicted</td><td>Critical workloads</td></tr>
</tbody>
</table>
</div><p><strong>Example:</strong></p>
<pre><code class="lang-plaintext">kubectl taint nodes node01 dedicated=teamA:NoSchedule
</code></pre>
<p>Pod toleration:</p>
<pre><code class="lang-plaintext">tolerations:
- key: "dedicated"
  operator: "Equal"
  value: "teamA"
  effect: "NoSchedule"
</code></pre>
<blockquote>
<p>Alone, this <strong>does not dedicate</strong> the node — Pod can still schedule on untainted nodes.</p>
</blockquote>
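<p>To remove a taint later, repeat the command with a trailing <code>-</code>:</p>
<pre><code class="lang-plaintext">kubectl taint nodes node01 dedicated=teamA:NoSchedule-
</code></pre>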
<hr />
<h2 id="heading-8-dedicating-nodes-combining-node-affinity-taints"><mark>8. Dedicating Nodes: Combining Node Affinity + Taints</mark></h2>
<p>Sometimes in Kubernetes, you want to <strong>reserve certain nodes for specific workloads</strong>, for example:</p>
<ul>
<li><p>GPU nodes for ML workloads</p>
</li>
<li><p>High-memory nodes for database Pods</p>
</li>
<li><p>Critical services that should not compete with general workloads</p>
</li>
</ul>
<p>Using <strong>only one mechanism</strong> (like taints or node affinity) is often <strong>not enough</strong>:</p>
<ol>
<li><p><strong>Taints &amp; Tolerations alone</strong></p>
<ul>
<li><p>A taint marks a node so that only Pods with a matching toleration can be scheduled there.</p>
</li>
<li><p><strong>Limitation:</strong> Pods with that toleration can still be scheduled on other untainted nodes.</p>
</li>
</ul>
</li>
</ol>
<p>    <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1758350408375/195acc4c-91ae-4244-bdcb-22314f4689e0.png" alt class="image--center mx-auto" /></p>
<ul>
<li>✅ Protects nodes from unwanted Pods, but does <strong>not guarantee exclusivity</strong>.</li>
</ul>
<ol start="2">
<li><p><strong>Node Affinity alone</strong></p>
<ul>
<li><p>Node Affinity guides the scheduler to place Pods on nodes with matching labels.</p>
</li>
<li><p><strong>Limitation:</strong> Other Pods without affinity rules can still be scheduled on those nodes.</p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1758350599592/9f86c99b-3f1f-49b7-ab15-f75c0eb0d6f5.png" alt class="image--center mx-auto" /></p>
</li>
<li><p>✅ Useful for preference or required placement, but cannot block others.</p>
</li>
</ul>
</li>
</ol>
<h3 id="heading-the-combined-approach-recommended">The Combined Approach (Recommended)</h3>
<p>To truly <strong>dedicate a node exclusively</strong>:</p>
<ol>
<li><p><strong>Taint the node</strong> → blocks unwanted Pods.</p>
</li>
<li><p><strong>Add Node Affinity to the Pod</strong> → ensures only the intended Pods are scheduled there.</p>
</li>
</ol>
<p>This combination ensures:</p>
<ul>
<li><p>Only Pods with the <strong>correct toleration + affinity</strong> can be scheduled.</p>
</li>
<li><p>All other Pods are <strong>prevented</strong> from using that node.</p>
</li>
<li><p>Dedicated nodes are reserved for special workloads.</p>
</li>
</ul>
<h3 id="heading-step-by-step-example">Step-by-Step Example</h3>
<h4 id="heading-step-1-taint-the-node">Step 1: Taint the Node</h4>
<pre><code class="lang-plaintext">kubectl taint nodes node01 dedicated=true:NoSchedule
</code></pre>
<ul>
<li>Node <code>node01</code> now rejects any Pod <strong>without</strong> the <code>dedicated=true</code> toleration.</li>
</ul>
<h4 id="heading-step-2-pod-yaml-with-toleration-node-affinity">Step 2: Pod YAML with Toleration + Node Affinity</h4>
<pre><code class="lang-plaintext">apiVersion: v1
kind: Pod
metadata:
  name: dedicated-pod
spec:
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: dedicated
            operator: In
            values:
            - "true"
  containers:
  - name: nginx
    image: nginx
</code></pre>
<p>✅ Behavior:</p>
<ul>
<li><p>Pod <strong>can only schedule</strong> on <code>node01</code>.</p>
</li>
<li><p>Other Pods <strong>cannot</strong> use <code>node01</code>.</p>
</li>
<li><p>Node is effectively <strong>dedicated</strong> to this workload.</p>
</li>
</ul>
<h3 id="heading-why-use-both">Why Use Both?</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Mechanism</td><td>What it Controls</td><td>Limitation</td></tr>
</thead>
<tbody>
<tr>
<td>Taints + Tolerations</td><td>Blocks unwanted Pods</td><td>Pods with matching tolerations can still go to other nodes</td></tr>
<tr>
<td>Node Affinity</td><td>Guides Pods to specific nodes</td><td>Other Pods can still land on the node</td></tr>
<tr>
<td><strong>Combined</strong></td><td>Dedicate nodes exclusively</td><td>Ensures only the intended Pods run on that node</td></tr>
</tbody>
</table>
</div><h3 id="heading-use-cases">Use Cases</h3>
<ul>
<li><p><strong>GPU Nodes:</strong> Only ML workloads with GPU toleration + affinity can schedule there.</p>
</li>
<li><p><strong>High-Memory Nodes:</strong> Critical databases get dedicated memory nodes.</p>
</li>
<li><p><strong>Master/Control Nodes:</strong> Prevent regular workloads from running there while allowing special system Pods.</p>
</li>
</ul>
<p>This approach is <strong>widely used in production clusters</strong> where resource isolation and workload predictability are important.</p>
<hr />
<p><strong>Summary</strong></p>
<hr />
<h1 id="heading-kubernetes-pod-scheduling-from-basics-to-dedicated-nodes">🧭 Kubernetes Pod Scheduling: From Basics to Dedicated Nodes</h1>
<p>In Kubernetes, scheduling decides <strong>which node a Pod will run on</strong>. By default, the scheduler spreads Pods across the cluster. But in real-world scenarios, we often want more control — for example:</p>
<ul>
<li><p>Running database workloads only on SSD-backed nodes.</p>
</li>
<li><p>Ensuring GPU workloads only land on GPU nodes.</p>
</li>
<li><p>Keeping monitoring agents separate from application Pods.</p>
</li>
</ul>
<p>Kubernetes gives us several tools to influence scheduling. Let’s walk through them step by step — starting from the simplest (<code>nodeName</code>) to the most powerful combination (<strong>Node Affinity + Taints</strong>).</p>
<hr />
<h2 id="heading-1-nodename">1. 🔹 <code>nodeName</code></h2>
<p>The most direct scheduling method. You tell Kubernetes exactly which node to run on.</p>
<pre><code class="lang-plaintext">spec:
  nodeName: worker-1
</code></pre>
<h3 id="heading-advantages">✅ Advantages</h3>
<ul>
<li><p>Simple and explicit — Pod always runs on the named node.</p>
</li>
<li><p>Bypasses the scheduler (fast).</p>
</li>
</ul>
<h3 id="heading-limitations">⚠️ Limitations</h3>
<ul>
<li><p><strong>Hard-coded</strong> — tied to a single node.</p>
</li>
<li><p><strong>No flexibility</strong> — if that node is down or full, Pod won’t run.</p>
</li>
<li><p>Doesn’t scale in production.</p>
</li>
</ul>
<p>👉 Useful only for debugging or extreme edge cases.</p>
<hr />
<h2 id="heading-2-nodeselector">2. 🔹 <code>nodeSelector</code></h2>
<p>A more flexible alternative. Instead of hardcoding nodes, you use labels.</p>
<pre><code class="lang-plaintext">spec:
  nodeSelector:
    disktype: ssd
</code></pre>
<h3 id="heading-advantages-1">✅ Advantages</h3>
<ul>
<li><p>Easy to target groups of nodes by label.</p>
</li>
<li><p>Scheduler still chooses the best fit within that group.</p>
</li>
<li><p>Simple to set up.</p>
</li>
</ul>
<h3 id="heading-limitations-1">⚠️ Limitations</h3>
<ul>
<li><p>Only supports <strong>exact match</strong> (<code>=</code>).</p>
</li>
<li><p>No advanced expressions (OR, NOT, ranges).</p>
</li>
<li><p>Still rigid for complex workloads.</p>
</li>
</ul>
<p>👉 Better than <code>nodeName</code>, but limited for real-world scenarios.</p>
<hr />
<h2 id="heading-3-node-affinity-amp-anti-affinity">3. 🔹 Node Affinity &amp; Anti-Affinity</h2>
<p>Node affinity is a more <strong>expressive version of</strong> <code>nodeSelector</code>. It allows rules with operators and conditions.</p>
<pre><code class="lang-plaintext">affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: disktype
          operator: In
          values: ["ssd", "nvme"]
</code></pre>
<h3 id="heading-advantages-2">✅ Advantages</h3>
<ul>
<li><p>Rich operators: <code>In</code>, <code>NotIn</code>, <code>Exists</code>, <code>Gt</code>, <code>Lt</code>.</p>
</li>
<li><p>Supports <strong>hard</strong> rules (<code>requiredDuringScheduling…</code>) and <strong>soft</strong> preferences (<code>preferredDuringScheduling…</code>).</p>
</li>
<li><p><strong>Anti-affinity</strong> helps avoid placing Pods together (e.g., keep replicas apart).</p>
</li>
</ul>
<h3 id="heading-limitations-2">⚠️ Limitations</h3>
<ul>
<li><p><strong>Only works at scheduling time</strong> → once scheduled, Pods don’t move if nodes change.</p>
</li>
<li><p><strong>Not exclusive</strong> → other Pods without affinity can still land on those nodes.</p>
</li>
<li><p><strong>No eviction</strong> → scheduler won’t reshuffle Pods later.</p>
</li>
</ul>
<p>👉 Great for <strong>preferences and constraints</strong>, but not for <strong>isolation</strong>.</p>
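<p>A soft preference uses <code>preferredDuringSchedulingIgnoredDuringExecution</code> with a weight (1-100): the scheduler favors matching nodes but falls back to others if none fit. A minimal sketch:</p>
<pre><code class="lang-plaintext">affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 80              # higher weight = stronger preference
      preference:
        matchExpressions:
        - key: disktype
          operator: In
          values: ["ssd"]
</code></pre>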
<hr />
<h2 id="heading-4-taints-and-tolerations">4. 🔹 Taints and Tolerations</h2>
<p>While affinity attracts Pods to nodes, taints <strong>repel Pods</strong> unless a Pod explicitly tolerates the taint.</p>
<pre><code class="lang-plaintext">kubectl taint nodes worker-2 dedicated=db:NoSchedule
</code></pre>
<p>Pod toleration:</p>
<pre><code class="lang-plaintext">tolerations:
- key: "dedicated"
  operator: "Equal"
  value: "db"
  effect: "NoSchedule"
</code></pre>
<h3 id="heading-advantages-3">✅ Advantages</h3>
<ul>
<li><p>Protects reserved nodes from general workloads.</p>
</li>
<li><p>Fine-grained control:</p>
<ul>
<li><p><code>NoSchedule</code> → don’t place Pods.</p>
</li>
<li><p><code>PreferNoSchedule</code> → try to avoid.</p>
</li>
<li><p><code>NoExecute</code> → evict existing Pods.</p>
</li>
</ul>
</li>
</ul>
<h3 id="heading-limitations-3">⚠️ Limitations</h3>
<ul>
<li><p><strong>Too permissive</strong> — any Pod with the same toleration can run there.</p>
</li>
<li><p><strong>No attraction mechanism</strong> — only prevents, doesn’t guide Pods.</p>
</li>
<li><p><strong>Risk of idle nodes</strong> if no Pods have tolerations.</p>
</li>
<li><p><strong>Team drift</strong> → multiple workloads might add tolerations and unintentionally overload special nodes.</p>
</li>
</ul>
<p>👉 Great for <strong>blocking unwanted Pods</strong>, but not enough to guarantee <strong>dedicated use</strong>.</p>
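<p>To inspect a node's taints, or undo one (the trailing <code>-</code> removes the taint):</p>
<pre><code class="lang-plaintext">kubectl describe node worker-2 | grep Taints
kubectl taint nodes worker-2 dedicated=db:NoSchedule-
</code></pre>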
<hr />
<h2 id="heading-5-dedicating-nodes-combining-node-affinity-taints">5. 🔹 Dedicating Nodes: Combining Node Affinity + Taints</h2>
<p>The best practice for <strong>exclusive node usage</strong> is to combine both:</p>
<ul>
<li><p><strong>Node Affinity</strong> → attracts the right Pods.</p>
</li>
<li><p><strong>Taints</strong> → repels everything else.</p>
</li>
</ul>
<p>Example:</p>
<pre><code class="lang-plaintext">affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: workload
          operator: In
          values: ["gpu"]
tolerations:
- key: "workload"
  operator: "Equal"
  value: "gpu"
  effect: "NoSchedule"
</code></pre>
<h3 id="heading-result">🎯 Result</h3>
<ul>
<li><p>GPU workloads land only on GPU nodes.</p>
</li>
<li><p>No other Pods can run there accidentally.</p>
</li>
<li><p>Guarantees <strong>dedicated scheduling</strong>.</p>
</li>
</ul>
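<p>The node side of this setup needs both pieces: the label (so affinity can match) and the taint (so everything else is repelled). Assuming a node called <code>gpu-node-1</code>:</p>
<pre><code class="lang-plaintext">kubectl label nodes gpu-node-1 workload=gpu
kubectl taint nodes gpu-node-1 workload=gpu:NoSchedule
</code></pre>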
<hr />
<h1 id="heading-scheduling-evolution-flow">🚀 Scheduling Evolution Flow</h1>
<ol>
<li><p><code>nodeName</code> → too rigid.</p>
</li>
<li><p><code>nodeSelector</code> → basic labels, limited.</p>
</li>
<li><p><strong>Node Affinity</strong> → flexible rules, but still shared.</p>
</li>
<li><p><strong>Taints &amp; Tolerations</strong> → node-level protection, but not exclusive.</p>
</li>
<li><p><strong>Affinity + Taints</strong> → full control, dedicated nodes.</p>
</li>
</ol>
<hr />
<h1 id="heading-conclusion">📝 Conclusion</h1>
<p>Kubernetes scheduling evolved from <strong>hardcoding</strong> to <strong>flexible rules</strong> to <strong>node protection</strong>. Each method builds on the previous:</p>
<ul>
<li><p>Start with <strong>labels</strong> for organization (<code>nodeSelector</code>).</p>
</li>
<li><p>Use <strong>affinity/anti-affinity</strong> for advanced rules.</p>
</li>
<li><p>Apply <strong>taints/tolerations</strong> to protect nodes.</p>
</li>
<li><p>Combine both for <strong>dedicated, isolated workloads</strong>.</p>
</li>
</ul>
<p>👉 This layered approach gives you <strong>fine-grained control</strong> over Pod placement while keeping the cluster efficient and reliable.</p>
<hr />
<p>💡 Pro tip: For production environments, always prefer <strong>Affinity + Taints</strong> when you want guaranteed workload isolation.</p>
]]></content:encoded></item><item><title><![CDATA[From Waterfall to DevOps: A Beginner’s Guide to Modern Software Delivery]]></title><description><![CDATA[1. Why DevOps Exists
Before we can understand DevOps, we need to know how software development used to work - and why it needed to change.
In the early days, teams often used the Waterfall Model - a structured, phase-by-phase approach where each stag...]]></description><link>https://blog.iresh.xyz/from-waterfall-to-devops-a-beginners-guide-to-modern-software-delivery</link><guid isPermaLink="true">https://blog.iresh.xyz/from-waterfall-to-devops-a-beginners-guide-to-modern-software-delivery</guid><category><![CDATA[Development Methodologies]]></category><category><![CDATA[Devops]]></category><category><![CDATA[agile]]></category><category><![CDATA[Scrum]]></category><category><![CDATA[Waterfall Model]]></category><category><![CDATA[software development]]></category><category><![CDATA[project management]]></category><category><![CDATA[ci-cd]]></category><category><![CDATA[continuous delivery]]></category><category><![CDATA[Software Engineering]]></category><dc:creator><![CDATA[Iresh Ekanayaka]]></dc:creator><pubDate>Fri, 15 Aug 2025 09:23:16 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1755249150028/f2dc4d6e-8c71-4ac5-a938-9adc844617e1.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-1-why-devops-exists"><strong>1. Why DevOps Exists</strong></h2>
<p>Before we can understand <em>DevOps</em>, we need to know how software development used to work - and why it needed to change.</p>
<p>In the early days, teams often used the <strong>Waterfall Model</strong> - a structured, phase-by-phase approach where each stage had to be completed before moving to the next.</p>
<h2 id="heading-2-the-waterfall-model"><strong>2. The Waterfall Model</strong></h2>
<p><strong>How it works:</strong></p>
<ol>
<li><p><strong>Project Planning</strong> – Define objectives and timelines.</p>
</li>
<li><p><strong>Requirements Gathering</strong> – Collect all the features from users.</p>
</li>
<li><p><strong>Analysis &amp; Design</strong> – Plan the system’s structure and appearance.</p>
</li>
<li><p><strong>Development</strong> – Write the code.</p>
</li>
<li><p><strong>Testing</strong> – Verify the system works.</p>
</li>
<li><p><strong>Deployment</strong> – Release the final product.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755248284006/bcfa6e7b-cbc0-4e56-9abc-30cfee9fbc31.png" alt class="image--center mx-auto" /></p>
</li>
</ol>
<p><strong>Advantages:</strong></p>
<ul>
<li><p>Clear separation of phases.</p>
</li>
<li><p>Easy to estimate cost and timelines.</p>
</li>
</ul>
<p><strong>Disadvantages:</strong></p>
<ul>
<li><p>Users only see the product at the very end.</p>
</li>
<li><p>Changes are hard to manage once a phase is complete.</p>
</li>
<li><p>Bugs found late can cause massive rework.</p>
</li>
<li><p>Long timelines mean business needs may change before release.</p>
</li>
</ul>
<h2 id="heading-3-the-agile-shift"><strong>3. The Agile Shift</strong></h2>
<p>In <strong>2001</strong>, 17 software developers created the <strong>Agile Manifesto</strong>, introducing a faster and more flexible way to deliver software.</p>
<p><strong>How Agile works:</strong></p>
<ul>
<li><p>Break projects into <strong>small, manageable pieces</strong>.</p>
</li>
<li><p>Deliver working features in <strong>short increments</strong>.</p>
</li>
<li><p>Use <strong>cross-functional teams</strong> (development, testing, operations, product).</p>
</li>
<li><p>Maintain <strong>continuous collaboration</strong> with customers.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755248881533/7b31e52e-2b72-47d7-990a-e0bd36b5b82e.png" alt class="image--center mx-auto" /></p>
<p><strong>Benefits:</strong></p>
<ul>
<li><p>Faster delivery of usable features.</p>
</li>
<li><p>Easier to adapt to changing requirements.</p>
</li>
<li><p>Constant feedback from end-users.</p>
</li>
</ul>
<h2 id="heading-4-scrum-an-agile-framework"><strong>4. Scrum - An Agile Framework</strong></h2>
<p>Scrum is one of the most popular ways to apply Agile principles.</p>
<p><strong>Key concepts:</strong></p>
<ul>
<li><p><strong>User Stories</strong> – Simple, user-focused requirements.</p>
</li>
<li><p><strong>Product Backlog</strong> – All desired features in one list.</p>
</li>
<li><p><strong>Sprints</strong> – Short development cycles (2–4 weeks).</p>
</li>
<li><p><strong>Sprint Backlog</strong> – Items chosen for the current sprint.</p>
</li>
<li><p><strong>Review &amp; Retrospective</strong> – Showcase progress and improve for the next sprint.</p>
</li>
</ul>
<p><strong>Why it works:</strong></p>
<ul>
<li><p>Regular delivery of value.</p>
</li>
<li><p>Continuous improvement.</p>
</li>
<li><p>Works well for ongoing feature updates after release.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755248780642/de35a91c-fa17-4063-b092-7ed676f2ebcc.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-5-the-gap-between-development-and-operations"><strong>5. The Gap Between Development and Operations</strong></h2>
<p>Even with Agile, a major problem remained:</p>
<ul>
<li><p><strong>Developers</strong> want to release new features quickly.</p>
</li>
<li><p><strong>Operations</strong> teams focus on stability and uptime.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755249428786/aa425d0c-7469-43b8-8d04-215caa0dd3db.jpeg" alt class="image--center mx-auto" /></p>
<p>This caused:</p>
<ul>
<li><p>Delays from long approval processes.</p>
</li>
<li><p>Hesitation to deploy changes.</p>
</li>
<li><p>Risks of security or performance issues.</p>
</li>
</ul>
<h2 id="heading-6-devops-closing-the-gap"><strong>6. DevOps - Closing the Gap</strong></h2>
<p><strong>DevOps</strong> is both a culture and a set of practices that bridges Development and Operations.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755248938029/7d37e916-f165-4f8b-b8fd-8e569c99e360.webp" alt class="image--center mx-auto" /></p>
<p><strong>Core ideas:</strong></p>
<ul>
<li><p><strong>Collaboration from Day 1</strong> – Developers and Ops work together from planning through deployment.</p>
</li>
<li><p><strong>Automation</strong> – Continuous Integration and Continuous Deployment (CI/CD) to deliver changes faster.</p>
</li>
<li><p><strong>Shared Responsibility</strong> – Everyone is accountable for quality, security, and uptime.</p>
</li>
</ul>
<p><strong>Challenges in adopting DevOps:</strong></p>
<ul>
<li><p>Requires a <strong>mindset shift</strong> across teams.</p>
</li>
<li><p>Traditional approval-heavy processes slow automation.</p>
</li>
<li><p>Tools are important, but culture change matters more.</p>
</li>
</ul>
<h2 id="heading-7-the-big-picture"><strong>7. The Big Picture</strong></h2>
<p>The journey looks like this:<br /><strong>Waterfall → Agile → Scrum → DevOps</strong></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755249646134/b5a80aeb-bd1a-45a7-8093-efc55f32ff32.png" alt class="image--center mx-auto" /></p>
<p>We moved to DevOps because:</p>
<ul>
<li><p>Software must be delivered faster.</p>
</li>
<li><p>Quality must be maintained (or improved).</p>
</li>
<li><p>Teams must collaborate instead of working in silos.</p>
</li>
</ul>
<p><strong>DevOps is not a tool - it’s a way of working.</strong></p>
<p>💡 <strong>Key Takeaway:</strong><br />DevOps combines speed, collaboration, and continuous improvement. By breaking down barriers between development and operations, teams can deliver value to users faster - without sacrificing quality or stability.</p>
]]></content:encoded></item><item><title><![CDATA[L4. Understanding Terraform State: Remote Backend and State Locking – A Beginner’s Guide]]></title><description><![CDATA[Terraform is a powerful tool for automating infrastructure as code, but to use it effectively, it's important to understand how it manages the infrastructure it creates - through Terraform state.
This guide explains what a Terraform state file is, wh...]]></description><link>https://blog.iresh.xyz/l4-understanding-terraform-state-remote-backend-and-state-locking-a-beginners-guide</link><guid isPermaLink="true">https://blog.iresh.xyz/l4-understanding-terraform-state-remote-backend-and-state-locking-a-beginners-guide</guid><dc:creator><![CDATA[Iresh Ekanayaka]]></dc:creator><pubDate>Fri, 15 Aug 2025 08:45:22 GMT</pubDate><content:encoded><![CDATA[<p>Terraform is a powerful tool for automating infrastructure as code, but to use it effectively, it's important to understand how it manages the infrastructure it creates - through <strong>Terraform state</strong>.</p>
<p>This guide explains what a Terraform state file is, why it matters, the challenges of local state management, how to use a remote backend like AWS S3, and how to implement state locking using DynamoDB - all tailored for beginners and students.</p>
<hr />
<h2 id="heading-what-is-terraform-state">🌱 What is Terraform State?</h2>
<p>Terraform keeps track of your real-world infrastructure using a file called <code>terraform.tfstate</code>.</p>
<p>Whenever you apply a Terraform configuration, Terraform:</p>
<ol>
<li><p>Provisions the resources you defined (e.g., EC2 instances, VPCs, etc.)</p>
</li>
<li><p>Saves metadata about those resources in the <strong>state file</strong>.</p>
</li>
</ol>
<p>This file is the <strong>single source of truth</strong> for Terraform. Without it, Terraform wouldn’t know what has already been created, updated, or deleted.</p>
<hr />
<h2 id="heading-why-is-terraform-state-important">🧠 Why is Terraform State Important?</h2>
<p>Imagine this: You create an EC2 instance using Terraform. Later, you want to add a tag. If Terraform doesn't know the instance already exists (because there’s no state file), it will try to create a <strong>new instance</strong> instead of updating the existing one. ❌</p>
<p>With the state file, Terraform can:</p>
<ul>
<li><p>Compare your current config with the real infrastructure.</p>
</li>
<li><p>Determine what to <strong>add</strong>, <strong>change</strong>, or <strong>delete</strong>.</p>
</li>
<li><p>Safely manage updates and ensure <strong>idempotency</strong> (the ability to run the same script multiple times without unintended changes).</p>
</li>
</ul>
<hr />
<h2 id="heading-the-drawbacks-of-local-state-files">⚠️ The Drawbacks of Local State Files</h2>
<p>While the local state file works, it comes with serious issues:</p>
<h3 id="heading-1-sensitive-data-exposure">1. <strong>Sensitive Data Exposure</strong></h3>
<p>The state file can include <strong>secrets</strong>, such as API keys, passwords, and sensitive resource details. If it's stored on a local machine or pushed to a Git repository, anyone with access can read them.</p>
<h3 id="heading-2-team-collaboration-challenges">2. <strong>Team Collaboration Challenges</strong></h3>
<p>If multiple people are working on the same Terraform project:</p>
<ul>
<li><p>Everyone must <strong>remember to share or sync</strong> the updated state file.</p>
</li>
<li><p>Forgetting to do so causes <strong>drift</strong> between actual infrastructure and Terraform’s understanding.</p>
</li>
<li><p>Multiple versions of the state file can lead to resource conflicts or duplication.</p>
</li>
</ul>
<hr />
<h2 id="heading-solution-using-remote-backends">☁️ Solution: Using Remote Backends</h2>
<p>Terraform offers <strong>remote backends</strong> to solve these problems. One of the most popular backends is <strong>AWS S3</strong>.</p>
<h3 id="heading-how-does-it-help">🔧 How Does It Help?</h3>
<ul>
<li><p>Your state file is stored in <strong>S3</strong>, not on your local machine.</p>
</li>
<li><p>It’s always <strong>up-to-date</strong> since Terraform writes to S3 directly when changes are applied.</p>
</li>
<li><p>You can <strong>restrict access</strong> to the S3 bucket using IAM policies.</p>
</li>
</ul>
<hr />
<h3 id="heading-setting-up-s3-as-a-remote-backend">✅ Setting Up S3 as a Remote Backend</h3>
<ol>
<li><strong>Create an S3 bucket</strong> (manually or using Terraform):</li>
</ol>
<pre><code class="lang-bash">resource <span class="hljs-string">"aws_s3_bucket"</span> <span class="hljs-string">"tf_backend"</span> {
  bucket = <span class="hljs-string">"my-unique-terraform-backend-bucket"</span>
}
</code></pre>
<ol start="2">
<li><strong>Configure your backend in</strong> <code>backend.tf</code>:</li>
</ol>
<pre><code class="lang-bash">terraform {
  backend <span class="hljs-string">"s3"</span> {
    bucket = <span class="hljs-string">"my-unique-terraform-backend-bucket"</span>
    key    = <span class="hljs-string">"statefiles/terraform.tfstate"</span>
    region = <span class="hljs-string">"us-east-1"</span>
  }
}
</code></pre>
<ol start="3">
<li><strong>Initialize the backend</strong>:</li>
</ol>
<pre><code class="lang-bash">terraform init
</code></pre>
<p>Terraform will now store the state file in S3 instead of your local disk. 🎉</p>
<hr />
<h2 id="heading-what-about-state-locking">🔒 What About State Locking?</h2>
<p>Here’s a new concern:</p>
<blockquote>
<p>What if two people run <code>terraform apply</code> at the same time?</p>
</blockquote>
<p>That could cause a race condition, where both changes conflict or overwrite each other.</p>
<h3 id="heading-enter-state-locking-with-dynamodb">🚪 Enter State Locking with DynamoDB</h3>
<p>Terraform can <strong>lock</strong> the state file using <strong>AWS DynamoDB</strong>, ensuring that only one person can modify infrastructure at a time.</p>
<hr />
<h3 id="heading-setting-up-dynamodb-for-locking">🔐 Setting Up DynamoDB for Locking</h3>
<ol>
<li><strong>Create a DynamoDB Table</strong>:</li>
</ol>
<pre><code class="lang-bash">resource <span class="hljs-string">"aws_dynamodb_table"</span> <span class="hljs-string">"tf_locks"</span> {
  name         = <span class="hljs-string">"terraform-locks"</span>
  billing_mode = <span class="hljs-string">"PAY_PER_REQUEST"</span>
  hash_key     = <span class="hljs-string">"LockID"</span>

  attribute {
    name = <span class="hljs-string">"LockID"</span>
    <span class="hljs-built_in">type</span> = <span class="hljs-string">"S"</span>
  }
}
</code></pre>
<ol start="2">
<li><strong>Update your</strong> <code>backend.tf</code> to include locking configuration:</li>
</ol>
<pre><code class="lang-bash">terraform {
  backend <span class="hljs-string">"s3"</span> {
    bucket         = <span class="hljs-string">"my-unique-terraform-backend-bucket"</span>
    key            = <span class="hljs-string">"statefiles/terraform.tfstate"</span>
    region         = <span class="hljs-string">"us-east-1"</span>
    dynamodb_table = <span class="hljs-string">"terraform-locks"</span>
  }
}
</code></pre>
<ol start="3">
<li><strong>Re-initialize</strong> to apply backend changes:</li>
</ol>
<pre><code class="lang-bash">terraform init
</code></pre>
<p>Now, Terraform uses the DynamoDB table to <strong>lock and unlock</strong> the state file safely. Only one <code>terraform apply</code> can run at a time.</p>
<hr />
<h2 id="heading-testing-it-out">🧪 Testing It Out</h2>
<p>You can test your setup like this:</p>
<ol>
<li><p>Write a simple EC2 instance configuration in <code>main.tf</code>.</p>
</li>
<li><p>Run:</p>
</li>
</ol>
<pre><code class="lang-bash">terraform init
terraform apply
</code></pre>
<ol start="3">
<li><p>Observe that your state is <strong>no longer stored locally</strong>, but in <strong>S3</strong>.</p>
</li>
<li><p>Terraform now uses <strong>DynamoDB</strong> to lock the state during apply operations.</p>
</li>
</ol>
<hr />
<h2 id="heading-summary">🚀 Summary</h2>
<p>Here’s what we’ve covered:</p>
<ul>
<li><p>✅ <strong>Terraform State</strong> tracks your infrastructure.</p>
</li>
<li><p>❌ <strong>Local state files</strong> pose risks: sensitive data exposure and sync issues.</p>
</li>
<li><p>☁️ <strong>Remote backends (like S3)</strong> solve these problems.</p>
</li>
<li><p>🔒 <strong>State locking with DynamoDB</strong> prevents conflicting changes.</p>
</li>
</ul>
<p>This setup is <strong>production-grade</strong> and a <strong>must-know</strong> for anyone planning to work in a team or manage infrastructure securely using Terraform.</p>
<hr />
<h2 id="heading-bonus-useful-terraform-commands">🧰 Bonus: Useful Terraform Commands</h2>
<pre><code class="lang-bash">terraform init       <span class="hljs-comment"># Initializes the backend and providers</span>
terraform plan       <span class="hljs-comment"># Shows what changes will be made</span>
terraform apply      <span class="hljs-comment"># Applies the changes</span>
terraform show       <span class="hljs-comment"># Displays the current state</span>
terraform destroy    <span class="hljs-comment"># Destroys the resources</span>
</code></pre>
<hr />
<p>Got questions or need help? Leave a comment or explore the official Terraform Backend Docs for more.</p>
<p>Happy coding! 🌍⚙️</p>
]]></content:encoded></item><item><title><![CDATA[🛑 Kubernetes Graceful Shutdown: PreStop Hooks, Probes, and Termination Lifecycle Explained]]></title><description><![CDATA[When deploying applications in Kubernetes, it's not just about running a container and calling it a day. Kubernetes manages the entire pod lifecycle - from startup to health checks to shutdown - and that includes graceful termination.
Whether it's du...]]></description><link>https://blog.iresh.xyz/kubernetes-graceful-shutdown-prestop-hooks-probes-and-termination-lifecycle-explained</link><guid isPermaLink="true">https://blog.iresh.xyz/kubernetes-graceful-shutdown-prestop-hooks-probes-and-termination-lifecycle-explained</guid><category><![CDATA[#K8sLifecycle]]></category><category><![CDATA[#GracefulShutdown]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[Devops]]></category><category><![CDATA[probes]]></category><category><![CDATA[cloud native]]></category><category><![CDATA[#SiteReliabilityEngineering]]></category><category><![CDATA[RollingUpdates]]></category><dc:creator><![CDATA[Iresh Ekanayaka]]></dc:creator><pubDate>Thu, 31 Jul 2025 19:40:53 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1753990616400/5764fc55-35c3-4493-9f78-d39b218ee4ad.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When deploying applications in Kubernetes, it's not just about running a container and calling it a day. Kubernetes manages the <strong>entire pod lifecycle</strong> - from startup to health checks to shutdown - and that includes <strong>graceful termination</strong>.</p>
<p>Whether it's due to a <strong>rolling update</strong>, <strong>node scaling</strong>, <strong>OOM crash</strong>, or even a <strong>manual</strong> <code>kubectl delete</code>, Kubernetes gives us the tools to shut down pods <strong>gracefully</strong> - without losing traffic or corrupting data.</p>
<p>In this post, we’ll break down:</p>
<ul>
<li><p>What happens during a pod termination</p>
</li>
<li><p>How lifecycle hooks like <code>preStop</code> work</p>
</li>
<li><p>The role of probes in shutdown</p>
</li>
<li><p>How to avoid dropped traffic or forced shutdowns</p>
</li>
</ul>
<h2 id="heading-when-does-kubernetes-terminate-pods">📌 When Does Kubernetes Terminate Pods?</h2>
<p>Pod termination can happen for several reasons:</p>
<ul>
<li><p>🚀 Rolling updates</p>
</li>
<li><p>⚖️ Node scaling or maintenance</p>
</li>
<li><p>💥 Crashes or Out-Of-Memory (OOM) errors</p>
</li>
<li><p>🔧 Manual deletion (e.g., <code>kubectl delete pod</code>)</p>
</li>
<li><p>❌ Failing <code>livenessProbe</code></p>
</li>
</ul>
<p>If your app is still handling requests or doing important work, an <strong>instant kill</strong> could cause problems. Instead, Kubernetes provides a <strong>graceful shutdown</strong> mechanism.</p>
<hr />
<h2 id="heading-quick-glossary">🧠 Quick Glossary</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Concept</td><td>Meaning</td></tr>
</thead>
<tbody>
<tr>
<td><code>preStop</code> hook</td><td>A lifecycle hook that runs just before the container is stopped</td></tr>
<tr>
<td><code>terminationGracePeriodSeconds</code></td><td>How long Kubernetes waits before forcefully killing the pod</td></tr>
<tr>
<td><code>readinessProbe</code></td><td>Marks pod as "ready" to receive traffic</td></tr>
<tr>
<td><code>livenessProbe</code></td><td>Checks if pod is healthy/alive</td></tr>
</tbody>
</table>
</div><hr />
<h2 id="heading-what-happens-when-you-delete-a-pod">🛑 What Happens When You Delete a Pod?</h2>
<p>Let's say you run:</p>
<pre><code class="lang-bash">kubectl delete pod my-app
</code></pre>
<p>Here’s what Kubernetes does <strong>step-by-step</strong>:</p>
<ol>
<li><p><strong>Marks pod as "Terminating"</strong></p>
</li>
<li><p>Removes the pod from Service endpoints and marks it <strong>Not Ready</strong> (so no new traffic is sent to the pod).</p>
</li>
<li><p><strong>Runs the</strong> <code>preStop</code> hook (if defined)</p>
</li>
<li><p><strong>Sends SIGTERM to the container</strong>.</p>
</li>
<li><p><strong>Waits for</strong> <code>terminationGracePeriodSeconds</code></p>
</li>
<li><p>If the container is still running after the grace period, sends <strong>SIGKILL</strong> to force kill.</p>
</li>
<li><p><strong>Deletes the pod</strong></p>
</li>
</ol>
<p><img src="https://drek4537l1klr.cloudfront.net/luksa/Figures/17fig05_alt.jpg" alt class="image--center mx-auto" /></p>
<p>For example, a <code>terminationGracePeriodSeconds</code> of 25 could break down as: 5s for the <code>preStop</code> hook + 15s for application shutdown + a 5s buffer.</p>
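<p>Put together in a Pod spec, that budget might look like the sketch below (the <code>sleep 5</code> stands in for real drain logic, and <code>my-app:latest</code> is a placeholder image):</p>
<pre><code class="lang-plaintext">spec:
  terminationGracePeriodSeconds: 25   # preStop + app shutdown + buffer
  containers:
  - name: app
    image: my-app:latest
    lifecycle:
      preStop:
        exec:
          command: ["sh", "-c", "sleep 5"]   # placeholder for drain logic
</code></pre>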
<h2 id="heading-kubernetes-pod-termination-flow-step-by-step">🔄 Kubernetes Pod Termination Flow (Step-by-Step)</h2>
<h3 id="heading-step-1-termination-is-triggered">🔹 Step 1: Termination is Triggered</h3>
<p>Termination is initiated by:</p>
<ul>
<li><p><code>kubectl delete pod</code></p>
</li>
<li><p>Rolling update (Deployment, StatefulSet)</p>
</li>
<li><p>Node scaling or eviction</p>
</li>
<li><p>Failing <code>livenessProbe</code></p>
</li>
</ul>
<p>📍 The API server receives the delete request and sets <code>deletionTimestamp</code> on the pod.</p>
<hr />
<h3 id="heading-step-2-pod-is-marked-as-terminating">🔹 Step 2: Pod is Marked as Terminating</h3>
<p>The pod <strong>is not deleted immediately</strong>.</p>
<ul>
<li><p>It's marked as <code>Terminating</code></p>
</li>
<li><p>The controller (e.g., ReplicaSet) may spin up a <strong>replacement pod</strong></p>
</li>
</ul>
<hr />
<h3 id="heading-step-3-kubelet-detects-termination">🔹 Step 3: Kubelet Detects Termination</h3>
<ul>
<li><p>The <strong>Kubelet</strong> on the node watches the API server.</p>
</li>
<li><p>It sees the pod is terminating and starts <strong>graceful shutdown</strong>.</p>
</li>
</ul>
<hr />
<h3 id="heading-step-4-prestop-hook-executes">🔹 Step 4: <code>preStop</code> Hook Executes</h3>
<p>If you’ve defined a <code>preStop</code> hook in your pod spec:</p>
<pre><code class="lang-bash">lifecycle:
  preStop:
    <span class="hljs-built_in">exec</span>:
      <span class="hljs-built_in">command</span>: [<span class="hljs-string">"/usr/bin/save-state.sh"</span>]
</code></pre>
<p>Kubelet executes it using:</p>
<ul>
<li><p><code>exec</code> (run a command in the container)</p>
</li>
<li><p><code>httpGet</code> (make HTTP call to internal endpoint)</p>
</li>
<li><p><code>tcpSocket</code> (deprecated)</p>
</li>
</ul>
<p>🕒 The hook <strong>must complete</strong> before SIGTERM is sent.</p>
<hr />
<h3 id="heading-step-5-sigterm-is-sent">🔹 Step 5: SIGTERM Is Sent</h3>
<p>After <code>preStop</code> finishes, Kubelet sends a <strong>SIGTERM</strong> signal to the container.</p>
<p>This gives your app a chance to shut down politely - like:</p>
<ul>
<li><p>Closing DB connections</p>
</li>
<li><p>Draining message queues</p>
</li>
<li><p>Finishing current request</p>
</li>
</ul>
<blockquote>
<p>⚠️ If your app <strong>doesn’t handle SIGTERM</strong>, it may be killed before completing shutdown.</p>
</blockquote>
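<p>If your container's entrypoint is a shell script, a <code>trap</code> is the minimal way to catch SIGTERM. A sketch (the drain step is a placeholder, not real shutdown logic):</p>
<pre><code class="lang-bash">#!/bin/sh
# Run the "app" in a subshell so we can signal it, as Kubernetes would.
(
  cleanup() { echo "SIGTERM received, draining..."; exit 0; }
  trap cleanup TERM
  echo "app running"
  while true; do sleep 1; done
) &amp;
APP_PID=$!
sleep 1
kill -TERM "$APP_PID"   # what the Kubelet sends after preStop
wait "$APP_PID"
echo "app exited with status $?"
</code></pre>
<p>Without the trap, the default SIGTERM action kills the shell immediately and any in-flight work is lost.</p>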
<hr />
<h3 id="heading-step-6-termination-grace-period-countdown">🔹 Step 6: Termination Grace Period Countdown</h3>
<p>The clock starts ticking based on:</p>
<pre><code class="lang-bash">spec:
  terminationGracePeriodSeconds: 30
</code></pre>
<ul>
<li><p>The total time includes <strong>preStop + app shutdown</strong></p>
</li>
<li><p>Default is 30 seconds</p>
</li>
</ul>
<hr />
<h3 id="heading-step-7-sigkill-if-timeout-expires">🔹 Step 7: SIGKILL If Timeout Expires</h3>
<p>If the container is still running after the grace period, Kubernetes sends:</p>
<pre><code class="lang-bash">SIGKILL
</code></pre>
<p>At this point, the container is forcefully stopped - even if it's still working.</p>
<hr />
<h3 id="heading-step-8-pod-is-deleted">🔹 Step 8: Pod Is Deleted</h3>
<p>Once the container stops:</p>
<ul>
<li><p>The Kubelet deletes the pod from the node</p>
</li>
<li><p>The API server removes the pod from the cluster state</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753989040635/85ca2819-5a19-4e6c-85dd-a8bd25159b73.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-wrapping-up">📣 Wrapping Up</h2>
<p>Kubernetes gives you the <strong>tools to shut down pods cleanly</strong> - but it's up to you to use them right.</p>
<p>By defining a <code>preStop</code> hook, setting a realistic <code>terminationGracePeriodSeconds</code>, and properly using probes, you can:</p>
<ul>
<li><p>Avoid dropped connections</p>
</li>
<li><p>Prevent data corruption</p>
</li>
<li><p>Ensure smoother rolling updates</p>
</li>
</ul>
<hr />
<h1 id="heading-understanding-kubernetes-pod-lifecycle-with-restaurant-analogy">🚀 Understanding Kubernetes Pod Lifecycle with Restaurant Analogy 🍽️</h1>
<p>From Startup to Shutdown - Explained Visually with Liveness, Readiness, PreStop &amp; Termination Grace</p>
<h2 id="heading-1-pod-starts-restaurant-opening">🏁 1. Pod Starts = Restaurant Opening</h2>
<h3 id="heading-restaurant-analogy">🍽️ Restaurant Analogy:</h3>
<ul>
<li><p><strong>Staff arrives.</strong></p>
</li>
<li><p><strong>Kitchen is being prepped.</strong></p>
</li>
<li><p><strong>The "Open" sign is still OFF.</strong></p>
</li>
<li><p>Customers are NOT allowed in yet.</p>
</li>
</ul>
<blockquote>
<p>🔍 Readiness Probe returns ❌</p>
</blockquote>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753982871182/04cc06df-9f57-43d0-95b6-a00eb6e1bef3.png" alt="A restaurant with a &quot;Closed&quot; sign, staff inside cooking/prepping." class="image--center mx-auto" /></p>
<p>A restaurant with a "Closed" sign, staff inside cooking/prepping.</p>
<h3 id="heading-kubernetes-explanation">⚙️ Kubernetes Explanation:</h3>
<ul>
<li><p>Pod is created.</p>
</li>
<li><p>Containers inside start.</p>
</li>
<li><p>Kubernetes starts checking the <code>readinessProbe</code>.</p>
</li>
<li><p>If <code>readinessProbe</code> fails → <strong>Pod is NOT added to Service LoadBalancer.</strong></p>
</li>
</ul>
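<p>The startup behavior above can be sketched in a pod spec - a minimal example, with a hypothetical health endpoint, port, and timings:</p>
<pre><code class="lang-yaml">containers:
  - name: web
    image: my-app:1.0        # illustrative image
    readinessProbe:
      httpGet:
        path: /healthz       # hypothetical health endpoint
        port: 8080
      initialDelaySeconds: 5 # give the "kitchen" time to prep
      periodSeconds: 10      # re-check readiness every 10s
</code></pre>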
<hr />
<h2 id="heading-2-pod-is-ready-open-to-customers">✅ 2. Pod is Ready = Open to Customers</h2>
<h3 id="heading-restaurant-analogy-1">🍽️ Restaurant Analogy:</h3>
<ul>
<li><p>Kitchen is ready.</p>
</li>
<li><p>Staff says: "We’re good to go!"</p>
</li>
<li><p>"Open" sign is ON.</p>
</li>
<li><p>Google Maps starts showing your restaurant.</p>
</li>
<li><p>Customers (traffic) start coming in.</p>
</li>
</ul>
<blockquote>
<p>✅ Readiness Probe passes</p>
</blockquote>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753982999341/36e386ad-f94c-42b8-b4c4-4ed83e1f33b3.png" alt class="image--center mx-auto" /></p>
<p>A restaurant with customers entering, kitchen in action.</p>
<h3 id="heading-kubernetes-explanation-1">⚙️ Kubernetes Explanation:</h3>
<ul>
<li><p><code>readinessProbe</code> starts returning success.</p>
</li>
<li><p>Pod is added to Service endpoints.</p>
</li>
<li><p>Kubernetes sends traffic to the pod.</p>
</li>
<li><p>The pod is now <strong>Ready</strong>.</p>
</li>
</ul>
<hr />
<h2 id="heading-3-staying-healthy-passing-health-inspections">❤️ 3. Staying Healthy = Passing Health Inspections</h2>
<h3 id="heading-restaurant-analogy-2">🍽️ Restaurant Analogy:</h3>
<ul>
<li><p>Health inspector comes in <strong>every 10 minutes</strong>.</p>
</li>
<li><p>Checks kitchen, staff, environment.</p>
</li>
<li><p>If staff fainted, kitchen on fire - 🚫 you fail.</p>
</li>
</ul>
<blockquote>
<p>Liveness Probe checks every interval</p>
</blockquote>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753983053594/62fa0e5f-0ee7-426a-8156-115cf3df4d3a.png" alt class="image--center mx-auto" /></p>
<p>Health inspector checking kitchen hygiene.</p>
<h3 id="heading-kubernetes-explanation-2">⚙️ Kubernetes Explanation:</h3>
<ul>
<li><p><code>livenessProbe</code> runs periodically.</p>
</li>
<li><p>If the liveness probe fails:</p>
<ul>
<li><p>Kubernetes kills and restarts the container.</p>
</li>
<li><p>Useful when your app hangs but doesn’t crash.</p>
</li>
</ul>
</li>
</ul>
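<p>A liveness probe is declared the same way - a minimal sketch, again with a hypothetical endpoint and timings:</p>
<pre><code class="lang-yaml">livenessProbe:
  httpGet:
    path: /healthz       # hypothetical endpoint
    port: 8080
  periodSeconds: 10      # the "inspector" visits every 10 seconds
  failureThreshold: 3    # 3 failures in a row -> container is restarted
</code></pre>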
<hr />
<h2 id="heading-4-shutdown-begins-landlord-gives-notice">🛑 4. Shutdown Begins = Landlord Gives Notice</h2>
<h3 id="heading-restaurant-analogy-3">🍽️ Restaurant Analogy:</h3>
<p>Landlord (Kubernetes) says:<br /><strong>"You're shutting down in 60 seconds."</strong></p>
<ul>
<li><p>You lock the front door → 🛑 No more new customers.</p>
</li>
<li><p>Waiters finish serving ongoing orders.</p>
</li>
<li><p>Kitchen finishes cooking.</p>
</li>
<li><p>Staff exits gracefully.</p>
</li>
</ul>
<blockquote>
<p>This 60s is your <code>terminationGracePeriodSeconds</code></p>
</blockquote>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753987724115/8e10a66d-b807-49d0-bc89-e296fa3d2c3a.png" alt class="image--center mx-auto" /></p>
<p>Restaurant putting up “Closing soon” sign, waiters finishing orders.</p>
<h3 id="heading-kubernetes-explanation-3">⚙️ Kubernetes Explanation:</h3>
<ul>
<li><p>Kubernetes initiates pod shutdown (e.g., due to <code>kubectl delete pod</code>).</p>
</li>
<li><p>Kubernetes marks the pod as <strong>Terminating</strong> and removes it from the Service endpoints (Not Ready).</p>
</li>
<li><p>Waits for <code>terminationGracePeriodSeconds</code> (default: 30s).</p>
</li>
<li><p>Meanwhile:</p>
<ul>
<li><p>Executes <code>preStop</code> hook.</p>
</li>
<li><p>Stops routing new traffic to the pod.</p>
</li>
<li><p>Allows app to clean up (e.g., finish jobs, close DB).</p>
</li>
</ul>
</li>
</ul>
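<p>In the pod spec, the "60 seconds' notice" is a single field (the value here is illustrative):</p>
<pre><code class="lang-yaml">spec:
  terminationGracePeriodSeconds: 60  # default is 30 if omitted
  containers:
    - name: web
      image: my-app:1.0
</code></pre>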
<hr />
<h2 id="heading-5-prestop-hook-locking-the-door">🔒 5. PreStop Hook = Locking the Door</h2>
<h3 id="heading-restaurant-analogy-4">🍽️ Restaurant Analogy:</h3>
<ul>
<li><p>You run a command: "Lock the front door."</p>
</li>
<li><p>Sign flips to “Closed.”</p>
</li>
<li><p>Waiters: “No new customers allowed.”</p>
</li>
</ul>
<blockquote>
<p>preStop is a lifecycle hook that runs BEFORE SIGTERM</p>
</blockquote>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753987799496/c64fe3ed-c096-4de7-a042-39f373b80783.png" alt class="image--center mx-auto" /></p>
<p>Staff removing restaurant from food delivery app before closing.</p>
<h3 id="heading-kubernetes-explanation-4">⚙️ Kubernetes Explanation:</h3>
<ul>
<li><p><code>preStop</code> runs BEFORE SIGTERM.</p>
</li>
<li><p>Often used to:</p>
<ul>
<li><p>Unregister from a service discovery system.</p>
</li>
<li><p>Drain ongoing traffic.</p>
</li>
<li><p>Notify other systems.</p>
</li>
</ul>
</li>
</ul>
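<p>A minimal <code>preStop</code> sketch - the sleep gives load balancers time to stop routing to the pod before SIGTERM arrives (the 10-second duration is an assumption; tune it to your environment):</p>
<pre><code class="lang-yaml">lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "sleep 10"]  # runs before SIGTERM is sent
</code></pre>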
<hr />
<h2 id="heading-6-graceful-exit-wrap-up">🔚 6. Graceful Exit = Wrap-up</h2>
<h3 id="heading-restaurant-analogy-5">🍽️ Restaurant Analogy:</h3>
<ul>
<li><p>No new customers.</p>
</li>
<li><p>Kitchen finishes pending dishes.</p>
</li>
<li><p>Staff exits.</p>
</li>
<li><p>Everyone goes home. No force needed.</p>
</li>
</ul>
<blockquote>
<p>All done within terminationGracePeriodSeconds ✅</p>
</blockquote>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753987843315/43bd1187-87ab-4366-a4d1-12cae890cbf8.png" alt class="image--center mx-auto" /></p>
<p>Restaurant empty, lights off, sign = "Closed".</p>
<h3 id="heading-kubernetes-explanation-5">⚙️ Kubernetes Explanation:</h3>
<ul>
<li><p>Your app finishes cleanup before the grace period ends.</p>
</li>
<li><p>Container exits.</p>
</li>
<li><p>Pod gets removed cleanly.</p>
</li>
<li><p>No data loss. No corruption.</p>
</li>
</ul>
<hr />
<h2 id="heading-7-forced-shutdown-bouncer-kicks-you-out">💀 7. Forced Shutdown = Bouncer Kicks You Out</h2>
<h3 id="heading-restaurant-analogy-6">🍽️ Restaurant Analogy:</h3>
<ul>
<li><p>You took too long to close.</p>
</li>
<li><p>Landlord sends security (SIGKILL).</p>
</li>
<li><p>Everyone kicked out.</p>
</li>
<li><p>Food wasted, customers angry.</p>
</li>
</ul>
<blockquote>
<p>❗ SIGKILL is sent when terminationGracePeriodSeconds is exceeded</p>
</blockquote>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753987893554/eba2175a-b5ee-4a4a-afba-0a164dbc84b8.png" alt class="image--center mx-auto" /></p>
<p>Angry landlord dragging staff out, customers confused.</p>
<h3 id="heading-kubernetes-explanation-6">⚙️ Kubernetes Explanation:</h3>
<ul>
<li><p>If your app doesn’t terminate within the grace period:</p>
<ul>
<li><p>Kubernetes sends <strong>SIGKILL</strong>.</p>
</li>
<li><p>Immediate stop.</p>
</li>
<li><p>You can't recover anything.</p>
</li>
<li><p>This may cause <strong>data loss</strong> (e.g., half-written files).</p>
</li>
</ul>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[L3. Mastering Terraform Modules: Clean, Reusable Infrastructure for DevOps Beginners]]></title><description><![CDATA[One of the most powerful yet underused features of Terraform is the concept of modules.
If you’re a beginner, student, or new DevOps engineer, you’ve probably written Terraform code in one big file. But what happens when your infrastructure grows? Wh...]]></description><link>https://blog.iresh.xyz/l3-mastering-terraform-modules-clean-reusable-infrastructure-for-devops-beginners</link><guid isPermaLink="true">https://blog.iresh.xyz/l3-mastering-terraform-modules-clean-reusable-infrastructure-for-devops-beginners</guid><dc:creator><![CDATA[Iresh Ekanayaka]]></dc:creator><pubDate>Wed, 14 May 2025 17:55:49 GMT</pubDate><content:encoded><![CDATA[<p>One of the most powerful yet underused features of Terraform is the concept of <strong>modules</strong>.</p>
<p>If you’re a beginner, student, or new DevOps engineer, you’ve probably written Terraform code in one big file. But what happens when your infrastructure grows? What if your team wants to reuse the same code across environments or projects?</p>
<p>Welcome to the world of <strong>Terraform modules</strong> - a structured, maintainable, and reusable way to write your infrastructure as code.</p>
<p>Let’s explore this in-depth.</p>
<hr />
<h2 id="heading-what-is-a-terraform-module">📦 What is a Terraform Module?</h2>
<p>A <strong>module</strong> in Terraform is simply a folder that contains <code>.tf</code> files - your Terraform configurations - which can be reused across projects.</p>
<blockquote>
<p>A module is to Terraform what a function is to programming.</p>
</blockquote>
<h3 id="heading-why-use-modules">Why Use Modules?</h3>
<ul>
<li><p>✅ <strong>Avoid code duplication</strong></p>
</li>
<li><p>✅ <strong>Increase reusability</strong> across teams and environments</p>
</li>
<li><p>✅ <strong>Enable better testing and debugging</strong></p>
</li>
<li><p>✅ <strong>Enforce standards</strong> across infrastructure</p>
</li>
<li><p>✅ <strong>Encourage team collaboration</strong> with ownership and modular thinking</p>
</li>
</ul>
<hr />
<h2 id="heading-real-world-analogy">🔧 Real-World Analogy</h2>
<p>Imagine you're working at a company like <a target="_blank" href="http://amazon.com"><code>amazon.com</code></a> and the app is a <strong>monolithic Java codebase</strong> with 1 million+ lines of code. When there's a bug, it's hard to know who wrote what, where to fix it, and how to test it without deploying the entire app.</p>
<p>The fix? <strong>Microservices</strong> - smaller, decoupled, maintainable codebases.</p>
<h3 id="heading-similarly-in-terraform">Similarly in Terraform:</h3>
<p>Without modules, your <code>.tf</code> file will grow with:</p>
<ul>
<li><p>EC2 Instances</p>
</li>
<li><p>S3 Buckets</p>
</li>
<li><p>VPCs</p>
</li>
<li><p>Lambda Functions</p>
</li>
<li><p>Load Balancers</p>
</li>
<li><p>EKS clusters<br />  ...and more.</p>
</li>
</ul>
<p>It becomes <strong>impossible to maintain</strong> or collaborate on.</p>
<p>That’s why we adopt a <strong>modular approach</strong> in Terraform - breaking things into small, logical, reusable units.</p>
<hr />
<h2 id="heading-creating-a-basic-terraform-module-step-by-step">🛠 Creating a Basic Terraform Module (Step-by-Step)</h2>
<p>Let’s build a reusable <strong>EC2 instance module</strong>.</p>
<h3 id="heading-folder-structure">📁 Folder Structure</h3>
<pre><code class="lang-bash">terraform-project/
├── main.tf
├── modules/
│   └── ec2_instance/
│       ├── main.tf
│       ├── variables.tf
│       ├── outputs.tf
</code></pre>
<hr />
<h3 id="heading-step-1-inside-modulesec2instancemaintfhttpmaintf">🔨 Step 1: Inside <code>modules/ec2_instance/</code><a target="_blank" href="http://main.tf"><code>main.tf</code></a></h3>
<pre><code class="lang-bash">resource <span class="hljs-string">"aws_instance"</span> <span class="hljs-string">"example"</span> {
  ami           = var.ami_value
  instance_type = var.instance_type_value
  subnet_id     = var.subnet_id_value
}
</code></pre>
<hr />
<h3 id="heading-step-2-define-variables-in-modulesec2instancevariablestfhttpvariablestf">📥 Step 2: Define Variables in <code>modules/ec2_instance/</code><a target="_blank" href="http://variables.tf"><code>variables.tf</code></a></h3>
<pre><code class="lang-bash">variable <span class="hljs-string">"ami_value"</span> {
  description = <span class="hljs-string">"AMI ID for the EC2 instance"</span>
}

variable <span class="hljs-string">"instance_type_value"</span> {
  description = <span class="hljs-string">"EC2 instance type"</span>
}

variable <span class="hljs-string">"subnet_id_value"</span> {
  description = <span class="hljs-string">"Subnet ID for the instance"</span>
}
</code></pre>
<hr />
<h3 id="heading-step-3-output-public-ip-in-modulesec2instanceoutputstfhttpoutputstf">📤 Step 3: Output Public IP in <code>modules/ec2_instance/</code><a target="_blank" href="http://outputs.tf"><code>outputs.tf</code></a></h3>
<pre><code class="lang-bash">output <span class="hljs-string">"public_ip"</span> {
  value = aws_instance.example.public_ip
}
</code></pre>
<hr />
<h3 id="heading-step-4-consume-the-module-in-root-maintfhttpmaintf">🧪 Step 4: Consume the Module in Root <a target="_blank" href="http://main.tf"><code>main.tf</code></a></h3>
<pre><code class="lang-bash">provider <span class="hljs-string">"aws"</span> {
  region = <span class="hljs-string">"us-east-1"</span>
}

module <span class="hljs-string">"ec2_instance"</span> {
  <span class="hljs-built_in">source</span>              = <span class="hljs-string">"./modules/ec2_instance"</span>
  ami_value           = <span class="hljs-string">"ami-0c55b159cbfafe1f0"</span>
  instance_type_value = <span class="hljs-string">"t2.micro"</span>
  subnet_id_value     = <span class="hljs-string">"subnet-0123456789abcdef0"</span>
}
</code></pre>
<hr />
<h2 id="heading-how-it-works">📌 How It Works</h2>
<p>When you run:</p>
<pre><code class="lang-bash">terraform init
terraform apply
</code></pre>
<p>Terraform will:</p>
<ul>
<li><p>Load the <code>ec2_instance</code> module</p>
</li>
<li><p>Inject the variables you passed</p>
</li>
<li><p>Create the EC2 instance</p>
</li>
<li><p>Output the public IP</p>
</li>
</ul>
<hr />
<h2 id="heading-why-this-is-so-powerful">🔁 Why This Is So Powerful</h2>
<h3 id="heading-imagine-this-scenario">Imagine this scenario:</h3>
<p>You have 3 dev teams, each needing EC2 instances with different configurations.</p>
<p>With modules:</p>
<ul>
<li><p>You reuse the same module</p>
</li>
<li><p>Just pass different <code>ami</code>, <code>instance_type</code>, and <code>subnet_id</code></p>
</li>
<li><p>No duplicated code</p>
</li>
</ul>
<h3 id="heading-additional-benefits">Additional Benefits:</h3>
<ul>
<li><p>🧱 <strong>Modularity</strong> – Break infrastructure into building blocks</p>
</li>
<li><p>🔄 <strong>Reusability</strong> – Use the same logic across teams</p>
</li>
<li><p>🧪 <strong>Testability</strong> – Test modules in isolation</p>
</li>
<li><p>🔐 <strong>Security</strong> – Keep sensitive values in <code>terraform.tfvars</code> (kept out of version control)</p>
</li>
<li><p>📚 <strong>Documentation &amp; Ownership</strong> – Clear inputs and outputs make team collaboration easier</p>
</li>
</ul>
<hr />
<h2 id="heading-optional-use-terraformtfvars-to-supply-values">🔐 Optional: Use <code>terraform.tfvars</code> to Supply Values</h2>
<p>Create a <code>terraform.tfvars</code> file in your root directory:</p>
<pre><code class="lang-bash">ami_value           = <span class="hljs-string">"ami-0c55b159cbfafe1f0"</span>
instance_type_value = <span class="hljs-string">"t2.micro"</span>
subnet_id_value     = <span class="hljs-string">"subnet-0123456789abcdef0"</span>
</code></pre>
<p>Then just run:</p>
<pre><code class="lang-bash">terraform apply
</code></pre>
<p>Terraform will automatically pick up values from this file.</p>
<hr />
<h2 id="heading-where-should-you-store-modules">🗃️ Where Should You Store Modules?</h2>
<ul>
<li><p>In the same repo (like above)</p>
</li>
<li><p>In a <strong>separate GitHub repo</strong> (shared across projects)</p>
</li>
<li><p>Use <strong>Terraform Registry</strong> (like DockerHub for modules)</p>
</li>
</ul>
<blockquote>
<p>❗ In production, avoid unknown public modules unless you trust the source.<br />Companies usually create <strong>private module registries</strong> in GitHub or Terraform Cloud.</p>
</blockquote>
<hr />
<h2 id="heading-pro-tip-modules-scale-with-your-org">🚀 Pro Tip: Modules Scale With Your Org</h2>
<p>Let’s say your team writes modules for:</p>
<ul>
<li><p><code>ec2_instance</code></p>
</li>
<li><p><code>s3_bucket</code></p>
</li>
<li><p><code>eks_cluster</code></p>
</li>
<li><p><code>vpc</code></p>
</li>
<li><p><code>alb</code></p>
</li>
</ul>
<p>Now, internal developers can simply consume those like building blocks - without writing full configurations.</p>
<pre><code class="lang-bash">module <span class="hljs-string">"s3_bucket"</span> {
  <span class="hljs-built_in">source</span> = <span class="hljs-string">"git::https://github.com/org/modules.git//s3"</span>
  bucket_name = <span class="hljs-string">"my-app-logs"</span>
}
</code></pre>
<hr />
<h2 id="heading-summary">✅ Summary</h2>
<p>Terraform modules are the <strong>secret weapon</strong> for writing production-grade infrastructure:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Benefit</td><td>Description</td></tr>
</thead>
<tbody>
<tr>
<td>Modularity</td><td>Smaller, manageable components</td></tr>
<tr>
<td>Reusability</td><td>Use same code across environments/teams</td></tr>
<tr>
<td>Security</td><td>Keep sensitive data out of source control</td></tr>
<tr>
<td>Collaboration</td><td>Clear ownership and shared modules</td></tr>
<tr>
<td>Scalability</td><td>Codebase stays clean as infra grows</td></tr>
</tbody>
</table>
</div><hr />
<h2 id="heading-final-thoughts">🔚 Final Thoughts</h2>
<p>If you’re serious about mastering Terraform, start thinking in <strong>modules</strong> today.</p>
<p>Instead of managing huge <code>.tf</code> files, break your infrastructure into clean, reusable, and testable units - just like developers do with microservices.</p>
<blockquote>
<p>"Good DevOps is modular. Great DevOps is reusable."</p>
</blockquote>
]]></content:encoded></item><item><title><![CDATA[L2. Terraform Providers, Variables, and Project Structuring - A Practical Beginner’s Guide]]></title><description><![CDATA[Learning Terraform opens the door to managing cloud infrastructure using Infrastructure as Code (IaC). In this guide, we’ll dive into providers, variables, multi-cloud setups, and how to structure a professional Terraform project.
This blog is especi...]]></description><link>https://blog.iresh.xyz/terraform-providers-variables-and-project-structuring-a-practical-beginners-guide-part-02</link><guid isPermaLink="true">https://blog.iresh.xyz/terraform-providers-variables-and-project-structuring-a-practical-beginners-guide-part-02</guid><dc:creator><![CDATA[Iresh Ekanayaka]]></dc:creator><pubDate>Wed, 14 May 2025 15:58:22 GMT</pubDate><content:encoded><![CDATA[<p>Learning <strong>Terraform</strong> opens the door to managing cloud infrastructure using <strong>Infrastructure as Code (IaC)</strong>. In this guide, we’ll dive into <strong>providers</strong>, <strong>variables</strong>, <strong>multi-cloud setups</strong>, and how to structure a professional Terraform project.</p>
<p>This blog is especially for <strong>students, new learners</strong>, and <strong>junior DevOps engineers</strong> who want to build a solid foundation in Terraform - without getting overwhelmed.</p>
<hr />
<h2 id="heading-what-is-a-provider-in-terraform">🔌 What Is a Provider in Terraform?</h2>
<p>A <strong>provider</strong> in Terraform is a plugin that connects Terraform to your cloud platform (like AWS, Azure, GCP, etc.).</p>
<blockquote>
<p>Think of it as the <em>bridge</em> between your Terraform code and the cloud infrastructure you're trying to automate.</p>
</blockquote>
<h3 id="heading-example-aws-provider">🔧 Example: AWS Provider</h3>
<pre><code class="lang-bash">provider <span class="hljs-string">"aws"</span> {
  region = <span class="hljs-string">"us-east-1"</span>
}
</code></pre>
<p>This block tells Terraform:</p>
<ul>
<li><p>Use the AWS provider</p>
</li>
<li><p>Target the <code>us-east-1</code> region</p>
</li>
</ul>
<p>Terraform will then use this configuration to authenticate and create resources in AWS.</p>
<h3 id="heading-tip">🧠 Tip:</h3>
<p>Without a provider block, Terraform <strong>won’t know</strong>:</p>
<ul>
<li><p>Which cloud to talk to</p>
</li>
<li><p>How to authenticate</p>
</li>
<li><p>Where to create infrastructure</p>
</li>
</ul>
<hr />
<h2 id="heading-official-partner-amp-community-providers">☁️ Official, Partner &amp; Community Providers</h2>
<p>Terraform supports <strong>hundreds of providers</strong>, categorized as:</p>
<ol>
<li><p><strong>Official Providers</strong> – Maintained by HashiCorp<br /> e.g., AWS, Azure, GCP, Kubernetes</p>
</li>
<li><p><strong>Partner Providers</strong> – Maintained by vendors (e.g., Oracle, Alibaba)</p>
</li>
<li><p><strong>Community Providers</strong> – Maintained by the community (less reliable)</p>
</li>
</ol>
<blockquote>
<p>✅ Always check provider activity and documentation at: registry.terraform.io</p>
</blockquote>
<hr />
<h2 id="heading-multi-region-and-multi-cloud-configuration">🌐 Multi-Region and Multi-Cloud Configuration</h2>
<h3 id="heading-multi-region-setup-single-cloud-multiple-regions">🔁 Multi-Region Setup (Single Cloud, Multiple Regions)</h3>
<p>To deploy resources in multiple AWS regions:</p>
<pre><code class="lang-bash">provider <span class="hljs-string">"aws"</span> {
  <span class="hljs-built_in">alias</span>  = <span class="hljs-string">"use1"</span>
  region = <span class="hljs-string">"us-east-1"</span>
}

provider <span class="hljs-string">"aws"</span> {
  <span class="hljs-built_in">alias</span>  = <span class="hljs-string">"usw2"</span>
  region = <span class="hljs-string">"us-west-2"</span>
}

resource <span class="hljs-string">"aws_instance"</span> <span class="hljs-string">"east"</span> {
  provider      = aws.use1
  ami           = <span class="hljs-string">"ami-east"</span>
  instance_type = <span class="hljs-string">"t2.micro"</span>
}

resource <span class="hljs-string">"aws_instance"</span> <span class="hljs-string">"west"</span> {
  provider      = aws.usw2
  ami           = <span class="hljs-string">"ami-west"</span>
  instance_type = <span class="hljs-string">"t2.micro"</span>
}
</code></pre>
<h3 id="heading-multi-cloud-setup-eg-aws-azure">🌐 Multi-Cloud Setup (e.g., AWS + Azure)</h3>
<pre><code class="lang-bash">provider <span class="hljs-string">"aws"</span> {
  region = <span class="hljs-string">"us-east-1"</span>
}

provider <span class="hljs-string">"azurerm"</span> {
  features {}
  subscription_id = <span class="hljs-string">"your-subscription-id"</span>
  client_id       = <span class="hljs-string">"your-client-id"</span>
  client_secret   = <span class="hljs-string">"your-secret"</span>
  tenant_id       = <span class="hljs-string">"your-tenant-id"</span>
}
</code></pre>
<blockquote>
<p>⚠️ Each provider has a unique name and authentication mechanism. Use the official docs to get syntax and examples.</p>
</blockquote>
<hr />
<h2 id="heading-terraform-resources">🧱 Terraform Resources</h2>
<p>A <strong>resource</strong> represents a piece of infrastructure - like an EC2 instance, an S3 bucket, or a virtual machine.</p>
<h3 id="heading-example-aws-ec2-instance">🧩 Example (AWS EC2 Instance):</h3>
<pre><code class="lang-bash">resource <span class="hljs-string">"aws_instance"</span> <span class="hljs-string">"example"</span> {
  ami           = var.ami_id
  instance_type = var.instance_type
}
</code></pre>
<blockquote>
<p>Use documentation to find the correct syntax and resource names (e.g., <code>aws_instance</code>, <code>azurerm_virtual_machine</code>).</p>
</blockquote>
<hr />
<h2 id="heading-variables-in-terraform">💡 Variables in Terraform</h2>
<p>Hardcoding values (like AMI IDs, instance types) is <strong>bad practice</strong>. Use <strong>variables</strong> to make your Terraform configuration reusable.</p>
<h3 id="heading-input-variables-defined-in-variablestf">🔽 Input Variables (Defined in <code>variables.tf</code>)</h3>
<pre><code class="lang-bash">variable <span class="hljs-string">"ami_id"</span> {
  description = <span class="hljs-string">"AMI for EC2"</span>
  <span class="hljs-built_in">type</span>        = string
  default     = <span class="hljs-string">"ami-0abcd1234"</span>
}

variable <span class="hljs-string">"instance_type"</span> {
  description = <span class="hljs-string">"Instance type"</span>
  <span class="hljs-built_in">type</span>        = string
  default     = <span class="hljs-string">"t2.micro"</span>
}
</code></pre>
<h3 id="heading-output-variables-defined-in-outputstf">🔼 Output Variables (Defined in <code>outputs.tf</code>)</h3>
<pre><code class="lang-bash">output <span class="hljs-string">"instance_ip"</span> {
  description = <span class="hljs-string">"Public IP of the instance"</span>
  value       = aws_instance.example.public_ip
}
</code></pre>
<hr />
<h2 id="heading-tfvars-dynamic-value-management">🧾 TFVARS: Dynamic Value Management</h2>
<p>To <strong>dynamically pass values</strong> instead of hardcoding them in <code>.tf</code> files, use a <code>terraform.tfvars</code> file:</p>
<pre><code class="lang-bash">ami_id        = <span class="hljs-string">"ami-07a5b1a7"</span>
instance_type = <span class="hljs-string">"t2.medium"</span>
</code></pre>
<p>Terraform will automatically load this file during <code>terraform apply</code>. For custom files:</p>
<pre><code class="lang-bash">terraform apply -var-file=<span class="hljs-string">"prod.tfvars"</span>
</code></pre>
<hr />
<h2 id="heading-conditional-expressions-in-terraform">🧠 Conditional Expressions in Terraform</h2>
<p>Use <strong>conditionals</strong> when you want to assign values based on a variable (like environment type):</p>
<h3 id="heading-example-conditional-cidr-block">🔐 Example: Conditional CIDR Block</h3>
<pre><code class="lang-bash">cidr_blocks = var.environment == <span class="hljs-string">"prod"</span> ? [<span class="hljs-string">"10.0.1.0/24"</span>] : [<span class="hljs-string">"10.0.2.0/24"</span>]
</code></pre>
<p>This logic:</p>
<ul>
<li><p>Assigns one subnet range in production</p>
</li>
<li><p>Assigns another for development</p>
</li>
</ul>
<p>✅ Use it for:</p>
<ul>
<li><p>Public access controls</p>
</li>
<li><p>Instance types</p>
</li>
<li><p>Resource counts</p>
</li>
<li><p>Tags and naming conventions</p>
</li>
</ul>
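<p>Any of these can use the same ternary pattern - for example, a cheaper instance type outside production (variable and values are illustrative):</p>
<pre><code class="lang-bash">instance_type = var.environment == "prod" ? "t3.large" : "t2.micro"
</code></pre>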
<hr />
<h2 id="heading-built-in-functions-in-terraform">🔧 Built-in Functions in Terraform</h2>
<p>Terraform provides useful <strong>built-in functions</strong> to manipulate data:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Function</td><td>Description</td><td>Example</td></tr>
</thead>
<tbody>
<tr>
<td><code>length()</code></td><td>Get the length of a list</td><td><code>length(var.subnet_ids)</code></td></tr>
<tr>
<td><code>upper()</code></td><td>Convert to uppercase</td><td><code>upper(var.env)</code></td></tr>
<tr>
<td><code>tomap()</code></td><td>Convert an object to a map</td><td><code>tomap({ env = "dev", tier = "web" })</code></td></tr>
</tbody>
</table>
</div><p>More at Terraform Function Docs</p>
<hr />
<h2 id="heading-recommended-project-structure">📁 Recommended Project Structure</h2>
<p>To keep things clean and scalable, organize files like this:</p>
<pre><code class="lang-bash">terraform-project/
├── main.tf            <span class="hljs-comment"># Core logic</span>
├── variables.tf       <span class="hljs-comment"># Input variables</span>
├── outputs.tf         <span class="hljs-comment"># Output variables</span>
├── provider.tf        <span class="hljs-comment"># Provider blocks</span>
├── terraform.tfvars   <span class="hljs-comment"># Variable values</span>
</code></pre>
<p>This modular structure:</p>
<ul>
<li><p>Improves readability</p>
</li>
<li><p>Makes collaboration easier</p>
</li>
<li><p>Encourages reuse across environments</p>
</li>
</ul>
<h2 id="heading-key-takeaways">🧪 Key Takeaways</h2>
<p>✅ Providers connect Terraform to your cloud<br />✅ Use aliases for multi-region or multi-cloud setups<br />✅ Resources define what to deploy<br />✅ Variables and TFVARS make code reusable<br />✅ Conditionals and functions make configurations smarter<br />✅ Project structure matters for maintainability</p>
<hr />
<p>🎯 Start writing small modules using providers and variables.<br />🔍 Read official provider docs.<br />🛠 Practice customizing configurations for different environments.</p>
]]></content:encoded></item><item><title><![CDATA[L1. A Complete Beginner’s Guide to Infrastructure as Code]]></title><description><![CDATA[🧠 What You'll Learn in This Guide
✅ Understand what Infrastructure as Code (IaC) really means✅ Learn why Terraform is the top tool in the DevOps toolbox✅ Install Terraform (even if you can't install software on your laptop!)✅ Set up AWS authenticati...]]></description><link>https://blog.iresh.xyz/a-complete-beginners-guide-to-infrastructure-as-code</link><guid isPermaLink="true">https://blog.iresh.xyz/a-complete-beginners-guide-to-infrastructure-as-code</guid><dc:creator><![CDATA[Iresh Ekanayaka]]></dc:creator><pubDate>Wed, 14 May 2025 13:55:38 GMT</pubDate><content:encoded><![CDATA[<h2 id="heading-what-youll-learn-in-this-guide">🧠 What You'll Learn in This Guide</h2>
<p>✅ Understand what <strong>Infrastructure as Code (IaC)</strong> really means<br />✅ Learn why <strong>Terraform</strong> is the top tool in the DevOps toolbox<br />✅ Install Terraform (even if you can't install software on your laptop!)<br />✅ Set up <strong>AWS authentication</strong><br />✅ Write your first <strong>Terraform script</strong><br />✅ Deploy your first <strong>EC2 instance</strong><br />✅ Learn about <strong>Terraform’s lifecycle commands</strong>: <code>init</code>, <code>plan</code>, <code>apply</code>, <code>destroy</code><br />✅ Understand <strong>Terraform's state file</strong> and its importance</p>
<h2 id="heading-infrastructure-as-code-iac-explained-simply">🌐 Infrastructure as Code (IaC) – Explained Simply</h2>
<p>Instead of manually creating cloud resources using the AWS Console, you <strong>write code</strong> to define and manage your infrastructure. This code can be versioned, reused, and shared - just like software code.</p>
<h3 id="heading-example">Example:</h3>
<ul>
<li><p>Creating 1 S3 bucket? Easy via the AWS console.</p>
</li>
<li><p>Creating 100 S3 buckets? IaC makes it fast, repeatable, and error-free.</p>
</li>
</ul>
<h3 id="heading-traditional-methods">Traditional Methods:</h3>
<ul>
<li><p>❌ Manual creation using the AWS Console - error-prone and repetitive</p>
</li>
<li><p>❌ AWS CLI or SDKs like Python + Boto3 - requires programming knowledge</p>
</li>
<li><p>❌ CloudFormation or ARM Templates - tied to specific cloud providers</p>
</li>
</ul>
<hr />
<h2 id="heading-so-why-terraform">💡 So Why Terraform?</h2>
<p>Terraform gives us a <strong>universal language</strong> to define and manage infrastructure across <strong>multiple cloud providers</strong> - AWS, Azure, GCP, and more - using its own syntax called <strong>HCL (HashiCorp Configuration Language)</strong>.</p>
<h3 id="heading-key-benefits">🔥 Key Benefits:</h3>
<ul>
<li><p>💥 Multi-cloud support with one tool</p>
</li>
<li><p>🧾 Readable and declarative syntax</p>
</li>
<li><p>🚀 Reusable modules and configurations</p>
</li>
<li><p>📦 Massive community and ecosystem</p>
</li>
<li><p>⚙️ No deep programming knowledge required</p>
</li>
<li><p>🔁 Version-controlled infrastructure (just like Git)</p>
</li>
</ul>
<p>Instead of learning multiple tools like:</p>
<ul>
<li><p>AWS CloudFormation</p>
</li>
<li><p>Azure ARM Templates</p>
</li>
<li><p>OpenStack Heat Templates</p>
</li>
</ul>
<p>...just learn <strong>Terraform</strong>, and it works for all of them.</p>
<hr />
<h2 id="heading-installing-terraform">🛠️ Installing Terraform</h2>
<p>You can install Terraform in <strong>two ways</strong>:</p>
<h3 id="heading-method-1-local-installation">🔹 Method 1: Local Installation</h3>
<h4 id="heading-for-windows">For <strong>Windows</strong>:</h4>
<ul>
<li><p>Download the binary from terraform.io</p>
</li>
<li><p>Add it to your system <code>PATH</code></p>
</li>
<li><p>Use <strong>Git Bash</strong> or <strong>PowerShell</strong> (not CMD)</p>
</li>
</ul>
<h4 id="heading-for-macos">For <strong>macOS</strong>:</h4>
<pre><code class="lang-bash">brew tap hashicorp/tap
brew install hashicorp/tap/terraform
</code></pre>
<h4 id="heading-for-ubuntulinux">For <strong>Ubuntu/Linux</strong>:</h4>
<pre><code class="lang-bash">sudo apt-get update &amp;&amp; sudo apt-get install -y gnupg software-properties-common
wget -O- https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
<span class="hljs-built_in">echo</span> <span class="hljs-string">"deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com <span class="hljs-subst">$(lsb_release -cs)</span> main"</span> | sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt update
sudo apt install terraform
</code></pre>
<hr />
<h3 id="heading-method-2-github-codespaces-recommended-for-beginners">🔹 Method 2: GitHub Codespaces (Recommended for Beginners)</h3>
<p>Don’t have admin access, or are you working on a restricted office laptop?</p>
<p>Use <strong>GitHub Codespaces</strong> - your browser becomes your dev environment.</p>
<ul>
<li><p>✅ Free: 60 hours/month</p>
</li>
<li><p>🖥️ Environment: 2 CPUs, 4GB RAM</p>
</li>
<li><p>💻 Built-in Visual Studio Code</p>
</li>
<li><p>🌍 Access from any device with a browser</p>
</li>
</ul>
<h3 id="heading-to-set-it-up">To set it up:</h3>
<ol>
<li><p>Fork the Repo</p>
</li>
<li><p>Click <strong>Code &gt; Codespaces &gt; Create codespace</strong></p>
</li>
<li><p>Add Dev Container Configs for:</p>
<ul>
<li><p><code>Terraform</code></p>
</li>
<li><p><code>AWS CLI</code></p>
</li>
</ul>
</li>
<li><p>Rebuild the container - now you’re ready to use Terraform in-browser!</p>
</li>
</ol>
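<p>For reference, a minimal <code>.devcontainer/devcontainer.json</code> covering step 3 might look like this (a sketch using the published Dev Container features for Terraform and the AWS CLI; pin versions to suit your project):</p>
<pre><code class="lang-json">{
  "image": "mcr.microsoft.com/devcontainers/base:ubuntu",
  "features": {
    "ghcr.io/devcontainers/features/terraform:1": {},
    "ghcr.io/devcontainers/features/aws-cli:1": {}
  }
}
</code></pre>
<p>Commit this file, rebuild the Codespace, and both <code>terraform</code> and <code>aws</code> are available in the terminal.</p>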
<hr />
<h2 id="heading-setting-up-aws-authentication">🔐 Setting Up AWS Authentication</h2>
<p>Once the AWS CLI is installed, run:</p>
<pre><code class="lang-bash">aws configure
</code></pre>
<p>You’ll be prompted to enter:</p>
<ul>
<li><p><strong>Access Key ID</strong></p>
</li>
<li><p><strong>Secret Access Key</strong></p>
</li>
<li><p><strong>Default region</strong> (e.g., <code>us-east-1</code>)</p>
</li>
<li><p><strong>Output format</strong> (<code>json</code>, <code>table</code>, or <code>text</code>)</p>
</li>
</ul>
<blockquote>
<p>🔒 Tip: Use <strong>IAM Users</strong> instead of root accounts for better security. You can generate access keys from the AWS IAM Console.</p>
<p>🔑 <strong>Important Note:</strong> This only allows <strong>your terminal (shell)</strong> to interact with AWS using the AWS CLI. It <strong>does not</strong> automatically grant access to Terraform.</p>
</blockquote>
<p>The AWS CLI saves credentials in this location:</p>
<pre><code class="lang-bash">~/.aws/credentials
</code></pre>
<hr />
<h3 id="heading-giving-terraform-access-to-aws">Giving Terraform Access to AWS</h3>
<p>Once the AWS CLI is configured, <strong>Terraform automatically picks up those same credentials</strong> from <code>~/.aws/credentials</code> - you just need a provider block in your <code>.tf</code> file to set the region.</p>
<p>Here's the minimal provider configuration:</p>
<pre><code class="lang-hcl">provider <span class="hljs-string">"aws"</span> {
  region = <span class="hljs-string">"us-east-1"</span>
}
</code></pre>
<hr />
<h2 id="heading-writing-your-first-terraform-script">📝 Writing Your First Terraform Script</h2>
<p>Let’s write a basic Terraform configuration to launch an <strong>EC2 instance</strong>.</p>
<h3 id="heading-step-1-create-a-file-called-maintf">Step 1: Create a file called <code>main.tf</code></h3>
<pre><code class="lang-hcl">provider <span class="hljs-string">"aws"</span> {
  region = <span class="hljs-string">"us-east-1"</span>
}

resource <span class="hljs-string">"aws_instance"</span> <span class="hljs-string">"example"</span> {
  ami           = <span class="hljs-string">"ami-xxxxxxxxxxxxxxxxx"</span> <span class="hljs-comment"># Replace with valid AMI ID</span>
  instance_type = <span class="hljs-string">"t2.micro"</span>
  subnet_id     = <span class="hljs-string">"subnet-xxxxxxxxxxxxx"</span>  <span class="hljs-comment"># Replace with your Subnet ID</span>
  key_name      = <span class="hljs-string">"your-key-name"</span>         <span class="hljs-comment"># Replace with your key pair name</span>
}
</code></pre>
<p>💡 <strong>Where to find values:</strong></p>
<ul>
<li><p><strong>AMI ID</strong>: Launch an EC2 instance manually and copy the AMI ID</p>
</li>
<li><p><strong>Subnet ID</strong>: Use your default VPC subnet or create a new one</p>
</li>
<li><p><strong>Key Name</strong>: Use an existing EC2 key pair or create a new one from the AWS console</p>
</li>
</ul>
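<p>If you'd rather not copy IDs around by hand, Terraform can look the AMI up for you with a data source. Here's a sketch (the owner ID is Canonical's public AWS account; the name filter is an assumption - adjust it to the image family you actually want):</p>
<pre><code class="lang-hcl">data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}

resource "aws_instance" "example" {
  ami           = data.aws_ami.ubuntu.id # resolved at plan time
  instance_type = "t2.micro"
}
</code></pre>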
<hr />
<h2 id="heading-terraform-lifecycle-commands">🔄 Terraform Lifecycle Commands</h2>
<p>Follow these commands in order:</p>
<h3 id="heading-1-terraform-init">1️⃣ <code>terraform init</code></h3>
<p>Initializes the working directory and downloads provider plugins.</p>
<pre><code class="lang-bash">terraform init
</code></pre>
<h3 id="heading-2-terraform-plan">2️⃣ <code>terraform plan</code></h3>
<p>Shows what changes Terraform will make. Think of this as a <strong>dry run</strong>.</p>
<pre><code class="lang-bash">terraform plan
</code></pre>
<h3 id="heading-3-terraform-apply">3️⃣ <code>terraform apply</code></h3>
<p>Applies the configuration and <strong>creates real infrastructure</strong> on AWS.</p>
<pre><code class="lang-bash">terraform apply
</code></pre>
<p>Terraform will ask for confirmation - type <code>yes</code>.</p>
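<p>A common refinement is to save the reviewed plan to a file and apply exactly that file, so nothing can change between review and apply:</p>
<pre><code class="lang-bash">terraform plan -out=tfplan   # write the plan you reviewed to a file
terraform apply tfplan       # apply exactly that plan (no confirmation prompt)
</code></pre>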
<h3 id="heading-4-terraform-destroy">4️⃣ <code>terraform destroy</code></h3>
<p>Destroys all the resources Terraform created, so you stop incurring AWS charges.</p>
<pre><code class="lang-bash">terraform destroy
</code></pre>
<h2 id="heading-what-is-the-terraform-state-file">📂 What is the Terraform State File?</h2>
<p>Terraform creates a file called:</p>
<pre><code class="lang-bash">terraform.tfstate
</code></pre>
<h3 id="heading-this-file">This file:</h3>
<ul>
<li><p>Records all the resources that Terraform created</p>
</li>
<li><p>Tracks the <strong>current state</strong> of your infrastructure</p>
</li>
<li><p>Is used internally to understand what changes need to be made</p>
</li>
</ul>
<p>🔐 <strong>Important:</strong><br />Manage this file securely - it can contain sensitive information such as resource IDs, IP addresses, and even secrets (for example, database passwords) in plain text.</p>
<p>In future guides, you’ll learn:</p>
<ul>
<li><p>How to use <strong>remote state storage</strong> (S3 + DynamoDB)</p>
</li>
<li><p>How to <strong>secure and lock state files</strong> in team environments</p>
</li>
<li><p>How state integrates with <strong>CI/CD pipelines</strong></p>
</li>
</ul>
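<p>As a preview, a remote S3 backend is just a few lines of configuration (a sketch - the bucket and DynamoDB table names are placeholders and must already exist):</p>
<pre><code class="lang-hcl">terraform {
  backend "s3" {
    bucket         = "my-tfstate-bucket"      # placeholder: pre-created S3 bucket
    key            = "prod/terraform.tfstate" # path of the state object
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"        # placeholder: table for state locking
    encrypt        = true
  }
}
</code></pre>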
]]></content:encoded></item><item><title><![CDATA[🔀 Git Branching Strategies Explained]]></title><description><![CDATA[🚀 Introduction
When you're working on a software project - especially in a team - managing your code effectively is critical. Git and GitHub enable this through branching strategies, but as a beginner, you might be confused by terms like develop, fe...]]></description><link>https://blog.iresh.xyz/github-branching-strategy-explained-for-beginners</link><guid isPermaLink="true">https://blog.iresh.xyz/github-branching-strategy-explained-for-beginners</guid><dc:creator><![CDATA[Iresh Ekanayaka]]></dc:creator><pubDate>Tue, 06 May 2025 06:38:21 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1746517063863/f4cc2638-d55a-4cc4-88fb-4b38d32749ef.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction">🚀 Introduction</h2>
<p>When you're working on a software project - especially in a team - managing your code effectively is critical. Git and GitHub enable this through branching strategies, but as a beginner, you might be confused by terms like <code>develop</code>, <code>feature</code>, or <code>hotfix</code>, or unsure which strategy fits your project best.</p>
<p>In this post, I’ll break down <strong>three popular Git branching strategies</strong> - <strong>Git Flow</strong>, <strong>Feature Branch Workflow</strong>, and <strong>Forking Workflow</strong> - explain how they work, and help you choose the right one.</p>
<hr />
<h2 id="heading-branching-strategies-overview">🌳 Branching Strategies Overview</h2>
<h3 id="heading-1-git-flow-workflow">1. <strong>Git Flow Workflow</strong></h3>
<p>A structured and full-featured strategy for managing larger projects with scheduled releases.</p>
<h4 id="heading-main-branches">🔑 Main Branches:</h4>
<ul>
<li><p><code>master</code> – Stable, production-ready code.</p>
</li>
<li><p><code>develop</code> – Integrates features; staging area before production.</p>
</li>
<li><p><code>feature/*</code> – Individual feature development branches.</p>
</li>
<li><p><code>release/*</code> – Pre-release staging and polishing.</p>
</li>
<li><p><code>hotfix/*</code> – Urgent fixes for production issues.</p>
</li>
</ul>
<h4 id="heading-typical-flow">🔁 Typical Flow:</h4>
<pre><code class="lang-bash">master &lt;—— hotfix
   ↑
release
   ↑
develop &lt;—— feature
</code></pre>
<h4 id="heading-example">🧪 Example:</h4>
<ol>
<li><p><code>git checkout -b develop master</code></p>
</li>
<li><p><code>git checkout -b feature/login develop</code></p>
</li>
<li><p>Merge feature into <code>develop</code> → create <code>release/*</code> → merge into <code>master</code> and tag → back-merge into <code>develop</code></p>
</li>
</ol>
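<p>On the command line, the release step in point 3 typically looks like this (branch and version names are illustrative):</p>
<pre><code class="lang-bash">git checkout -b release/1.0.0 develop   # branch off develop for final polish
git checkout master
git merge --no-ff release/1.0.0         # merge the release into master
git tag -a v1.0.0 -m "Release 1.0.0"    # tag the production release
git checkout develop
git merge --no-ff release/1.0.0         # back-merge so develop has the fixes
</code></pre>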
<p>✅ Best for: Large teams, scheduled/versioned releases.<br />❌ Complex for fast-paced or CI/CD-driven environments.</p>
<hr />
<h3 id="heading-2-feature-branch-workflow">2. <strong>Feature Branch Workflow</strong></h3>
<p>This lightweight and popular strategy is often mistakenly called "GitHub Flow" - but it's officially known as <strong>Feature Branch Workflow</strong>. GitHub Flow is a variant of this, optimized for continuous delivery.</p>
<h4 id="heading-main-branch">🔑 Main Branch:</h4>
<ul>
<li><p><code>main</code> or <code>master</code> – Always production-ready</p>
</li>
<li><p><code>feature/*</code> – Short-lived branches for individual changes</p>
</li>
</ul>
<h4 id="heading-typical-flow-1">🔁 Typical Flow:</h4>
<pre><code class="lang-bash">main &lt;—— feature/* (via Pull Request)
</code></pre>
<h4 id="heading-example-1">🧪 Example:</h4>
<ol>
<li><p><code>git checkout -b feature/signup main</code></p>
</li>
<li><p>Commit and push changes</p>
</li>
<li><p>Open a PR, get it reviewed, and merge into <code>main</code></p>
</li>
</ol>
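<p>If you use the GitHub CLI, step 3 can be done without leaving the terminal (the title and body here are just examples):</p>
<pre><code class="lang-bash">git push -u origin feature/signup
gh pr create --base main --title "Add signup page" --body "Implements the signup flow"
</code></pre>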
<p>✅ Best for: CI/CD, small to medium teams, rapid releases<br />❌ Lacks pre-production stages like testing or staging unless you set them up in your CI pipeline</p>
<hr />
<h3 id="heading-3-forking-workflow">3. <strong>Forking Workflow</strong></h3>
<p>Mostly used in open-source projects where contributors don't have direct write access to the main repository.</p>
<h4 id="heading-key-concept">🔑 Key Concept:</h4>
<ul>
<li><p>Contributors <strong>fork</strong> the main repository</p>
</li>
<li><p>Work is done in their <strong>personal clone</strong></p>
</li>
<li><p>Changes are submitted via <strong>Pull Request</strong> back to the original repo</p>
</li>
</ul>
<h4 id="heading-typical-flow-2">🔁 Typical Flow:</h4>
<pre><code class="lang-bash">origin/main &lt;—— Pull Request &lt;—— forked-repo/feature
</code></pre>
<h4 id="heading-example-2">🧪 Example:</h4>
<ol>
<li><p>Fork the repo on GitHub</p>
</li>
<li><p><code>git clone</code> your fork</p>
</li>
<li><p><code>git checkout -b fix/typo</code></p>
</li>
<li><p>Push and open a PR to the original repo</p>
</li>
</ol>
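<p>One habit worth adding here: keep your fork in sync with the original repository so your PRs apply cleanly (the URL is a placeholder):</p>
<pre><code class="lang-bash">git remote add upstream https://github.com/original-owner/repo.git
git fetch upstream
git checkout main
git merge upstream/main   # or rebase, per the project's convention
git push origin main
</code></pre>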
<p>✅ Best for: Open-source and external collaborations<br />❌ Not ideal for internal team projects due to overhead</p>
<hr />
<h2 id="heading-real-world-example-to-do-app">📘 Real-World Example: To-Do App</h2>
<p>Let’s walk through how each strategy could apply to a simple project like a To-Do App.</p>
<h3 id="heading-using-git-flow">🔸 Using Git Flow:</h3>
<ul>
<li><p>Start from <code>master</code>, create <code>develop</code></p>
</li>
<li><p>Each feature (e.g., "Dark Mode") gets a <code>feature/dark-mode</code> branch from <code>develop</code></p>
</li>
<li><p>Once ready, create <code>release/1.0.0</code> from <code>develop</code></p>
</li>
<li><p>Merge <code>release</code> into <code>master</code> and <code>develop</code>, tag the release</p>
</li>
<li><p>If a crash is found in production, create <code>hotfix/crash-fix</code> from <code>master</code></p>
</li>
</ul>
<h3 id="heading-using-feature-branch-workflow">🔹 Using Feature Branch Workflow:</h3>
<ul>
<li><p>Start from <code>main</code></p>
</li>
<li><p>Create a branch <code>feature/dark-mode</code> directly from <code>main</code></p>
</li>
<li><p>Work and open a PR</p>
</li>
<li><p>After testing, merge to <code>main</code> and deploy</p>
</li>
</ul>
<h3 id="heading-using-forking-workflow">🌐 Using Forking Workflow:</h3>
<ul>
<li><p>Contributor forks the main repo</p>
</li>
<li><p>Creates a feature branch in their fork (e.g., <code>feature/dark-mode</code>)</p>
</li>
<li><p>Pushes and opens a PR to the main repo’s <code>main</code> branch</p>
</li>
</ul>
<hr />
<h2 id="heading-summary-table">🧠 Summary Table</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Strategy</td><td>Best For</td><td>Main Branches</td><td>Pros</td><td>Cons</td></tr>
</thead>
<tbody>
<tr>
<td>Git Flow</td><td>Enterprise apps, versioned releases</td><td><code>master</code>, <code>develop</code>, <code>feature/*</code>, <code>release/*</code>, <code>hotfix/*</code></td><td>Structured, scalable</td><td>Complex setup</td></tr>
<tr>
<td>Feature Branch</td><td>Startups, CI/CD teams</td><td><code>main</code>, <code>feature/*</code></td><td>Simple, fast</td><td>No staging by default</td></tr>
<tr>
<td>Forking Workflow</td><td>Open-source projects</td><td>Forks, PRs to <code>main</code></td><td>Secure, external contributions</td><td>Slower workflow</td></tr>
</tbody>
</table>
</div><hr />
<h2 id="heading-text-based-diagrams">🔍 Text-Based Diagrams</h2>
<h3 id="heading-git-flow">Git Flow:</h3>
<pre><code class="lang-bash">*--------------------*-------------------&gt; master
         \             \           
          \             *------------&gt; hotfix/*
           \
            *-------------------------&gt; release/*
             \
              *--*--*--*--*-----------&gt; develop
                 \  \  \  \
                  \  \  \ *----------&gt; feature/*
                   \  \  *-----------&gt; feature/*
                    \  *-------------&gt; feature/*
</code></pre>
<h3 id="heading-feature-branch">Feature Branch:</h3>
<pre><code class="lang-bash">main
  |\
  | *--&gt; feature/login
  | *--&gt; feature/dark-mode
  | *--&gt; feature/tags
  --&gt; PR merge back to main
</code></pre>
<h3 id="heading-forking">Forking:</h3>
<pre><code class="lang-bash">original-repo/main
        ^
        |
  Pull Request
        |
  forked-repo/feature
</code></pre>
<hr />
<h2 id="heading-final-tips">✅ Final Tips</h2>
<ul>
<li><p>Choose a strategy that matches your team’s workflow and delivery model.</p>
</li>
<li><p>Always pull the latest changes before creating new branches.</p>
</li>
<li><p>Use clear, consistent branch naming (<code>feature/login-page</code>, <code>hotfix/api-timeout</code>).</p>
</li>
<li><p>Keep commits focused and meaningful.</p>
</li>
<li><p>Use tags for versioning (<code>v1.0.0</code>, <code>v1.0.1</code>).</p>
</li>
</ul>
<p>With the right strategy, your team can work efficiently, reduce conflicts, and release with confidence.</p>
]]></content:encoded></item><item><title><![CDATA[Setting Up a Multi-Node Kubernetes Cluster Using Kubeadm on Ubuntu 24.04 LTS]]></title><description><![CDATA[Kubernetes is a powerful platform for managing containerized workloads, and Kubeadm is a tool for setting up a Kubernetes cluster easily. In this guide, we'll walk you through the process of setting up a multi-node Kubernetes cluster on Ubuntu 24.04 ...]]></description><link>https://blog.iresh.xyz/setting-up-a-multi-node-kubernetes-cluster-using-kubeadm-on-ubuntu-2404-lts</link><guid isPermaLink="true">https://blog.iresh.xyz/setting-up-a-multi-node-kubernetes-cluster-using-kubeadm-on-ubuntu-2404-lts</guid><dc:creator><![CDATA[Iresh Ekanayaka]]></dc:creator><pubDate>Mon, 28 Apr 2025 19:45:45 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1745870125133/c62b0cec-6e13-496d-bde3-9ae449737f8b.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Kubernetes is a powerful platform for managing containerized workloads, and Kubeadm is a tool for setting up a Kubernetes cluster easily. In this guide, we'll walk you through the process of setting up a multi-node Kubernetes cluster on Ubuntu 24.04 LTS using Kubeadm. The setup is straightforward and includes all the necessary steps for beginners.</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Before you begin, make sure you meet the following requirements:</p>
<ul>
<li><p>At least two Ubuntu 24.04 LTS servers with 2GB of RAM and 2 CPU cores.</p>
</li>
<li><p>Network connectivity between the servers.</p>
</li>
<li><p>Root (sudo) access to each server.</p>
</li>
</ul>
<h2 id="heading-step-1-update-the-system-and-install-dependencies">Step 1: Update the System and Install Dependencies</h2>
<p>Start by updating the system and installing the necessary dependencies on all nodes (both master and worker nodes).</p>
<pre><code class="lang-plaintext">sudo apt update &amp;&amp; sudo apt upgrade -y
sudo apt install apt-transport-https curl -y
</code></pre>
<h2 id="heading-step-2-install-and-configure-containerd">Step 2: Install and Configure Containerd</h2>
<p>Kubernetes uses containerd as the container runtime. Let's install and configure it:</p>
<pre><code class="lang-plaintext">sudo apt install containerd -y
</code></pre>
<p>Now configure containerd to use systemd as the cgroup driver:</p>
<pre><code class="lang-plaintext">sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
sudo systemctl restart containerd
</code></pre>
<h2 id="heading-step-3-install-kubernetes-components">Step 3: Install Kubernetes Components</h2>
<p>Next, we'll install the Kubernetes components: <code>kubelet</code>, <code>kubeadm</code>, and <code>kubectl</code>. These are the essential tools for setting up and managing the Kubernetes cluster.</p>
<pre><code class="lang-plaintext">curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.33/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.33/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
sudo systemctl enable --now kubelet
</code></pre>
<h2 id="heading-step-4-disable-swap">Step 4: Disable Swap</h2>
<p>Kubernetes requires that swap be disabled for proper node functioning. Disable it by running:</p>
<pre><code class="lang-plaintext">sudo swapoff -a
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
</code></pre>
<h2 id="heading-step-5-load-necessary-kernel-modules">Step 5: Load Necessary Kernel Modules</h2>
<p>Kubernetes requires certain kernel modules for networking. Load these modules by running the following commands:</p>
<pre><code class="lang-plaintext">sudo modprobe overlay
sudo modprobe br_netfilter
</code></pre>
<h2 id="heading-step-6-set-required-sysctl-parameters">Step 6: Set Required Sysctl Parameters</h2>
<p>To ensure proper network communication, set the following sysctl parameters:</p>
<pre><code class="lang-plaintext">cat &lt;&lt;EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
</code></pre>
<p>Apply these sysctl settings:</p>
<pre><code class="lang-plaintext">sudo sysctl --system
</code></pre>
<h2 id="heading-step-7-initialize-the-kubernetes-cluster-on-the-master-node">Step 7: Initialize the Kubernetes Cluster (on the Master Node)</h2>
<p>Now that the nodes are configured, it's time to initialize the Kubernetes cluster. Run the following command only on the master node:</p>
<pre><code class="lang-plaintext">sudo kubeadm init --pod-network-cidr=10.244.0.0/16
</code></pre>
<p>This command will provide a <code>kubeadm join</code> command that you'll use later to join worker nodes to the cluster.</p>
<h2 id="heading-step-8-set-up-kubeconfig-for-the-user">Step 8: Set Up kubeconfig for the User</h2>
<p>To interact with the Kubernetes cluster using <code>kubectl</code>, set up the kubeconfig file for your user:</p>
<pre><code class="lang-plaintext">mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
</code></pre>
<h2 id="heading-step-9-install-a-network-plugin-flannel">Step 9: Install a Network Plugin (Flannel)</h2>
<p>Kubernetes requires a network plugin to enable communication between pods across different nodes. We will use Flannel for this purpose. Run the following command on the master node:</p>
<pre><code class="lang-plaintext">kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
</code></pre>
<h2 id="heading-step-10-verify-the-installation">Step 10: Verify the Installation</h2>
<p>To verify that your cluster is up and running, check the status of the nodes and pods:</p>
<pre><code class="lang-plaintext">kubectl get nodes
kubectl get pods --all-namespaces
</code></pre>
<h2 id="heading-step-11-join-worker-nodes-to-the-cluster">Step 11: Join Worker Nodes to the Cluster</h2>
<p>On the worker nodes, run the <code>kubeadm join</code> command provided by the <code>kubeadm init</code> output on the master node. It will look something like this:</p>
<pre><code class="lang-plaintext">sudo kubeadm join 172.31.19.36:6443 --token 922x9d.v0jn4c8he0s286js --discovery-token-ca-cert-hash sha256:abcd1234...
</code></pre>
<p>After running the <code>kubeadm join</code> command on the worker nodes, they will join the cluster.</p>
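<p>If you no longer have the original output, you can generate a fresh join command on the master node (bootstrap tokens expire after 24 hours by default):</p>
<pre><code class="lang-plaintext">sudo kubeadm token create --print-join-command
</code></pre>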
<h2 id="heading-step-12-verify-the-cluster">Step 12: Verify the Cluster</h2>
<p>On the master node, run the following command to ensure that all nodes have joined the cluster successfully:</p>
<pre><code class="lang-plaintext">kubectl get nodes
</code></pre>
<p>You should see the master and worker nodes listed as <code>Ready</code>.</p>
<hr />
<h3 id="heading-conclusion">Conclusion</h3>
<p>Congratulations! You have successfully set up a multi-node Kubernetes cluster using Kubeadm on Ubuntu 24.04 LTS. You can now start deploying your applications on this cluster. If you face any issues during the setup, check the logs or consult the Kubernetes documentation for more troubleshooting steps.</p>
<hr />
<h2 id="heading-understanding-kubernetes-system-pods-after-cluster-initialization">🔍 Understanding Kubernetes System Pods After Cluster Initialization</h2>
<p>After initializing your Kubernetes cluster with <code>kubeadm</code>, you can run the command:</p>
<pre><code class="lang-plaintext">kubectl get pods -n kube-system
</code></pre>
<p>This will list the core system components running in the <code>kube-system</code> namespace. The table below is taken from a cluster running the Calico CNI; with the Flannel setup from this guide you'll see <code>kube-flannel</code> pods instead of the two Calico entries. Here's what each of them does:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Pod Name</strong></td><td><strong>Purpose</strong></td></tr>
</thead>
<tbody>
<tr>
<td><code>calico-kube-controllers</code></td><td>Controls the state of Calico network policies and routes. It ensures proper communication between nodes.</td></tr>
<tr>
<td><code>calico-node</code></td><td>Runs on every node and is responsible for setting up networking, routing, and enforcing network policies.</td></tr>
<tr>
<td><code>coredns</code></td><td>DNS server for service discovery in the cluster. It allows pods to resolve service names like <code>my-service.default.svc.cluster.local</code>.</td></tr>
<tr>
<td><code>etcd</code></td><td>A distributed key-value store used as the backing store for all cluster data. Critical for cluster state.</td></tr>
<tr>
<td><code>kube-apiserver</code></td><td>Exposes the Kubernetes API. It's the front-end and primary control plane component.</td></tr>
<tr>
<td><code>kube-controller-manager</code></td><td>Manages various controllers (e.g., replication, endpoint, namespace). It ensures the desired state of resources.</td></tr>
<tr>
<td><code>kube-proxy</code></td><td>Maintains network rules on nodes to allow communication to services. Handles routing traffic to appropriate pods.</td></tr>
<tr>
<td><code>kube-scheduler</code></td><td>Assigns newly created pods to nodes based on resource availability and scheduling rules.</td></tr>
</tbody>
</table>
</div>]]></content:encoded></item><item><title><![CDATA[How to Install KVM on Ubuntu: A Step-by-Step Beginner’s Guide]]></title><description><![CDATA[If you're diving into the world of virtualization on Linux, KVM (Kernel-based Virtual Machine) is one of the best technologies you can use. It’s powerful, free, built right into the Linux kernel, and acts like a Type 1 hypervisor — just like VMware E...]]></description><link>https://blog.iresh.xyz/how-to-install-kvm-on-ubuntu-a-step-by-step-beginners-guide</link><guid isPermaLink="true">https://blog.iresh.xyz/how-to-install-kvm-on-ubuntu-a-step-by-step-beginners-guide</guid><dc:creator><![CDATA[Iresh Ekanayaka]]></dc:creator><pubDate>Mon, 28 Apr 2025 19:34:21 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1745870099618/b4a0db62-1602-47d2-bc04-2547421c6134.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you're diving into the world of virtualization on Linux, <strong>KVM (Kernel-based Virtual Machine)</strong> is one of the best technologies you can use. It’s powerful, free, built right into the Linux kernel, and acts like a <strong>Type 1 hypervisor</strong> — just like VMware ESXi or Hyper-V.</p>
<p>In this guide, I’ll walk you through <strong>installing KVM on Ubuntu 24.04 (Noble Numbat)</strong>. Whether you're a complete beginner or need a quick refresher, follow along and you'll have a working KVM setup in no time! 🙌</p>
<hr />
<h2 id="heading-prerequisites">🛠️ Prerequisites</h2>
<p>Before you start:</p>
<ul>
<li><p>Ubuntu 24.04 installed on your system</p>
</li>
<li><p>Terminal access (Ctrl + Alt + T)</p>
</li>
<li><p>Root privileges (use <code>sudo</code>)</p>
</li>
</ul>
<hr />
<h2 id="heading-step-1-update-ubuntu">Step 1: Update Ubuntu 📦</h2>
<p>First, make sure your system packages are up to date.<br />Open the terminal and run:</p>
<pre><code class="lang-plaintext">sudo apt update
</code></pre>
<p>👉 This ensures you’re pulling the latest available packages.</p>
<hr />
<h2 id="heading-step-2-check-if-your-system-supports-virtualization">Step 2: Check if Your System Supports Virtualization ⚙️</h2>
<p><strong>1. Check CPU virtualization support:</strong></p>
<pre><code class="lang-plaintext">egrep -c '(vmx|svm)' /proc/cpuinfo
</code></pre>
<ul>
<li><p>If you get <strong>0</strong> ➔ your CPU doesn’t support virtualization (sorry 😞).</p>
</li>
<li><p>If you get <strong>1 or more</strong> ➔ you're good to go! ✅</p>
</li>
</ul>
<hr />
<p><strong>2. Check KVM support with</strong> <code>kvm-ok</code>:</p>
<p>Install <code>cpu-checker</code> if <code>kvm-ok</code> is missing:</p>
<pre><code class="lang-plaintext">sudo apt install cpu-checker
</code></pre>
<p>Then, check again:</p>
<pre><code class="lang-plaintext">sudo kvm-ok
</code></pre>
<p>You should see a message like:</p>
<blockquote>
<p>"KVM acceleration can be used"</p>
</blockquote>
<p>✅ If yes, move to the next step!</p>
<hr />
<h2 id="heading-step-3-install-kvm-and-required-packages">Step 3: Install KVM and Required Packages 📚</h2>
<p>Now install KVM along with the tools you’ll need:</p>
<pre><code class="lang-plaintext">sudo apt install qemu-kvm libvirt-daemon-system libvirt-clients bridge-utils -y
</code></pre>
<p>This installs:</p>
<ul>
<li><p><strong>qemu-kvm</strong> ➔ main KVM package</p>
</li>
<li><p><strong>libvirt</strong> ➔ to manage VMs</p>
</li>
<li><p><strong>bridge-utils</strong> ➔ networking support</p>
</li>
</ul>
<hr />
<h2 id="heading-step-4-authorize-your-user-to-use-kvm">Step 4: Authorize Your User to Use KVM 👤</h2>
<p>By default, only members of the <code>libvirt</code> and <code>kvm</code> groups can manage VMs.<br />Add your user to both groups:</p>
<pre><code class="lang-plaintext">sudo adduser $USER libvirt
sudo adduser $USER kvm
</code></pre>
<p><strong>Tip:</strong></p>
<ul>
<li><p>Replace <code>$USER</code> with your username if needed.</p>
</li>
<li><p>You might need to <strong>log out and log back in</strong> for the changes to apply.</p>
</li>
</ul>
<hr />
<h2 id="heading-step-5-verify-your-installation">Step 5: Verify Your Installation 🔎</h2>
<p>Use the <code>virsh</code> tool to confirm that KVM and libvirt are working:</p>
<pre><code class="lang-plaintext">sudo virsh list --all
</code></pre>
<ul>
<li>If you get a (mostly empty) list without errors ➔ everything is working! ✅</li>
</ul>
<p>Also, check the <code>libvirtd</code> service:</p>
<pre><code class="lang-plaintext">sudo systemctl status libvirtd
</code></pre>
<p>If it’s not running, start it with:</p>
<pre><code class="lang-plaintext">sudo systemctl enable --now libvirtd
</code></pre>
<hr />
<h1 id="heading-create-your-first-virtual-machine-vm">🎯 Create Your First Virtual Machine (VM)</h1>
<p>There are two ways to create VMs: <strong>using a GUI</strong> or <strong>command line</strong>.</p>
<hr />
<h2 id="heading-method-1-using-virt-manager-gui">Method 1: Using Virt-Manager GUI 🖥️</h2>
<p>First, install Virt Manager:</p>
<pre><code class="lang-plaintext">sudo apt install virt-manager -y
</code></pre>
<p>Then start it:</p>
<pre><code class="lang-plaintext">sudo virt-manager
</code></pre>
<p>💡 <strong>Steps to create a VM:</strong></p>
<ol>
<li><p>Click the <strong>computer icon</strong> ➔ "Create a new VM".</p>
</li>
<li><p>Choose "<strong>Local install media (ISO image)</strong>".</p>
</li>
<li><p>Browse and select your ISO file (e.g., Ubuntu Server ISO).</p>
</li>
<li><p>Allocate <strong>CPU</strong> and <strong>Memory</strong>.</p>
</li>
<li><p>Create and allocate <strong>disk space</strong>.</p>
</li>
<li><p>Name your VM ➔ Finish.</p>
</li>
</ol>
<p>🚀 Your VM will boot up and you can install the OS normally!</p>
<hr />
<h2 id="heading-method-2-using-the-command-line">Method 2: Using the Command Line 🖥️💬</h2>
<p>Prefer the terminal? Use <code>virt-install</code>:</p>
<p>Example command:</p>
<pre><code class="lang-plaintext">sudo virt-install \
--name ubuntu24-vm \
--description "Ubuntu 24 VM" \
--ram 2048 \
--vcpus 2 \
--disk path=/var/lib/libvirt/images/ubuntu24-vm.qcow2,size=20 \
--cdrom /path/to/ubuntu-24.04-live-server-amd64.iso \
--graphics vnc
</code></pre>
<p>👉 Replace <code>/path/to/ubuntu-24.04-live-server-amd64.iso</code> with your actual ISO path.</p>
<p>This will:</p>
<ul>
<li><p>Create a 2GB RAM, 2 vCPU VM</p>
</li>
<li><p>Allocate a 20GB disk</p>
</li>
<li><p>Attach the Ubuntu ISO to install</p>
</li>
<li><p>Open the VM console via VNC (or view it through Virt Manager)</p>
</li>
</ul>
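<p>Once a VM exists, a handful of <code>virsh</code> commands cover the day-to-day lifecycle (using the VM name from the example above):</p>
<pre><code class="lang-plaintext">virsh start ubuntu24-vm                           # power on
virsh shutdown ubuntu24-vm                        # graceful ACPI shutdown
virsh destroy ubuntu24-vm                         # force power off
virsh undefine ubuntu24-vm --remove-all-storage   # delete the VM and its disk
</code></pre>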
]]></content:encoded></item><item><title><![CDATA[🚀 Why Overprovisioning Breaks Kubernetes Autoscaling]]></title><description><![CDATA[Autoscaling is one of the most powerful features in Kubernetes. It promises to help you respond to fluctuating demand without manual intervention - saving costs when traffic is low and scaling automatically when it's high.
But what happens when… it d...]]></description><link>https://blog.iresh.xyz/why-overprovisioning-breaks-kubernetes-autoscaling</link><guid isPermaLink="true">https://blog.iresh.xyz/why-overprovisioning-breaks-kubernetes-autoscaling</guid><category><![CDATA[autoscaling]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[keda]]></category><dc:creator><![CDATA[Iresh Ekanayaka]]></dc:creator><pubDate>Wed, 09 Apr 2025 07:02:54 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1744182126228/539b8700-e505-4ff8-bfb5-5c7663af18b7.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Autoscaling is one of the most powerful features in Kubernetes. It promises to help you respond to fluctuating demand without manual intervention - saving costs when traffic is low and scaling automatically when it's high.</p>
<p>But what happens when… <strong>it doesn't scale?</strong></p>
<p>Even when traffic increases and users experience degraded performance - the number of pods remains the same.</p>
<p>The culprit?</p>
<blockquote>
<p><strong>Overprovisioning.</strong></p>
</blockquote>
<hr />
<h2 id="heading-the-problem-no-scaling-despite-load">⚠️ The Problem: No Scaling Despite Load</h2>
<p>Imagine this: You’re running a microservice that handles frequent API calls. You run a load test and notice increasing latency and timeouts. But strangely enough, <strong>no new pods are being added by the Horizontal Pod Autoscaler (HPA)</strong>.</p>
<p>You check the metrics and see CPU usage hovering around 20%. Autoscaling is configured to trigger at 50%. So everything <em>looks fine</em>... right?</p>
<p>Not really.</p>
<p>Despite clear signs of stress, Kubernetes <strong>doesn’t scale</strong> - because <strong>the app is requesting way more resources than it actually needs.</strong></p>
<hr />
<h2 id="heading-root-cause-overprovisioning">🕵️‍♂️ Root Cause: Overprovisioning</h2>
<p>In many cases, developers and engineers allocate large CPU and memory limits "just to be safe." It’s a common habit - especially when there’s little visibility into how much an app truly needs.</p>
<p>For example:</p>
<blockquote>
<p>You give your service <code>1 CPU</code> and <code>1Gi</code> memory, but in reality, it only ever uses ~200m CPU and 300Mi memory.</p>
</blockquote>
<h3 id="heading-what-happens-next">What happens next?</h3>
<ul>
<li><p>Kubernetes thinks your pod is underutilized.</p>
</li>
<li><p>HPA sees usage at only 20% of the requested value - so it does <em>nothing</em>.</p>
</li>
<li><p>Meanwhile, the node is <strong>overcommitted</strong>, and other workloads may also get throttled.</p>
</li>
<li><p>End-users face slower response times… and nobody knows why.</p>
</li>
</ul>
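<p>The mismatch above can be sketched as a container resource block. This is an illustrative fragment only - the values mirror the hypothetical numbers mentioned, not a real service:</p>
<pre><code class="lang-yaml">resources:
  requests:
    cpu: "1"       # HPA measures utilization against this request
    memory: 1Gi
  limits:
    cpu: "1"
    memory: 1Gi
# Actual usage of ~200m CPU means HPA sees 200m / 1000m = 20%,
# comfortably below a 50% target - so it never scales out.
</code></pre>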
<hr />
<h2 id="heading-real-world-example-anonymized">🔧 Real-World Example (Anonymized)</h2>
<p>A team was managing a stateless API service handling product metadata. It was configured with:</p>
<ul>
<li><p><code>1 core CPU request</code></p>
</li>
<li><p><code>1.5 cores CPU limit</code></p>
</li>
<li><p><code>1Gi+ memory</code></p>
</li>
</ul>
<p>During load testing, it showed 0% CPU usage in HPA metrics, despite experiencing throttling and performance degradation. Why?</p>
<p>Because:</p>
<ul>
<li><p>It was actually using around <strong>200m CPU</strong> per pod.</p>
</li>
<li><p>The high request value masked this under-utilization.</p>
</li>
<li><p>Kubernetes couldn’t allocate more resources, and HPA never scaled the workload.</p>
</li>
</ul>
<h3 id="heading-the-fix">✅ The Fix:</h3>
<ul>
<li><p>Right-sized the pod to use:</p>
<ul>
<li><p><code>256m</code> CPU request</p>
</li>
<li><p><code>512m</code> CPU limit</p>
</li>
<li><p>~<code>256Mi–768Mi</code> memory range</p>
</li>
</ul>
</li>
<li><p>HPA immediately started picking up the actual usage.</p>
</li>
<li><p><a target="_blank" href="https://blog.iresh.xyz/a-beginners-guide-to-kubernetes-event-driven-autoscaling-keda">KEDA</a> was added to scale based on <strong>requests per minute</strong>, not just CPU.</p>
</li>
</ul>
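<p>The right-sized values translate into a spec roughly like this (a sketch of the anonymized example, not the team's actual manifest):</p>
<pre><code class="lang-yaml">resources:
  requests:
    cpu: 256m      # close to the observed ~200m usage
    memory: 256Mi
  limits:
    cpu: 512m      # headroom for spikes without hiding load from HPA
    memory: 768Mi
</code></pre>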
<p>💡 <strong>Result:</strong> The app handled load better, latency dropped, and resource cost dropped by nearly 60%.</p>
<hr />
<h2 id="heading-common-pitfalls-in-kubernetes-autoscaling">🔍 Common Pitfalls in Kubernetes Autoscaling</h2>
<h3 id="heading-overprovisioning-resources">❌ Overprovisioning Resources</h3>
<p>When request values are too high, autoscalers see artificially low usage and don’t act - even under pressure.</p>
<h3 id="heading-scaling-only-on-cpu">❌ Scaling Only on CPU</h3>
<p>CPU isn’t always the best indicator of load, especially for:</p>
<ul>
<li><p>I/O-bound workloads</p>
</li>
<li><p>Latency-sensitive apps</p>
</li>
<li><p>APIs with fast but frequent calls</p>
</li>
</ul>
<h3 id="heading-ignoring-node-throttling">❌ Ignoring Node Throttling</h3>
<p>Even if your pod isn't consuming much, an <strong>overcommitted node</strong> will throttle workloads to protect overall stability.</p>
<hr />
<h2 id="heading-best-practices-for-smart-autoscaling">✅ Best Practices for Smart Autoscaling</h2>
<h3 id="heading-1-right-size-your-workloads">1. <strong>Right-Size Your Workloads</strong></h3>
<p>Start small. Monitor your pods using:</p>
<pre><code class="lang-bash">kubectl top pods
</code></pre>
<p>Compare actual usage to requested values. If usage is consistently low, reduce the request and limit values.</p>
<hr />
<h3 id="heading-2-tune-hpa-carefully">2. <strong>Tune HPA Carefully</strong></h3>
<p>Use reasonable thresholds:</p>
<ul>
<li><p>Trigger at 50–60% CPU usage</p>
</li>
<li><p>Set minimum and maximum replica ranges based on traffic expectations</p>
</li>
</ul>
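<p>Those thresholds map onto an HPA manifest roughly like this (a minimal sketch - the Deployment name and replica bounds are placeholders to adjust for your traffic):</p>
<pre><code class="lang-yaml">apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2         # baseline for normal traffic
  maxReplicas: 10        # cap based on expected peaks
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 55   # trigger in the 50-60% band
</code></pre>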
<hr />
<h3 id="heading-3-add-request-based-scaling-with-kedahttpsblogireshxyza-beginners-guide-to-kubernetes-event-driven-autoscaling-keda">3. <strong>Add Request-Based Scaling (with</strong> <a target="_blank" href="https://blog.iresh.xyz/a-beginners-guide-to-kubernetes-event-driven-autoscaling-keda"><strong>KEDA</strong></a><strong>)</strong></h3>
<p>HPA works well with CPU/RAM, but doesn’t understand traffic volume.</p>
<p>Use <a target="_blank" href="https://keda.sh">KEDA</a> to scale based on:</p>
<ul>
<li><p>Requests per second</p>
</li>
<li><p>Queue length</p>
</li>
<li><p>Custom observability metrics</p>
</li>
</ul>
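<p>As a sketch of request-based scaling, a KEDA <code>ScaledObject</code> with a Prometheus trigger might look like this - the server address, metric name, and query are assumptions you would replace with your own:</p>
<pre><code class="lang-yaml">apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-requests
spec:
  scaleTargetRef:
    name: my-app
  minReplicaCount: 1
  maxReplicaCount: 20
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090   # assumed in-cluster Prometheus
        query: sum(rate(http_requests_total{service="my-app"}[1m]))
        threshold: "100"   # roughly one replica per 100 req/s
</code></pre>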
<hr />
<h3 id="heading-4-avoid-blind-copy-paste">4. <strong>Avoid Blind Copy-Paste</strong></h3>
<p>Don’t reuse resource configs from unrelated services. Each service behaves differently.</p>
<hr />
<h3 id="heading-5-watch-for-throttling">5. <strong>Watch for Throttling</strong></h3>
<p>If you see throttling in your observability tools, it’s time to revisit your requests and limits. Even if HPA isn’t scaling, <strong>Kubernetes might still be struggling to allocate what you asked for</strong>.</p>
<hr />
<h2 id="heading-faqs">❓ FAQs</h2>
<h3 id="heading-q-why-doesnt-hpa-scale-even-under-load">Q: Why doesn’t HPA scale even under load?</h3>
<p><strong>A:</strong> If the app is overprovisioned, CPU usage stays low <em>relative to the request</em>, so the autoscaler won’t trigger.</p>
<hr />
<h3 id="heading-q-how-do-i-know-what-cpumemory-to-request">Q: How do I know what CPU/memory to request?</h3>
<p><strong>A:</strong> Start with a conservative value (e.g., <code>200m CPU</code>, <code>256Mi memory</code>). Monitor usage under normal and peak traffic, and adjust accordingly.</p>
<hr />
<h3 id="heading-q-when-should-i-use-kedahttpsblogireshxyza-beginners-guide-to-kubernetes-event-driven-autoscaling-keda">Q: When should I use <a target="_blank" href="https://blog.iresh.xyz/a-beginners-guide-to-kubernetes-event-driven-autoscaling-keda">KEDA</a>?</h3>
<p><strong>A:</strong> When CPU isn’t a reliable scaling signal. For example, use <a target="_blank" href="https://blog.iresh.xyz/a-beginners-guide-to-kubernetes-event-driven-autoscaling-keda">KEDA</a> to scale on:</p>
<ul>
<li><p>HTTP request rate</p>
</li>
<li><p>Queue depth</p>
</li>
<li><p>Event counts</p>
</li>
<li><p>Custom metrics</p>
</li>
</ul>
<hr />
<h3 id="heading-q-can-overprovisioning-affect-other-apps">Q: Can overprovisioning affect other apps?</h3>
<p><strong>A:</strong> Yes. Especially in shared node pools, it can cause throttling across unrelated services and reduce overall node efficiency.</p>
<hr />
<h2 id="heading-final-takeaway">📌 Final Takeaway</h2>
<blockquote>
<p>🧠 <strong>Overprovisioning doesn’t protect your app - it hides the real load and breaks autoscaling.</strong></p>
</blockquote>
<p>Instead, embrace:</p>
<ul>
<li><p><strong>Right-sizing</strong></p>
</li>
<li><p><strong>Smart metric selection</strong></p>
</li>
<li><p><strong>Intentional scaling strategies</strong></p>
</li>
</ul>
<p>By tuning your workloads based on reality - not assumptions - you can achieve <strong>better performance, more reliable scaling, and major cost savings.</strong></p>
]]></content:encoded></item><item><title><![CDATA[A Beginner's Guide to Kubernetes Event-Driven Autoscaling (KEDA)]]></title><description><![CDATA[Introduction
Kubernetes has revolutionized the way we deploy and manage containerized applications. However, efficiently scaling workloads based on demand remains a challenge. While Kubernetes’ native Horizontal Pod Autoscaler (HPA) scales applicatio...]]></description><link>https://blog.iresh.xyz/a-beginners-guide-to-kubernetes-event-driven-autoscaling-keda</link><guid isPermaLink="true">https://blog.iresh.xyz/a-beginners-guide-to-kubernetes-event-driven-autoscaling-keda</guid><category><![CDATA[keda]]></category><category><![CDATA[scaling]]></category><category><![CDATA[Kubernetes]]></category><dc:creator><![CDATA[Iresh Ekanayaka]]></dc:creator><pubDate>Thu, 03 Apr 2025 06:03:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1743660110738/d3f5b6cb-8971-4378-b380-0080a6fd2dc5.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction">Introduction</h2>
<p>Kubernetes has revolutionized the way we deploy and manage containerized applications. However, efficiently scaling workloads based on demand remains a challenge. While Kubernetes’ native Horizontal Pod Autoscaler (HPA) scales applications based on CPU and memory usage, it lacks support for external event-based triggers. This is where <strong>KEDA (Kubernetes Event-Driven Autoscaler)</strong> comes in.</p>
<p>KEDA extends Kubernetes' autoscaling capabilities by enabling <strong>event-driven scaling</strong> based on external sources like message queues, databases, cloud events, and custom metrics. This guide will help beginners understand KEDA and how to integrate it into their Kubernetes environments.</p>
<hr />
<h2 id="heading-what-is-keda">What is KEDA?</h2>
<p>KEDA is an open-source project under the Cloud Native Computing Foundation (CNCF) that provides event-driven autoscaling for Kubernetes workloads. It works alongside Kubernetes’ HPA, allowing workloads to scale based on <strong>real-time external events</strong> rather than just CPU and memory metrics.</p>
<h3 id="heading-key-features-of-keda">Key Features of KEDA:</h3>
<ul>
<li><p><strong>Event-Driven Scaling</strong>: Scales applications based on external event sources such as Kafka, RabbitMQ, AWS SQS, PostgreSQL, and Azure Event Hubs.</p>
</li>
<li><p><strong>Efficient Resource Utilization</strong>: Pods remain at zero when idle, reducing unnecessary resource consumption.</p>
</li>
<li><p><strong>Seamless HPA Integration</strong>: Works with Kubernetes' native HPA to ensure efficient autoscaling.</p>
</li>
<li><p><strong>Flexible Scaling Policies</strong>: Define how and when your applications should scale using custom triggers.</p>
</li>
</ul>
<hr />
<h2 id="heading-why-use-keda">Why Use KEDA?</h2>
<p>KEDA is particularly useful in scenarios where workload demand fluctuates based on external events rather than system resource usage. Here are a few common use cases:</p>
<ol>
<li><p><strong>Queue-Based Processing</strong>: Scale applications dynamically based on the number of messages in a queue (e.g., RabbitMQ, Azure Queue Storage, AWS SQS).</p>
</li>
<li><p><strong>Database-Driven Scaling</strong>: Scale services when new records are inserted into a database.</p>
</li>
<li><p><strong>IoT and Streaming Applications</strong>: Automatically scale applications based on incoming IoT data or Kafka events.</p>
</li>
<li><p><strong>Event-Driven Serverless Applications</strong>: Run workloads only when events occur, reducing infrastructure costs.</p>
</li>
</ol>
<h3 id="heading-advantages-of-keda-over-cpumemory-based-hpa-and-cluster-autoscaler">Advantages of KEDA Over CPU/Memory-Based HPA and Cluster Autoscaler</h3>
<p>KEDA provides several advantages compared to traditional CPU and memory-based Horizontal Pod Autoscaler (HPA) and Cluster Autoscaler (CA):</p>
<ol>
<li><p><strong>Event-Driven Scaling Instead of Just CPU/Memory</strong></p>
<ul>
<li><p>HPA only scales based on <strong>CPU and memory usage</strong> (with limited support for custom metrics).</p>
</li>
<li><p>KEDA allows scaling based on <strong>external events</strong> such as queue length in Kafka, database row count, HTTP request rate, or even cloud events.</p>
</li>
</ul>
</li>
<li><p><strong>Fine-Grained Control Over Scaling</strong></p>
<ul>
<li><p>HPA and CA react only when resource utilization crosses thresholds.</p>
</li>
<li><p>KEDA allows <strong>custom triggers</strong> (e.g., "Scale up when Kafka queue has more than 100 messages").</p>
</li>
</ul>
</li>
<li><p><strong>Cost Efficiency</strong></p>
<ul>
<li><p>With HPA, pods remain active based on CPU/memory, which may lead to unnecessary costs.</p>
</li>
<li><p>KEDA ensures <strong>pods run only when needed</strong>, reducing costs by automatically scaling down to zero when there are no events.</p>
</li>
</ul>
</li>
<li><p><strong>Faster Response to Workload Spikes</strong></p>
<ul>
<li><p>HPA and CA rely on metric scraping, which can delay autoscaling responses.</p>
</li>
<li><p>KEDA reacts to <strong>real-time events</strong>, ensuring faster pod provisioning and scaling.</p>
</li>
</ul>
</li>
<li><p><strong>Works Alongside HPA and CA</strong></p>
<ul>
<li><p>You can still use <strong>HPA for CPU/memory-based autoscaling</strong> alongside KEDA for event-driven scaling.</p>
</li>
<li><p><strong>Cluster Autoscaler (CA)</strong> is still needed to scale up nodes when pods need more capacity.</p>
</li>
</ul>
</li>
<li><p><strong>Ideal for Asynchronous and Batch Workloads</strong></p>
<ul>
<li><p>If your application processes messages from a queue (e.g., <strong>RabbitMQ, Kafka, Azure Event Hub</strong>), KEDA is a perfect fit.</p>
</li>
<li><p>For <strong>serverless applications</strong>, it provides event-driven scaling without overprovisioning resources.</p>
</li>
</ul>
</li>
</ol>
<hr />
<h2 id="heading-installing-keda">Installing KEDA</h2>
<p>KEDA can be installed using Helm, which simplifies the deployment process.</p>
<h3 id="heading-step-1-add-the-keda-helm-repository">Step 1: Add the KEDA Helm Repository</h3>
<pre><code class="lang-sh">helm repo add kedacore https://kedacore.github.io/charts
helm repo update
</code></pre>
<h3 id="heading-step-2-install-keda">Step 2: Install KEDA</h3>
<pre><code class="lang-sh">helm install keda kedacore/keda --namespace keda --create-namespace
</code></pre>
<p>To verify the installation, check if the KEDA pods are running:</p>
<pre><code class="lang-sh">kubectl get pods -n keda
</code></pre>
<hr />
<h2 id="heading-configuring-keda-for-autoscaling">Configuring KEDA for Autoscaling</h2>
<h3 id="heading-step-1-define-a-triggerauthentication">Step 1: Define a <strong>TriggerAuthentication</strong></h3>
<p>If your external source requires authentication, you need to define a <code>TriggerAuthentication</code> resource.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">keda.sh/v1alpha1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">TriggerAuthentication</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">my-trigger-auth</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">secretTargetRef:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">parameter:</span> <span class="hljs-string">connectionString</span>
      <span class="hljs-attr">name:</span> <span class="hljs-string">my-secret</span>
      <span class="hljs-attr">key:</span> <span class="hljs-string">connectionString</span>
</code></pre>
<h3 id="heading-step-2-create-a-scaleobject">Step 2: Create a <strong>ScaledObject</strong></h3>
<p>A <code>ScaledObject</code> defines the scaling behavior for your workload.</p>
<p>Example: Scaling a Deployment based on a RabbitMQ queue length</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">keda.sh/v1alpha1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">ScaledObject</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">rabbitmq-scaler</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">scaleTargetRef:</span>
    <span class="hljs-attr">name:</span> <span class="hljs-string">my-app</span>
  <span class="hljs-attr">minReplicaCount:</span> <span class="hljs-number">1</span>
  <span class="hljs-attr">maxReplicaCount:</span> <span class="hljs-number">10</span>
  <span class="hljs-attr">triggers:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">type:</span> <span class="hljs-string">rabbitmq</span>
      <span class="hljs-attr">metadata:</span>
        <span class="hljs-attr">queueName:</span> <span class="hljs-string">my-queue</span>
        <span class="hljs-attr">mode:</span> <span class="hljs-string">QueueLength</span>
        <span class="hljs-attr">value:</span> <span class="hljs-string">"5"</span>
      <span class="hljs-attr">authenticationRef:</span>
        <span class="hljs-attr">name:</span> <span class="hljs-string">my-trigger-auth</span>
</code></pre>
<p>This configuration scales the <code>my-app</code> deployment based on the number of messages in the RabbitMQ queue.</p>
<hr />
<h2 id="heading-monitoring-keda">Monitoring KEDA</h2>
<p>To check the status of KEDA’s scaling operations:</p>
<pre><code class="lang-sh">kubectl get scaledobjects
kubectl get hpa
</code></pre>
<p>To view logs:</p>
<pre><code class="lang-sh">kubectl logs -f deployment/keda-operator -n keda
</code></pre>
<hr />
<h2 id="heading-conclusion">Conclusion</h2>
<p>KEDA brings <strong>event-driven scaling</strong> to Kubernetes, making it ideal for workloads that depend on external events. By integrating with various event sources, it ensures that applications scale efficiently and cost-effectively. Whether you’re managing <strong>queue-based jobs</strong>, <strong>database-triggered workloads</strong>, or <strong>streaming applications</strong>, KEDA provides a powerful and flexible solution for scaling in response to real-world demands.</p>
<p>Ready to get started? Install KEDA in your cluster and experiment with different event sources to see how it can optimize your application scaling!</p>
]]></content:encoded></item><item><title><![CDATA[Understanding the Scrum Process: A Beginner's Guide]]></title><description><![CDATA[Scrum is an Agile framework used to develop and deliver products iteratively and incrementally. It helps teams work collaboratively and respond to change effectively. This guide provides an overview of the Scrum process and its key components.

1. Sc...]]></description><link>https://blog.iresh.xyz/understanding-the-scrum-process-a-beginners-guide</link><guid isPermaLink="true">https://blog.iresh.xyz/understanding-the-scrum-process-a-beginners-guide</guid><category><![CDATA[Agile Best Practices]]></category><category><![CDATA[Scrum]]></category><category><![CDATA[project management]]></category><dc:creator><![CDATA[Iresh Ekanayaka]]></dc:creator><pubDate>Tue, 01 Apr 2025 16:47:35 GMT</pubDate><content:encoded><![CDATA[<p>Scrum is an Agile framework used to develop and deliver products iteratively and incrementally. It helps teams work collaboratively and respond to change effectively. This guide provides an overview of the Scrum process and its key components.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1743525919192/1e075666-87e6-4cf3-9d99-3c411e00d1ea.jpeg" alt class="image--center mx-auto" /></p>
<h2 id="heading-1-scrum-framework-overview">1. Scrum Framework Overview</h2>
<p>Scrum follows an iterative approach, breaking down work into fixed time periods called <strong>Sprints</strong>. Each Sprint results in a potentially shippable product increment. The process involves specific roles, events, and artifacts.</p>
<h3 id="heading-key-roles-in-scrum">Key Roles in Scrum:</h3>
<ul>
<li><p><strong>Product Owner</strong>: Defines product goals, prioritizes backlog items, and ensures value delivery.</p>
</li>
<li><p><strong>Scrum Master</strong>: Facilitates Scrum processes, removes impediments, and ensures adherence to Agile principles.</p>
</li>
<li><p><strong>Development Team</strong>: Cross-functional team responsible for delivering increments.</p>
</li>
</ul>
<hr />
<h2 id="heading-2-scrum-process-flow">2. Scrum Process Flow</h2>
<h3 id="heading-21-product-backlog">2.1 Product Backlog</h3>
<p>The <strong>Product Backlog</strong> is a prioritized list of features, improvements, and fixes required for the product. The Product Owner maintains and refines this backlog continuously.</p>
<h3 id="heading-22-backlog-refinement-backlog-grooming">2.2 Backlog Refinement (Backlog Grooming)</h3>
<p>Backlog Refinement (also known as Backlog Grooming) is an ongoing activity where the team discusses and clarifies backlog items, breaks them into smaller tasks, and estimates their effort.</p>
<h3 id="heading-23-sprint-planning">2.3 Sprint Planning</h3>
<p>Sprint Planning is the meeting where the team selects backlog items for the upcoming Sprint. The key outputs of Sprint Planning include:</p>
<ul>
<li><p><strong>Sprint Goal</strong>: Defines the purpose of the Sprint.</p>
</li>
<li><p><strong>Sprint Backlog</strong>: A subset of Product Backlog items selected for the Sprint.</p>
</li>
<li><p><strong>Task Breakdown</strong>: Identifying the necessary tasks to complete selected items.</p>
</li>
</ul>
<h3 id="heading-24-sprint-execution">2.4 Sprint Execution</h3>
<p>The <strong>Sprint</strong> is a time-boxed iteration (typically 1-4 weeks) where the team works on Sprint Backlog items. The team holds <strong>Daily Standup (Daily Scrum) Meetings</strong> to sync progress and discuss roadblocks.</p>
<h3 id="heading-25-increment">2.5 Increment</h3>
<p>By the end of the Sprint, the team delivers a <strong>Potentially Shippable Product Increment</strong> that meets the Definition of Done (DoD).</p>
<h3 id="heading-26-sprint-review">2.6 Sprint Review</h3>
<p>The <strong>Sprint Review</strong> is held at the end of the Sprint, where the team demonstrates the completed work to stakeholders and gathers feedback.</p>
<h3 id="heading-27-sprint-retrospective">2.7 Sprint Retrospective</h3>
<p>The <strong>Sprint Retrospective</strong> follows the Sprint Review and is aimed at process improvement. The team discusses what went well, what didn’t, and how they can improve future Sprints.</p>
<hr />
<h2 id="heading-3-summary-of-scrum-process-steps">3. Summary of Scrum Process Steps</h2>
<ol>
<li><p><strong>Product Backlog Creation</strong>: Product Owner maintains and prioritizes backlog items.</p>
</li>
<li><p><strong>Backlog Refinement</strong>: Team refines and estimates backlog items.</p>
</li>
<li><p><strong>Sprint Planning</strong>: Team selects work for the Sprint and sets a goal.</p>
</li>
<li><p><strong>Sprint Execution</strong>: Development occurs, guided by Daily Standups.</p>
</li>
<li><p><strong>Increment Creation</strong>: A potentially shippable product increment is produced.</p>
</li>
<li><p><strong>Sprint Review</strong>: Team showcases work and receives feedback.</p>
</li>
<li><p><strong>Sprint Retrospective</strong>: The team reflects and improves the process.</p>
</li>
<li><p><strong>Repeat</strong>: The process continues in cycles until the product is completed.</p>
</li>
</ol>
<p>Scrum ensures adaptability, continuous improvement, and consistent product delivery. By following these steps, teams can efficiently manage their work and deliver high-value products in an Agile environment.</p>
]]></content:encoded></item><item><title><![CDATA[Terraform Access for Azure]]></title><description><![CDATA[A cheat sheet for configuring authentication and access control in Azure for Terraform users.
1️⃣ Azure Authentication Methods
Before configuring Terraform, understand the authentication options in Azure:
🔹 Managed Identity
✅ Best For:

Azure resour...]]></description><link>https://blog.iresh.xyz/terraform-access-for-azure</link><guid isPermaLink="true">https://blog.iresh.xyz/terraform-access-for-azure</guid><dc:creator><![CDATA[Iresh Ekanayaka]]></dc:creator><pubDate>Tue, 01 Apr 2025 16:41:54 GMT</pubDate><content:encoded><![CDATA[<p><em>A cheat sheet for configuring authentication and access control in Azure for Terraform users.</em></p>
<h2 id="heading-1-azure-authentication-methods"><strong>1️⃣ Azure Authentication Methods</strong></h2>
<p>Before configuring Terraform, understand the authentication options in Azure:</p>
<h3 id="heading-managed-identity"><strong>🔹 Managed Identity</strong></h3>
<p>✅ <strong>Best For:</strong></p>
<ul>
<li><p>Azure resources (VMs, Functions, AKS, etc.) needing access to other Azure services.</p>
</li>
<li><p>Secretless authentication within Azure.</p>
</li>
</ul>
<p>✅ <strong>Types:</strong></p>
<ul>
<li><p><strong>System-assigned</strong> → Tied to a single resource.</p>
</li>
<li><p><strong>User-assigned</strong> → Reusable across multiple resources.</p>
</li>
</ul>
<hr />
<h3 id="heading-service-principal"><strong>🔹 Service Principal</strong></h3>
<p>✅ <strong>Best For:</strong></p>
<ul>
<li><p>Automation tools like Terraform, CI/CD (GitHub Actions, Azure DevOps).</p>
</li>
<li><p>Programmatic access to Azure resources.</p>
</li>
</ul>
<p>✅ <strong>Authentication Methods:</strong></p>
<ul>
<li><p><strong>Client Secret (Password-based) 🔑</strong> → Easy but less secure.</p>
</li>
<li><p><strong>Certificate-based</strong> → More secure than secrets.</p>
</li>
<li><p><strong>Federated Identity (OIDC) 🎭</strong> → No secret required (best for GitHub Actions &amp; Kubernetes).</p>
</li>
</ul>
<hr />
<h2 id="heading-2-prerequisites"><strong>2️⃣ Prerequisites</strong></h2>
<p>Before proceeding, ensure you have:<br />✅ An <strong>Azure Subscription</strong><br />✅ <strong>Azure CLI</strong> installed (<code>az version</code> to check)<br />✅ <strong>Terraform</strong> installed (<code>terraform version</code> to check)</p>
<hr />
<h2 id="heading-3-authenticate-with-azure-cli"><strong>3️⃣ Authenticate with Azure CLI</strong></h2>
<p>Run the following command to log in to Azure:</p>
<pre><code class="lang-plaintext">az login
</code></pre>
<p>If you're using a cloud shell, authentication happens automatically.</p>
<hr />
<h2 id="heading-4-create-a-service-principal-for-terraform"><strong>4️⃣ Create a Service Principal for Terraform</strong></h2>
<p>To enable Terraform to authenticate securely, create a Service Principal:</p>
<pre><code class="lang-plaintext">az ad sp create-for-rbac --name "terraform-sp" --role="Contributor" --scopes="/subscriptions/YOUR_SUBSCRIPTION_ID"
</code></pre>
<p>This returns JSON with important credentials:</p>
<pre><code class="lang-plaintext">{
  "appId": "xxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx",
  "displayName": "terraform-sp",
  "password": "xxxxxxxxxxxx",
  "tenant": "xxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx"
}
</code></pre>
<p>Save these values securely.</p>
<hr />
<h2 id="heading-5-configure-terraform-provider-with-service-principal"><strong>5️⃣ Configure Terraform Provider with Service Principal</strong></h2>
<p>Update your <code>provider</code> block in Terraform to use the service principal credentials:</p>
<pre><code class="lang-plaintext">provider "azurerm" {
  features {}
  subscription_id = "xxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx"
  tenant_id       = "xxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx"
  client_id       = "xxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx"
  client_secret   = "xxxxxxxxxxxx"
}
</code></pre>
<p>This ensures Terraform authenticates using the service principal directly.</p>
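<p>Rather than hardcoding the client secret in <code>.tf</code> files, the <code>azurerm</code> provider can read the same values from environment variables - the placeholder IDs below are examples, not real credentials:</p>
<pre><code class="lang-plaintext">export ARM_SUBSCRIPTION_ID="xxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx"
export ARM_TENANT_ID="xxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx"
export ARM_CLIENT_ID="xxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx"
export ARM_CLIENT_SECRET="xxxxxxxxxxxx"
</code></pre>
<p>With these set, the <code>provider "azurerm"</code> block only needs <code>features {}</code>, and the secret stays out of version control.</p>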
<hr />
<h2 id="heading-6-verify-access-with-terraform"><strong>6️⃣ Verify Access with Terraform</strong></h2>
<p>To verify if the authentication is properly set up, run:</p>
<pre><code class="lang-plaintext">terraform plan
</code></pre>
<p>If authentication is configured correctly, Terraform will display the planned actions for your infrastructure. If there are issues with authentication, Terraform will return an error message.</p>
<hr />
<h2 id="heading-7-next-steps"><strong>7️⃣ Next Steps</strong></h2>
<ul>
<li><p><strong>Use this authentication</strong> to deploy Azure resources with Terraform.</p>
</li>
<li><p><strong>Secure your credentials</strong> using an Azure Key Vault instead of storing them in Terraform files.</p>
</li>
<li><p><strong>Continue with infrastructure setup</strong> (next blog will cover creating an Azure resource).</p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Service Principal vs. Managed Identity in Azure: A Quick Guide]]></title><description><![CDATA[When working with Azure, managing authentication and access to resources securely is crucial. Two common approaches for enabling applications and services to authenticate without using user credentials are Service Principals and Managed Identities. L...]]></description><link>https://blog.iresh.xyz/service-principal-vs-managed-identity-in-azure-a-quick-guide</link><guid isPermaLink="true">https://blog.iresh.xyz/service-principal-vs-managed-identity-in-azure-a-quick-guide</guid><category><![CDATA[Azure]]></category><category><![CDATA[Cloud]]></category><category><![CDATA[Devops]]></category><category><![CDATA[Security]]></category><category><![CDATA[IAM]]></category><dc:creator><![CDATA[Iresh Ekanayaka]]></dc:creator><pubDate>Thu, 27 Mar 2025 04:26:37 GMT</pubDate><content:encoded><![CDATA[<p>When working with Azure, managing authentication and access to resources securely is crucial. Two common approaches for enabling applications and services to authenticate without using user credentials are <strong>Service Principals</strong> and <strong>Managed Identities</strong>. Let’s break down their differences and use cases.</p>
<h2 id="heading-what-is-a-service-principal">🔹 What is a Service Principal?</h2>
<p>A <strong>Service Principal</strong> is an identity created in <strong>Azure Active Directory (Azure AD)</strong> to authenticate applications or automation processes. It enables fine-grained access control by assigning specific roles and permissions to a non-human identity.</p>
<h3 id="heading-how-to-create-a-service-principal-cli-method">How to Create a Service Principal (CLI Method):</h3>
<pre><code class="lang-plaintext">az ad sp create-for-rbac --name "my-app" --role "Contributor" --scopes "/subscriptions/{subscription-id}"
</code></pre>
<p>After running the above command, you will receive output containing essential credentials:</p>
<pre><code class="lang-plaintext">{
  "appId": "&lt;client_id&gt;",
  "password": "&lt;client_secret&gt;",
  "tenant": "&lt;tenant_id&gt;"
}
</code></pre>
<ul>
<li><p><strong>appId</strong> → <code>client_id</code></p>
</li>
<li><p><strong>password</strong> → <code>client_secret</code></p>
</li>
<li><p><strong>tenant</strong> → <code>tenant_id</code></p>
</li>
</ul>
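<p>These values can then be used to sign in as the Service Principal itself - for example (placeholders shown, substitute your own IDs):</p>
<pre><code class="lang-plaintext">az login --service-principal \
  --username "&lt;client_id&gt;" \
  --password "&lt;client_secret&gt;" \
  --tenant "&lt;tenant_id&gt;"
</code></pre>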
<h3 id="heading-how-to-create-a-service-principal-portal-method">How to Create a Service Principal (Portal Method):</h3>
<ol>
<li><p>Navigate to <strong>Azure Portal</strong> → <strong>Azure Active Directory</strong>.</p>
</li>
<li><p>Select <strong>App registrations</strong> → <strong>New registration</strong>.</p>
</li>
<li><p>Provide a name, select the supported account types, and register the application.</p>
</li>
<li><p>Go to <strong>Certificates &amp; secrets</strong> to generate a client secret.</p>
</li>
<li><p>Assign necessary <strong>RBAC roles</strong> under <strong>Azure subscriptions</strong>.</p>
</li>
</ol>
<h3 id="heading-key-points">Key Points:</h3>
<p>✅ Requires manual management of secrets or certificates.<br />✅ Can be used for automation, scripts, or CI/CD pipelines.<br />✅ Supports role-based access control (RBAC).<br />✅ Needs explicit lifecycle management (creation, rotation, deletion).</p>
<h2 id="heading-what-is-a-managed-identity">🔹 What is a Managed Identity?</h2>
<p>A <strong>Managed Identity</strong> is an Azure feature that eliminates the need for managing credentials. Azure automatically handles authentication when resources (like Virtual Machines, Functions, and App Services) need access to other Azure services.</p>
<h3 id="heading-types-of-managed-identities">Types of Managed Identities:</h3>
<ol>
<li><p><strong>System-assigned</strong> – Tied to a single Azure resource and deleted when the resource is deleted.</p>
</li>
<li><p><strong>User-assigned</strong> – Created independently and can be assigned to multiple resources.</p>
</li>
</ol>
<h3 id="heading-how-to-enable-a-system-assigned-managed-identity-cli-method">How to Enable a System-Assigned Managed Identity (CLI Method):</h3>
<pre><code class="lang-plaintext">az vm identity assign --resource-group myResourceGroup --name myVM
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1743078952922/e1b9bcae-0ac2-4112-b5a5-bd261929cf33.png" alt class="image--center mx-auto" /></p>
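<p>Enabling the identity only creates it; it still needs RBAC permissions before it can access anything. A sketch of granting it a role, assuming the <code>principalId</code> returned by the command above (the role and scope are illustrative placeholders):</p>
<pre><code class="lang-plaintext"># Grant the VM's managed identity read access to blob data
# (principal ID, storage account, and scope are placeholders)
az role assignment create \
  --assignee "&lt;principalId&gt;" \
  --role "Storage Blob Data Reader" \
  --scope "/subscriptions/&lt;subscription_id&gt;/resourceGroups/myResourceGroup/providers/Microsoft.Storage/storageAccounts/mystorageacct"
</code></pre>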
<h3 id="heading-how-to-enable-a-managed-identity-portal-method">How to Enable a Managed Identity (Portal Method):</h3>
<ol>
<li><p>Navigate to <strong>Azure Portal</strong> → <strong>Your Resource (VM, App Service, etc.)</strong>.</p>
</li>
<li><p>Go to <strong>Identity</strong> under the settings.</p>
</li>
<li><p>Enable <strong>System-assigned</strong> or <strong>User-assigned</strong> identity.</p>
</li>
<li><p>Assign necessary <strong>RBAC roles</strong> under <strong>Azure subscriptions</strong>.</p>
</li>
</ol>
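<p>Once the identity is enabled and has a role assigned, code running on the resource can request a token from the Azure Instance Metadata Service (IMDS) without any stored secret. A sketch that only works from inside an Azure resource with a managed identity enabled:</p>
<pre><code class="lang-plaintext"># Request an access token for Azure Resource Manager from the local IMDS endpoint
# (the endpoint is only reachable from inside the Azure resource itself)
curl -s -H "Metadata: true" \
  "http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&amp;resource=https://management.azure.com/"
</code></pre>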
<h3 id="heading-key-points-1">Key Points:</h3>
<p>✅ No need to manage credentials manually.<br />✅ Seamless integration with Azure services.<br />✅ Automatically rotates credentials for security.<br />✅ System-assigned identities are tied to a specific resource, while user-assigned identities can be shared.</p>
<h2 id="heading-when-to-use-which">🔹 When to Use Which?</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Use Case</td><td>Service Principal</td><td>Managed Identity</td></tr>
</thead>
<tbody>
<tr>
<td>CI/CD Pipelines</td><td>✅</td><td>❌</td></tr>
<tr>
<td>Cross-Cloud Authentication</td><td>✅</td><td>❌</td></tr>
<tr>
<td>VM to Azure Storage Authentication</td><td>❌</td><td>✅</td></tr>
<tr>
<td>Long-Term Secrets Management</td><td>✅</td><td>❌</td></tr>
<tr>
<td>Automatic Credential Rotation</td><td>❌</td><td>✅</td></tr>
</tbody>
</table>
</div><h2 id="heading-final-thoughts">🔹 Final Thoughts</h2>
<p>Both Service Principals and Managed Identities serve distinct purposes. If you need a reusable identity for automation that runs outside Azure, <strong>Service Principals</strong> are the way to go. If you want secure, hassle-free authentication between Azure resources, <strong>Managed Identities</strong> are the better choice.</p>
<p>Which one do you use the most? Let me know in the comments! 🚀</p>
<hr />
]]></content:encoded></item></channel></rss>