
🚀 Deploying Multiple Schedulers in Kubernetes (with Leader Election Explained)


Kubernetes ships with a default scheduler (default-scheduler) that is responsible for placing pods onto the most suitable nodes. It does this by considering resource availability, taints and tolerations, affinities, and more.

But what if your application has special placement requirements that the default scheduler cannot handle?

👉 That’s where custom schedulers come in. Kubernetes allows you to run multiple schedulers within the same cluster and choose which scheduler should manage which pods.

In this blog, we’ll cover:

  • Why multiple schedulers are useful

  • How schedulers are named and configured

  • Deploying custom schedulers (binary, pod, and deployment methods)

  • The leader election option for HA setups

  • How to use your custom scheduler in pods

  • How to verify scheduling decisions


🔹 Why Multiple Schedulers?

By default, every pod goes through default-scheduler. However:

  • You may want an application to run only on GPU nodes with additional custom checks.

  • You may implement a domain-specific algorithm for data locality or cost optimization.

  • You may test experimental scheduling strategies without impacting the default scheduler.

With multiple schedulers:

  • Normal workloads → use the default scheduler.

  • Special workloads → use your custom scheduler.


🔹 Scheduler Names

Each scheduler must have a unique name.

  • Default scheduler → default-scheduler

  • Custom schedulers → you define names like my-scheduler, gpu-scheduler, etc.

This name is set in the scheduler configuration file:

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: my-scheduler
leaderElection:
  leaderElect: false

If no schedulerName is specified in a profile, Kubernetes defaults it to default-scheduler. Note that leaderElection is a top-level field of KubeSchedulerConfiguration, not part of a profile.


🔹 Methods to Deploy a Custom Scheduler

1. Running Scheduler as a Binary

You can download the kube-scheduler binary and run it manually:

kube-scheduler \
  --config=/etc/kubernetes/my-scheduler-config.yaml \
  --kubeconfig=/etc/kubernetes/scheduler.kubeconfig

⚠️ Rarely used in modern kubeadm-based clusters since schedulers usually run as pods.


2. Scheduler as a Pod

You can run the scheduler as a pod inside your cluster.

apiVersion: v1
kind: Pod
metadata:
  name: my-scheduler
  namespace: kube-system
spec:
  containers:
    - name: kube-scheduler
      image: registry.k8s.io/kube-scheduler:v1.28.0
      command:
        - kube-scheduler
        - --config=/etc/kubernetes/my-scheduler-config.yaml
      volumeMounts:
        - name: config
          mountPath: /etc/kubernetes/
  volumes:
    - name: config
      configMap:
        name: my-scheduler-config

Here, the ConfigMap contains your custom scheduler config.
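For completeness, here is a minimal sketch of that ConfigMap. The key (my-scheduler-config.yaml) must match the filename passed to --config, since the volume is mounted at /etc/kubernetes/:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-scheduler-config
  namespace: kube-system
data:
  # Becomes /etc/kubernetes/my-scheduler-config.yaml inside the container
  my-scheduler-config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1
    kind: KubeSchedulerConfiguration
    profiles:
      - schedulerName: my-scheduler
    leaderElection:
      leaderElect: false
```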


3. Scheduler as a Deployment

A more scalable and recommended approach is to run the scheduler as a deployment.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-scheduler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      component: my-scheduler
  template:
    metadata:
      labels:
        component: my-scheduler
    spec:
      serviceAccountName: scheduler-sa
      containers:
        - name: kube-scheduler
          image: registry.k8s.io/kube-scheduler:v1.28.0
          command:
            - kube-scheduler
            - --config=/etc/kubernetes/my-scheduler-config.yaml
          volumeMounts:
            - name: config
              mountPath: /etc/kubernetes/
      volumes:
        - name: config
          configMap:
            name: my-scheduler-config

With a Deployment, Kubernetes automatically restarts the scheduler if it crashes. For true HA, scale to multiple replicas and enable leader election (covered next).
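The scheduler-sa ServiceAccount referenced above also needs permission to schedule pods. A simple (if broad) sketch is to bind it to the built-in scheduler cluster roles:

```shell
kubectl create serviceaccount scheduler-sa -n kube-system

# Grant the core scheduling permissions
kubectl create clusterrolebinding my-scheduler-as-kube-scheduler \
  --clusterrole=system:kube-scheduler \
  --serviceaccount=kube-system:scheduler-sa

# Needed if the scheduler handles volume binding
kubectl create clusterrolebinding my-scheduler-as-volume-scheduler \
  --clusterrole=system:volume-scheduler \
  --serviceaccount=kube-system:scheduler-sa
```

The binding names here are illustrative; the system:kube-scheduler and system:volume-scheduler ClusterRoles are built into Kubernetes.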


🔹 The leaderElect Option Explained

When running schedulers in HA setups (multiple control-plane nodes):

  • Multiple copies of the same scheduler might be running.

  • Only one scheduler instance should be active at a time.

  • The leaderElect: true setting ensures that a leader is elected among them.

Example:

leaderElection:
  leaderElect: true
  resourceName: my-scheduler-lock

Here:

  • If you run 3 replicas of my-scheduler, only one acquires the lock (a Lease object named my-scheduler-lock) and becomes leader.

  • The others stay passive and take over only if the leader fails to renew the lease.

👉 Always enable leaderElect in production HA clusters.
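You can inspect the election state directly. Leader election is coordinated through a Lease in the coordination.k8s.io API (by default in kube-system); the lease name here assumes the resourceName from the example above:

```shell
# Show which replica currently holds the leader lock
kubectl get lease my-scheduler-lock -n kube-system -o yaml
```

The holderIdentity field in the output names the current leader instance.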


🔹 Using the Custom Scheduler in a Pod

Once deployed, tell Kubernetes which scheduler to use by adding schedulerName to your pod spec.

apiVersion: v1
kind: Pod
metadata:
  name: my-custom-pod
spec:
  schedulerName: my-scheduler
  containers:
    - name: busybox
      image: busybox
      command: ["sleep", "3600"]

Now, this pod will bypass the default scheduler and use my-scheduler.
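The same field works inside workload controllers, since it is just part of the pod template. A hypothetical Deployment whose pods all use the custom scheduler:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-workload   # illustrative name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: gpu-workload
  template:
    metadata:
      labels:
        app: gpu-workload
    spec:
      schedulerName: my-scheduler   # applied to every pod the Deployment creates
      containers:
        - name: busybox
          image: busybox
          command: ["sleep", "3600"]
```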


🔹 Verifying Which Scheduler Picked the Pod

To confirm scheduling:

  1. Check pod events:

kubectl get events --sort-by=.metadata.creationTimestamp -o wide

You’ll see events like:

Successfully assigned default/my-custom-pod to node1
Source: my-scheduler

  2. Check scheduler logs:

kubectl logs -n kube-system <my-scheduler-pod-name>

If your pod stays in Pending, the scheduler named in schedulerName is likely not running or is misconfigured: no scheduler claims the pod, so it is never assigned to a node.
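To debug a Pending pod, describe it and check its Events section. When no scheduler claims the pod, no "Scheduled" event ever appears:

```shell
# Inspect the pod's events (empty Events section => no scheduler picked it up)
kubectl describe pod my-custom-pod

# Confirm the custom scheduler itself is actually running
kubectl get pods -n kube-system -l component=my-scheduler
```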


✅ Summary

  • Kubernetes supports multiple schedulers.

  • Each scheduler must have a unique name.

  • You can deploy schedulers as a binary, pod, or deployment.

  • Use leaderElect: true in HA setups.

  • Pods can be scheduled by a custom scheduler using schedulerName.

  • Verify scheduling decisions with events and logs.

By using multiple schedulers, you can extend Kubernetes to meet specialized workload placement needs while keeping the default scheduler for general workloads.
