🚀 Deploying Multiple Schedulers in Kubernetes (with Leader Election Explained)
Kubernetes ships with a default scheduler (default-scheduler) that is responsible for placing pods onto the most suitable nodes. It does this by considering resource availability, taints and tolerations, affinities, and more.
But what if your application has special placement requirements that the default scheduler cannot handle?
👉 That’s where custom schedulers come in. Kubernetes allows you to run multiple schedulers within the same cluster and choose which scheduler should manage which pods.
In this blog, we’ll cover:
Why multiple schedulers are useful
How schedulers are named and configured
Deploying custom schedulers (binary, pod, and deployment methods)
The leader election option for HA setups
How to use your custom scheduler in pods
How to verify scheduling decisions
🔹 Why Multiple Schedulers?
By default, every pod goes through default-scheduler. However:
You may want an application to run only on GPU nodes with additional custom checks.
You may implement a domain-specific algorithm for data locality or cost optimization.
You may test experimental scheduling strategies without impacting the default scheduler.
With multiple schedulers:
Normal workloads → use the default scheduler.
Special workloads → use your custom scheduler.
🔹 Scheduler Names
Each scheduler must have a unique name.
Default scheduler → default-scheduler
Custom schedulers → names you choose, such as my-scheduler or gpu-scheduler
This name is set in the scheduler configuration file:
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: my-scheduler
leaderElection:
  leaderElect: false
If a profile omits schedulerName, it defaults to default-scheduler. Likewise, a pod that does not set spec.schedulerName is handled by default-scheduler.
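Each scheduler name corresponds to a profile, and a single kube-scheduler process can host several profiles. The sketch below illustrates this and has not been tested as a production setup. It runs my-scheduler with the default plugins alongside a hypothetical gpu-scheduler profile, which uses the NodeResourcesFit MostAllocated scoring strategy to pack GPU pods onto fewer nodes. The resource name nvidia.com/gpu assumes GPUs are exposed by the NVIDIA device plugin.

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  # Profile 1: default plugins, selected by pods with schedulerName: my-scheduler
  - schedulerName: my-scheduler
  # Profile 2 (hypothetical): bin-packs GPU workloads onto fewer nodes
  - schedulerName: gpu-scheduler
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            # Prefer nodes whose listed resources are already most allocated
            type: MostAllocated
            resources:
              # Assumption: GPUs are advertised as nvidia.com/gpu by a device plugin
              - name: nvidia.com/gpu
                weight: 1
leaderElection:
  leaderElect: false
```

A pod selects the second profile with spec.schedulerName: gpu-scheduler, just as it would select a separately deployed scheduler.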
🔹 Methods to Deploy a Custom Scheduler
1. Running Scheduler as a Binary
You can download the kube-scheduler binary and run it manually:
kube-scheduler \
--config=/etc/kubernetes/my-scheduler-config.yaml \
--kubeconfig=/etc/kubernetes/scheduler.kubeconfig
⚠️ Rarely used in modern kubeadm-based clusters since schedulers usually run as pods.
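When kube-scheduler is started with --config, the connection settings can also go in the config file itself, under clientConnection.kubeconfig. The kube-scheduler reference lists the --kubeconfig flag as deprecated, so keeping the path in the file is a reasonable precaution. The following is a sketch that reuses the paths from the command above:

```yaml
# /etc/kubernetes/my-scheduler-config.yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
clientConnection:
  # Same kubeconfig the command line passes via --kubeconfig
  kubeconfig: /etc/kubernetes/scheduler.kubeconfig
profiles:
  - schedulerName: my-scheduler
leaderElection:
  # A single manually started process does not need a lock
  leaderElect: false
```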
2. Scheduler as a Pod
You can run the scheduler as a pod inside your cluster.
apiVersion: v1
kind: Pod
metadata:
  name: my-scheduler
  namespace: kube-system
spec:
  # The default ServiceAccount lacks scheduling permissions
  serviceAccountName: scheduler-sa
  containers:
    - name: kube-scheduler
      # k8s.gcr.io is frozen; current images are published to registry.k8s.io
      image: registry.k8s.io/kube-scheduler:v1.28.0
      command:
        - kube-scheduler
        - --config=/etc/kubernetes/my-scheduler-config.yaml
      volumeMounts:
        - name: config
          mountPath: /etc/kubernetes/
  volumes:
    - name: config
      configMap:
        name: my-scheduler-config
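Here, the my-scheduler-config ConfigMap holds your KubeSchedulerConfiguration under the key my-scheduler-config.yaml. That key is mounted as a file at the path passed to --config.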
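The Pod mounts a ConfigMap named my-scheduler-config but never defines it. The sketch below shows one way to write it. It must live in kube-system alongside the Pod, and its data key becomes the file name under the mountPath.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-scheduler-config
  namespace: kube-system
data:
  # Mounted as /etc/kubernetes/my-scheduler-config.yaml, matching --config
  my-scheduler-config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1
    kind: KubeSchedulerConfiguration
    profiles:
      - schedulerName: my-scheduler
    leaderElection:
      leaderElect: false
```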
3. Scheduler as a Deployment
A more robust approach, and the one used in the upstream Kubernetes guide to multiple schedulers, is to run the scheduler as a Deployment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-scheduler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      component: my-scheduler
  template:
    metadata:
      labels:
        component: my-scheduler
    spec:
      serviceAccountName: scheduler-sa
      containers:
        - name: kube-scheduler
          # k8s.gcr.io is frozen; current images are published to registry.k8s.io
          image: registry.k8s.io/kube-scheduler:v1.28.0
          command:
            - kube-scheduler
            - --config=/etc/kubernetes/my-scheduler-config.yaml
          volumeMounts:
            - name: config
              mountPath: /etc/kubernetes/
      volumes:
        - name: config
          configMap:
            name: my-scheduler-config
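The Deployment controller recreates the scheduler pod if it crashes or its node fails. With replicas: 1, this is self-healing rather than true HA. For HA, run several replicas with leader election enabled.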
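The Deployment runs as a ServiceAccount named scheduler-sa. That account must exist and must be allowed to do a scheduler's work. The sketch below follows the upstream "Configure Multiple Schedulers" guide. It binds the built-in system:kube-scheduler and system:volume-scheduler ClusterRoles, plus the extension-apiserver-authentication-reader Role. The final Role and RoleBinding (my-scheduler-lease is a hypothetical name) are needed only if leader election uses a custom lock such as my-scheduler-lock, because system:kube-scheduler grants get/update only on the Lease named kube-scheduler.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: scheduler-sa
  namespace: kube-system
---
# Core scheduling permissions (watch pods and nodes, create bindings and events)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: scheduler-sa-as-kube-scheduler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:kube-scheduler
subjects:
  - kind: ServiceAccount
    name: scheduler-sa
    namespace: kube-system
---
# Permissions used for volume binding (PersistentVolumes, StorageClasses)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: scheduler-sa-as-volume-scheduler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:volume-scheduler
subjects:
  - kind: ServiceAccount
    name: scheduler-sa
    namespace: kube-system
---
# Read access to the extension-apiserver-authentication ConfigMap
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: scheduler-sa-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
  - kind: ServiceAccount
    name: scheduler-sa
    namespace: kube-system
---
# Only needed with leader election on a custom lock name
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: my-scheduler-lease
  namespace: kube-system
rules:
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    resourceNames: ["my-scheduler-lock"]
    verbs: ["get", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: scheduler-sa-lease
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: my-scheduler-lease
subjects:
  - kind: ServiceAccount
    name: scheduler-sa
    namespace: kube-system
```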
🔹 The leaderElect Option Explained
When running schedulers in HA setups (for example, one scheduler replica per control-plane node):
Multiple copies of the same scheduler might be running.
Only one instance should be active at a time. Otherwise, the replicas could make conflicting placement decisions.
The leaderElect: true setting ensures that a leader is elected among them.
Example:
leaderElection:
  leaderElect: true
  resourceName: my-scheduler-lock
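Here:
If you run 3 replicas of my-scheduler, only one becomes the leader. The others stay passive until the leader stops renewing its lock.
The lock is a Lease object named my-scheduler-lock, in kube-system unless resourceNamespace says otherwise. Give each scheduler its own lock name: the default scheduler uses kube-scheduler, and two different schedulers sharing one lock would never run at the same time.
👉 leaderElect already defaults to true. Keep it enabled whenever more than one replica of a scheduler runs, as in production HA clusters.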
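For a replicated setup (for example, replicas: 3 in the Deployment), the scheduler config file might look like the sketch below. The resourceLock, resourceNamespace, and timing fields are real LeaderElectionConfiguration fields, shown here at their kube-scheduler defaults. Treat the values as a starting point, not a tuning recommendation.

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: my-scheduler
leaderElection:
  leaderElect: true
  # Lease the replicas compete for; distinct from the default "kube-scheduler"
  resourceLock: leases
  resourceName: my-scheduler-lock
  resourceNamespace: kube-system
  # How long standbys wait before taking over an unrenewed lease
  leaseDuration: 15s
  # How long the leader keeps retrying renewal before stepping down
  renewDeadline: 10s
  # Wait between acquire/renew attempts
  retryPeriod: 2s
```

The scheduler's ServiceAccount also needs get and update permission on this Lease, since the built-in system:kube-scheduler role covers only the kube-scheduler lock.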
🔹 Using the Custom Scheduler in a Pod
Once deployed, tell Kubernetes which scheduler to use by adding schedulerName to your pod spec.
apiVersion: v1
kind: Pod
metadata:
  name: my-custom-pod
spec:
  schedulerName: my-scheduler
  containers:
    - name: busybox
      image: busybox
      command: ["sleep", "3600"]
Now, this pod will bypass the default scheduler and use my-scheduler.
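The same field works for pods managed by controllers. In a Deployment, schedulerName goes in the pod template (spec.template.spec), not in the Deployment's own spec. Here is a sketch with a hypothetical gpu-app workload:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-app          # hypothetical workload name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: gpu-app
  template:
    metadata:
      labels:
        app: gpu-app
    spec:
      # Every pod created from this template is handled by my-scheduler
      schedulerName: my-scheduler
      containers:
        - name: busybox
          image: busybox
          command: ["sleep", "3600"]
```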
🔹 Verifying Which Scheduler Picked the Pod
To confirm scheduling:
- Check pod events:
kubectl get events --sort-by=.metadata.creationTimestamp -o wide
You’ll see events like:
Successfully assigned default/my-custom-pod to node1
Source: my-scheduler
- Check scheduler logs:
kubectl logs -n kube-system <my-scheduler-pod-name>
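If your pod stays in Pending, check that the scheduler pod is running and that the pod's schedulerName exactly matches a configured profile name. No other scheduler will pick up a pod addressed to a scheduler that is missing or misconfigured.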
✅ Summary
Kubernetes supports multiple schedulers.
Each scheduler must have a unique name.
You can deploy schedulers as a binary, pod, or deployment.
Use leaderElect: true in HA setups.
Pods can be scheduled by a custom scheduler using schedulerName.
Verify scheduling decisions with events and logs.
By using multiple schedulers, you can extend Kubernetes to meet specialized workload placement needs while keeping the default scheduler for general workloads.