Kubernetes Borg/Omega history topic 12: A follow-on to the PodDisruptionBudget topic: the descheduler (https://github.com/kubernetes-incubator/descheduler …). "Descheduler" is a more appropriate name than the original term "rescheduler", because its job is to decide which pods to kill, not to replace or schedule them.
In Borg, the rescheduler was created to defragment nodes to make room for new tasks. It selected tasks to evict so that the new tasks could schedule, while ensuring that replacements for the evicted tasks could also find new homes, so as not to cause unnecessary churn.
In K8s, the purpose of the descheduler is mainly to reshuffle pods to improve their overall distribution across nodes. After some churn in a cluster, caused by pod terminations from pod autoscaling, pod updates, pods for batch/CI tasks, etc., the pod layout can become uneven.
A simple example: Say the cluster autoscaler (https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler …) added a new node for new pods. If those pods came from the creation of a new Deployment or ReplicaSet, they could all land on the new node if there wasn't enough space on the existing nodes.
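To make the uneven-layout example concrete, here is a toy sketch in Python. It is not the descheduler's actual algorithm: the function name `pods_to_evict`, the `tolerance` parameter, and the node names are all hypothetical, and the "balance" criterion (pod count relative to the cluster average) is an assumption chosen only for illustration.

```python
from statistics import mean


def pods_to_evict(pods_per_node: dict[str, int], tolerance: int = 1) -> dict[str, int]:
    """Toy heuristic (an assumption, not the real descheduler logic):
    for each node whose pod count exceeds the cluster average by more
    than `tolerance`, report how many pods would bring it back to average."""
    target = mean(pods_per_node.values())
    return {
        node: int(count - target)
        for node, count in pods_per_node.items()
        if count - target > tolerance
    }


# Hypothetical layout: two existing nodes, plus a node the cluster
# autoscaler just added, on which every pod of a new Deployment landed.
layout = {"node-a": 4, "node-b": 4, "node-new": 10}
print(pods_to_evict(layout))  # {'node-new': 4}
```

The sketch only decides what to evict. Where the replacement pods go is left to the workload controller and the scheduler.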
From the experience in Borg, we knew the descheduler would be needed from the beginning of the Kubernetes project. I think it was first mentioned when discussing the addition of liveness and readiness probes: https://github.com/kubernetes/kubernetes/issues/620#issuecomment-50110653 …
This enabled us to establish a clear separation of concerns between pod creation and replacement by workload controllers, horizontal scaling by HPA, placement by the scheduler, and rebalancing across nodes and failure domains by the descheduler, which would respect PDBs.
That division was discussed when designing eviction for unresponsive nodes (https://github.com/kubernetes/kubernetes/issues/3885#issuecomment-71984989 …) and then in http://issues.k8s.io/12140. The design docs can be found at https://github.com/kubernetes/community/blob/master/contributors/design-proposals/scheduling/rescheduler.md … and https://github.com/kubernetes/community/blob/master/contributors/design-proposals/scheduling/rescheduling.md …
Note that if churn in the cluster is sufficiently high and eviction is highly constrained due to PodDisruptionBudgets, it may not be possible for the descheduler to keep up. This is one reason why it may not be possible to achieve an "optimal" layout
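A rough sketch of why tight budgets can hold the descheduler back, again in Python. The `Budget` class is a simplified, hypothetical stand-in for a PodDisruptionBudget (it models only a minimum-available count and a current healthy count), and `evict_with_budgets` is an illustrative helper, not a Kubernetes API.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """Simplified, hypothetical model of a PodDisruptionBudget."""
    min_available: int  # pods that must remain healthy
    healthy: int        # pods currently healthy

    def disruptions_allowed(self) -> int:
        return max(0, self.healthy - self.min_available)


def evict_with_budgets(candidates, budgets):
    """Evict candidate (pod, app) pairs in order, skipping any pod whose
    app's budget currently allows no further disruptions."""
    evicted = []
    for pod, app in candidates:
        budget = budgets[app]
        if budget.disruptions_allowed() > 0:
            budget.healthy -= 1  # the evicted pod is no longer healthy
            evicted.append(pod)
    return evicted


budgets = {
    "web": Budget(min_available=3, healthy=4),  # one disruption allowed
    "db": Budget(min_available=3, healthy=3),   # no disruptions allowed
}
candidates = [("web-1", "web"), ("web-2", "web"), ("db-1", "db")]
print(evict_with_budgets(candidates, budgets))  # ['web-1']
```

Of the three pods the rebalancer wanted to move, only one can be evicted in this round. The others have to wait until replacements become healthy, and by then further churn may have changed the layout again.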