Kubernetes Borg/Omega history topic 11: PodDisruptionBudget. Google constantly performs software and hardware maintenance in its datacenters: firmware updates, kernel and image updates, disk repairs, switch updates, battery tests, etc. The kinds of maintenance keep growing over time.
You can safely drain a node with kubectl drain: https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/ …. Node upgrades and the cluster autoscaler in Google Kubernetes Engine (GKE) also respect PodDisruptionBudget. The latter is documented here: https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler …
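To make "respecting a PodDisruptionBudget" concrete, here is a minimal Python sketch of the gating rule: a voluntary eviction goes ahead only while the number of healthy pods stays at or above the budget's minimum. The `Budget` class and `try_evict` function are hypothetical names for illustration, not Kubernetes APIs, and the model ignores details such as percentages and `maxUnavailable`.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """Hypothetical stand-in for a PDB with an integer minAvailable."""
    min_available: int    # pods that must stay healthy
    current_healthy: int  # pods currently healthy

    def disruptions_allowed(self) -> int:
        # Voluntary disruptions that can still happen right now.
        return max(0, self.current_healthy - self.min_available)


def try_evict(budget: Budget) -> bool:
    """Evict one pod if the budget permits it; otherwise refuse."""
    if budget.disruptions_allowed() <= 0:
        return False  # a drain would have to wait and retry
    budget.current_healthy -= 1
    return True


budget = Budget(min_available=2, current_healthy=3)
first = try_evict(budget)   # allowed: 3 healthy, 2 required
second = try_evict(budget)  # refused: would drop below the minimum
```

In this model a drain simply retries a refused eviction later, once replacement pods have become healthy elsewhere.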
Node upgrade behavior is documented here: https://cloud.google.com/kubernetes-engine/docs/how-to/node-auto-upgrades …
And more about the Google SRE philosophy behind automation can be found in the SRE book: https://landing.google.com/sre/sre-book/chapters/automation-at-google/ …
And the Safe Removal Service was also mentioned in Google's "VM Live Migration at Scale" paper in VEE 2018: https://dl.acm.org/citation.cfm?id=3186415 …
We use PDBs to make our rolling worker node updates reliable. We make sure Postgres masters are on new nodes before terminating the old master pod.
PDB has worked well for us (a 3-master OpenShift cluster runs about 80 pods for control plane + operators + APIs). We have a PDB for etcd to prevent quorum loss during rolling machine updates (no orchestration, just PDB gating master drain-and-reboots). Simplest possible backpressure wins!
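As a rough illustration of why a budget is enough to protect etcd quorum in the setup described above, the arithmetic can be sketched as follows. The reply does not say which PDB values are used; this assumes the budget is set to the number of members a majority quorum can lose, and the function names are hypothetical.

```python
def quorum(members: int) -> int:
    """Majority quorum size for a cluster with `members` voting members."""
    return members // 2 + 1


def max_unavailable(members: int) -> int:
    """Members that can be down at once without losing quorum."""
    return members - quorum(members)


def can_drain(members: int, unavailable: int) -> bool:
    """True if one more member may be taken down right now."""
    return unavailable < max_unavailable(members)


# With 3 masters, quorum is 2, so only one master may be down at a time.
allowed_first = can_drain(members=3, unavailable=0)
allowed_second = can_drain(members=3, unavailable=1)
```

Under this assumption, a second master drain is refused until the first master has rebooted and rejoined, which is the backpressure the budget provides.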
Biggest problem though is we still have no drain API - I’m aware of at least 5 node drain implementations (kubectl, cluster machine API, cluster autoscaler, and two home-rolled variations I saw while looking at other ecosystem projects). Would be nice to have a simple standard.