I’m continually amazed at how many inventive new ways there are to take down a Kubernetes cluster.
Thanks. Do you know what the precise failure mode was? What in docker was starved so much that the node became unhealthy, as opposed to containers being slow to start?
-
Best guess is that I/O contention caused the “health checker” to kill Docker when it timed out responding to “docker ps”. There’s no way to constrain or reserve I/O in k8s.
-
But an alternate hypothesis is insufficient system memory reservation (the default is 100Mi) combined with a pod memory leak and no resource limit. Or both.
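
For context on the second hypothesis, a minimal sketch of the two knobs involved (the pod name and values here are illustrative, not from the incident): reserving memory for system daemons via the kubelet, and capping a leaky pod with a memory limit so it gets OOM-killed instead of starving the node.

```yaml
# Kubelet flag: reserve memory for system daemons (dockerd, sshd, etc.)
# so a runaway pod can't consume it:
#   --system-reserved=memory=1Gi
#
# Pod spec: a memory limit means a leaking container is OOM-killed
# by the kernel rather than taking the whole node down with it.
apiVersion: v1
kind: Pod
metadata:
  name: leaky-app        # hypothetical pod name
spec:
  containers:
  - name: app
    image: example/app   # hypothetical image
    resources:
      requests:
        memory: "128Mi"  # used for scheduling decisions
      limits:
        memory: "256Mi"  # hard cap; exceeding it triggers an OOM kill
```

Without the limit, the leak competes with Docker and the kubelet for the node’s memory, which matches the failure mode described above.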