Kubernetes Borg/Omega history topic 14: Computational Quality of Service (QoS) and oversubscription. What are they, why would you want them, and how is QoS different from priority? On the last point, the distinction is between importance and urgency.
Oversubscription mitigates underutilization by packing more applications onto a system than could all fit at their peak requirements. It's kind of like a bank: it works because not everyone withdraws at the same time. The question is then: what happens when apps demand more resources than they can get?
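To make the bank analogy concrete, here is a small sketch with made-up numbers. The app names, peak and typical figures, and machine size are all hypothetical; the point is only that the peaks sum to more than the machine while typical demand still fits.

```python
# Hypothetical workloads: name -> (peak CPU cores, typical CPU cores).
apps = {
    "frontend": (4.0, 1.5),
    "batch": (6.0, 2.0),
    "cache": (2.0, 1.0),
}
machine_cores = 8.0  # hypothetical machine size

peak_total = sum(peak for peak, _ in apps.values())
typical_total = sum(typical for _, typical in apps.values())

# Oversubscription ratio: peak demand promised per physical core.
oversubscription = peak_total / machine_cores

# The machine cannot serve every app at its peak simultaneously...
fits_at_peak = peak_total <= machine_cores
# ...but it can serve all of them at their typical demand.
fits_typically = typical_total <= machine_cores

print(f"ratio={oversubscription}, peak fits={fits_at_peak}, typical fits={fits_typically}")
```

A ratio above 1.0 means the machine is oversubscribed, which is exactly the case where the "what happens when apps demand more than they can get" question matters.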
With time-division multiplexing, many CPU threads can be interleaved. They can be blocked and queued by the OS, typically at the cost of context switches and waiting a few time slices. Thus there is no fixed limit to how many can be packed onto a machine. In that sense, CPU is a compressible resource.
OTOH, swapping memory pages, even to local SSD, is painfully expensive. This is why systems hosting services that need to respond with subsecond latency disable swap. Memory is considered an incompressible resource. For simplicity, I'll ignore resources other than CPU and memory
Compressible resources like CPU can be made available quickly by the kernel with low impact to the threads that were interrupted, provided it knows which threads urgently need the resources and which ones don't. We call these threads latency sensitive and latency tolerant, respectively.
Borg used an explicit attribute to indicate this, called appclass, which is described by the Borg paper: https://ai.google/research/pubs/pub43438 …. This was translated to scheduling latency in LMCTFY: https://github.com/google/lmctfy/blob/master/include/lmctfy.proto#L142 …. In Kubernetes, it's inferred from resource requests and limits
In order to reallocate incompressible resources quickly, threads need to be killed, which is obviously not low impact. (For memory, in Linux this is done by the OOM killer.) This is why Borg used priority (production vs. non-production) to make memory oversubscription decisions.
Borg's resource reclamation approach is described in the paper: reservations were computed from observed usage, and oversubscribed resources (latency-tolerant CPU and non-production memory) were tallied against reservations, whereas guaranteed ones were tallied against limits. Complicated.
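A rough sketch of that accounting, to show why the two tallies give different answers. This is not Borg's code: the `Task` type, the `SAFETY_MARGIN` value, and both function names are hypothetical, and the sketch ignores preemption and the way real reservations decay over time.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    name: str
    prod: bool    # production priority?
    limit: float  # requested ceiling, e.g. GiB of memory
    usage: float  # recently observed usage, same unit


SAFETY_MARGIN = 0.2  # hypothetical padding on top of observed usage


def reservation(task: Task) -> float:
    """Observed usage plus a margin, never more than the task's limit."""
    return min(task.limit, task.usage * (1 + SAFETY_MARGIN))


def free_capacity(capacity: float, tasks: List[Task], for_prod: bool) -> float:
    """Capacity visible when placing a new task on this machine.

    Guaranteed (prod) placements count existing tasks at their limits;
    oversubscribed (non-prod) placements count them at their reservations.
    """
    if for_prod:
        used = sum(t.limit for t in tasks)
    else:
        used = sum(reservation(t) for t in tasks)
    return capacity - used


tasks = [Task("web", True, 6.0, 2.0), Task("batch", False, 3.0, 1.0)]
print(free_capacity(10.0, tasks, for_prod=True))
print(free_capacity(10.0, tasks, for_prod=False))
```

The same machine looks nearly full to a guaranteed placement and mostly empty to an oversubscribed one, which is the reclaimed headroom.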
Vertical autoscaling (VA) added even more complexity. VA changed limits, but left its own padding to provide slack for reaction time and observation of demand. Ad hoc mechanisms were added to disable limit enforcement for each resource, creating a notion similar to request in K8s
In K8s, I wanted something simpler, to directly convey the desire for oversubscription and bursting flexibility. The discussion started way back in http://issues.k8s.io/147 and http://issues.k8s.io/168 . In the model we settled on, the desired behavior is inferred by looking at requests and limits.
Request==Limit implies guaranteed resources (not oversubscribed). Request<Limit implies burstable (oversubscribed). Zero request implies best effort. Borg scheduled best effort using reservation, but no throughput guarantees could be made in practice
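The rule above can be sketched as a small function. This is a simplification for one resource of one container: the function name `qos_class` is hypothetical, and the real kubelet classifies whole pods across CPU and memory for all containers, with defaulting rules this sketch leaves out.

```python
from typing import Optional


def qos_class(request: Optional[float], limit: Optional[float]) -> str:
    """Classify a single resource by comparing its request and limit."""
    if not request:
        # Missing or zero request: nothing is promised.
        return "BestEffort"
    if limit is not None and request > limit:
        raise ValueError("request must not exceed limit")
    if limit is not None and request == limit:
        # Nothing to burst into, so the resource is not oversubscribed.
        return "Guaranteed"
    # Request below the limit (or no limit at all): may burst above the request.
    return "Burstable"


print(qos_class(1.0, 1.0))
print(qos_class(0.5, 1.0))
print(qos_class(None, None))
```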
This is described in the resource model design (https://github.com/kubernetes/community/blob/master/contributors/design-proposals/scheduling/resources.md …) and the QoS proposal (https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/resource-qos.md …), including the mapping to OOM scores. The mapping to cgroup cpu shares is described in the pod resource design (https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/pod-resource-management.md …).
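As an illustration of what those mappings look like, here is a sketch. The constants (1024 shares per core, a floor of 2 shares, OOM score adjustments of -998 and 1000, and the burstable formula) are assumptions based on one reading of the linked proposals; they may differ between versions, so check the documents before relying on them. The function names are hypothetical.

```python
def cpu_shares(milli_cpu_request: int) -> int:
    """cgroup CPU shares for a CPU request given in millicores.

    Assumes 1000 millicores map to 1024 shares, with a floor of 2.
    """
    min_shares = 2
    return max(min_shares, milli_cpu_request * 1024 // 1000)


def oom_score_adj(qos: str, memory_request_bytes: int, node_memory_bytes: int) -> int:
    """OOM score adjustment: a higher value is killed first under memory pressure."""
    if qos == "Guaranteed":
        return -998  # assumed constant: killed last
    if qos == "BestEffort":
        return 1000  # assumed constant: killed first
    # Burstable: the larger the share of node memory requested,
    # the lower (safer) the score, clamped to the range 2..999.
    adj = 1000 - (1000 * memory_request_bytes) // node_memory_bytes
    return min(max(2, adj), 999)


GIB = 1024 ** 3
print(cpu_shares(500))
print(oom_score_adj("Burstable", 4 * GIB, 16 * GIB))
```

The design idea is that best-effort work gives back both resources first: it gets the fewest CPU shares and the highest OOM score.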
Some work on Vertical Pod Autoscaling for Kubernetes has started: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/autoscaling/vertical-pod-autoscaler.md …. There have also been proposals to implement oversubscription (https://github.com/kubernetes/enhancements/issues/355 …). As with horizontal scaling, resource monitoring infrastructure is a prerequisite.
If you manage cluster-level sharing using ResourceQuota and LimitRange, oversubscription can be done at that level as well. The original designs were described by https://github.com/kubernetes/community/blob/master/contributors/design-proposals/resource-management/admission_control_limit_range.md … and https://github.com/kubernetes/community/blob/master/contributors/design-proposals/resource-management/admission_control_resource_quota.md …, with improvements in https://github.com/kubernetes/community/blob/master/contributors/design-proposals/resource-management/resource-quota-scoping.md …
Ok, this topic doesn't fit into a Twitter form factor very well. Maybe some day I'll get around to writing this up more in long form. For now, that's about all I have time for, but questions are welcome