Brian Grant

@bgrant0607

Google Cloud. Kubernetes Steering Committee emeritus, K8s SIG Architecture co-Chair emeritus, CNCF TOC member emeritus

Google
github.com/bgrant0607
Joined June 2014

    Brian Grant @bgrant0607 31 May 2019

    Kubernetes Borg/Omega history topic 11: PodDisruptionBudget. Google constantly performs software and hardware maintenance in its datacenters: firmware updates, kernel and image updates, disk repairs, switch updates, battery tests, and so on, with more kinds added over time.

    12:05 PM - 31 May 2019

      2. Brian Grant @bgrant0607 31 May 2019

        Even though Borg tasks are designed to be resilient, this could get pretty disruptive. Rate-limiting maintenance tasks independently isn't efficient if you have dozens of them, and it's not always feasible to perform all types of maintenance at the same time.

      3. Brian Grant @bgrant0607 31 May 2019

        Even with rate-limited machine disruptions, the same task can get disrupted over and over again, like a can being kicked down the road. Hence, the Safe Removal Service (aka SRSly -- pronounced "seriously") was developed by SRE. (SRE builds lots of automation.)

      4. Brian Grant @bgrant0607 31 May 2019

        SRSly kept track of how often tasks of the same Borg Job were disrupted (aka evicted). Maintenance automation queried SRSly regarding all tasks scheduled on a machine before taking it out of service. This enabled Borg to provide an SLO on task disruption.
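
The bookkeeping described above can be illustrated with a small sketch. This is not the real SRSly, whose interface the thread does not describe: the class name, method names, and the "at most N tasks of a job down at once" policy are all assumptions made for illustration, and the sketch counts concurrent disruptions rather than tracking how often they happen over time.

```python
from collections import defaultdict


class DisruptionTracker:
    """Toy stand-in for a safe-removal service (hypothetical, not SRSly's API)."""

    def __init__(self, max_disrupted_per_job):
        # Assumed policy: at most this many tasks of one job may be down at once.
        self.max_disrupted_per_job = max_disrupted_per_job
        self.disrupted = defaultdict(int)  # job name -> tasks currently disrupted

    def may_remove(self, jobs_on_machine):
        """Return True if evicting every task on the machine stays within budget.

        jobs_on_machine lists the job name of each task scheduled on the machine.
        """
        pending = defaultdict(int)
        for job in jobs_on_machine:
            pending[job] += 1
        return all(
            self.disrupted[job] + count <= self.max_disrupted_per_job
            for job, count in pending.items()
        )

    def record_removal(self, jobs_on_machine):
        # Called once the machine has actually been taken out of service.
        for job in jobs_on_machine:
            self.disrupted[job] += 1

    def record_recovery(self, job):
        # Called when a disrupted task of the job is running again elsewhere.
        self.disrupted[job] = max(0, self.disrupted[job] - 1)


tracker = DisruptionTracker(max_disrupted_per_job=1)
machine_a = ["web", "batch"]
machine_b = ["web"]

first = tracker.may_remove(machine_a)    # nothing disrupted yet
tracker.record_removal(machine_a)
second = tracker.may_remove(machine_b)   # "web" already has a task down
tracker.record_recovery("web")
third = tracker.may_remove(machine_b)    # "web" has recovered
print(first, second, third)
```

Asking about the whole machine at once, rather than task by task, mirrors the point that maintenance automation queried about all tasks on a machine before removing it.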

      5. Brian Grant @bgrant0607 31 May 2019

        Borgmaster, however, did not know about SRSly. Instead, all critical/production workloads were changed to run with the same priority so that they wouldn't preempt each other. Doing that for every Borg Job in the company was extremely painful -- more on priority/preemption later.

      6. Brian Grant @bgrant0607 31 May 2019

        For Omega, we developed a model, disruption counters, that could be applied both to task preemption (to run a higher-priority task) and to eviction for maintenance. It had a time dimension that ended up not being effective due to constant changes, so we dropped it in K8s.

      7. Brian Grant @bgrant0607 31 May 2019

        I think I first mentioned this in Kubernetes in my big scheduling braindump comment: https://github.com/kubernetes/kubernetes/issues/4301#issuecomment-74355529. It came up again when I proposed maxUnavailable to moderate concurrent disruptions caused by updates during the design of Deployment: https://github.com/kubernetes/kubernetes/pull/12236#discussion_r36501373
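
To make the idea of bounding concurrent disruptions during an update concrete, here is a deliberately simplified sketch. It only splits replicas into batches no larger than `max_unavailable`; the function name and the batch model are assumptions for illustration, and an actual Deployment rollout also depends on surge capacity and pod readiness, which are ignored here.

```python
def rolling_update_batches(replicas, max_unavailable):
    """Yield lists of replica indices that are replaced together.

    No batch takes down more than max_unavailable replicas at once.
    """
    if max_unavailable < 1:
        raise ValueError("max_unavailable must be at least 1 to make progress")
    for start in range(0, replicas, max_unavailable):
        yield list(range(start, min(start + max_unavailable, replicas)))


batches = list(rolling_update_batches(replicas=5, max_unavailable=2))
print(batches)  # [[0, 1], [2, 3], [4]]
```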

      8. Brian Grant @bgrant0607 31 May 2019

        That discussion was forked into https://github.com/kubernetes/kubernetes/issues/12611. Around that time, Matt Liggett (https://github.com/kubernetes/kubernetes/pulls?q=is%3Apr+author%3Amml+is%3Aclosed) joined the GKE team from Borg SRE (woo hoo!). One of the first things Matt worked on was improving node drains: https://github.com/kubernetes/kubernetes/issues/6080

      9. Brian Grant @bgrant0607 31 May 2019

        Together with @davidopp and @erictune4, we folded disruption budgets into the rescheduling design proposal: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/scheduling/rescheduling.md#disruption-budget. (Rescheduling deserves its own thread -- I'll do that one next.) Implementation began in https://github.com/kubernetes/kubernetes/pull/24697 and https://github.com/kubernetes/kubernetes/pull/25551

      10. Brian Grant @bgrant0607 31 May 2019

        PodDisruptionBudget is now documented: https://kubernetes.io/docs/concepts/workloads/pods/disruptions/ and https://kubernetes.io/docs/tasks/run-application/configure-pdb/. Try it out and give us feedback on how well it works for you. We're looking to advance it from beta to GA: https://github.com/kubernetes/enhancements/issues/85
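
As a rough illustration of what a disruption budget expresses, the sketch below computes how many further voluntary disruptions a budget would permit. The function itself is hypothetical; its parameter names merely echo the `minAvailable` and `maxUnavailable` spec fields and the `expectedPods`, `currentHealthy`, `desiredHealthy`, and `disruptionsAllowed` status fields. It handles integer values only, so percentage budgets and their rounding are out of scope.

```python
def disruptions_allowed(expected_pods, current_healthy,
                        min_available=None, max_unavailable=None):
    """Return how many more pods could be voluntarily disrupted right now."""
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if min_available is not None:
        desired_healthy = min_available
    else:
        desired_healthy = max(0, expected_pods - max_unavailable)
    # Never report a negative allowance when the app is already below target.
    return max(0, current_healthy - desired_healthy)


all_healthy = disruptions_allowed(expected_pods=5, current_healthy=5, min_available=4)
one_down = disruptions_allowed(expected_pods=5, current_healthy=4, max_unavailable=1)
roomy = disruptions_allowed(expected_pods=5, current_healthy=5, max_unavailable=2)
print(all_healthy, one_down, roomy)  # 1 0 2
```

Note that there is no time dimension here: the allowance depends only on the current healthy count, consistent with the thread's remark that the time dimension was dropped in K8s.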

      11. Brian Grant @bgrant0607 31 May 2019

        You can safely drain a node with kubectl drain: https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/. Node upgrades and the cluster autoscaler in Google Kubernetes Engine (GKE) also respect PodDisruptionBudget. The latter is documented here: https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler
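
The interaction between a drain and disruption budgets can be sketched as a small simulation. Everything here is hypothetical (the `drain` function, the dictionaries, the pod names) and it makes a single pass over the node; in a real cluster the eviction request for a blocked pod is rejected and the drain retries until the budget allows it.

```python
def drain(node_pods, healthy, desired_healthy):
    """Try to evict each pod on a node; return (evicted, blocked).

    node_pods: list of (pod_name, app) pairs scheduled on the node.
    healthy: dict app -> healthy pod count (mutated as pods are evicted).
    desired_healthy: dict app -> minimum healthy pods the budget requires.
    """
    evicted, blocked = [], []
    for pod, app in node_pods:
        # Evict only if the app stays at or above its required healthy count.
        if healthy[app] - 1 >= desired_healthy.get(app, 0):
            healthy[app] -= 1
            evicted.append(pod)
        else:
            blocked.append(pod)  # a real drain would wait and retry
    return evicted, blocked


healthy = {"web": 3, "db": 2}
desired = {"web": 2, "db": 2}
evicted, blocked = drain(
    [("web-0", "web"), ("web-1", "web"), ("db-0", "db")], healthy, desired
)
print(evicted, blocked)  # ['web-0'] ['web-1', 'db-0']
```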

      12. Brian Grant @bgrant0607 31 May 2019

        Node upgrade behavior is documented here: https://cloud.google.com/kubernetes-engine/docs/how-to/node-auto-upgrades

      13. Brian Grant @bgrant0607 31 May 2019

        And more about the Google SRE philosophy behind automation can be found in the SRE book: https://landing.google.com/sre/sre-book/chapters/automation-at-google/

      14. Brian Grant @bgrant0607 31 May 2019

        And the Safe Removal Service was also mentioned in Google's "VM Live Migration at Scale" paper in VEE 2018: https://dl.acm.org/citation.cfm?id=3186415
