Colm MacCárthaigh

@colmmacc

AWS, Apache, Crypto, Irish Music, Haiku, Photography

Seattle
notesfromthesound.com
Joined April 2008

    1. Colm MacCárthaigh‏ @colmmacc 28 Aug 2018

      It's my ten year anniversary at AWS, I got a new badge and everything! To celebrate, I'm going to tweet out the lightning talk I gave at last week's Amazon dev con. It's all about my favorite thing from my ten years: Shuffle Sharding! pic.twitter.com/FkJ4ykBWGt

    2. Colm MacCárthaigh‏ @colmmacc 28 Aug 2018

      Ever wonder how we manage multi-tenancy at AWS? Or why we want you to use the personal health dashboard instead of the AWS status dashboard? Are you pining for a bonus content section with probabilistic math? These slides on Shuffle Sharding are for you!!

    3. Colm MacCárthaigh‏ @colmmacc 28 Aug 2018

      O.k., so this is me, 15 years ago, building a data center. That's what I used to do for money. This one was about 30 racks, and I was the project lead. It took me about a year to build it, everything from initial design to labeling cables. pic.twitter.com/0wCFkERNTs

    4. Colm MacCárthaigh‏ @colmmacc 28 Aug 2018

      These days I work with the AWS ELB team, and we regularly spin up that same amount of capacity in minutes. Think software upgrades or new region builds. This is insane to me. What took me a year now takes me minutes. That's Cloud.

    5. Colm MacCárthaigh‏ @colmmacc 28 Aug 2018

      Our Cloud gives us this agility because we all pool our resources. So a much bigger, and better than me, team can build much bigger, and better than mine, data centers, which we all share. 10 years in, pretty much everyone understands that this is awesome.

    6. Colm MacCárthaigh‏ @colmmacc 28 Aug 2018

      BUT ... it comes with a challenge. When I built my own data centers, I was the only tenant and didn't need to worry about problems due to other customers. The core competency of a cloud provider is handling this challenge: efficiency and agility but still a dedicated experience.

      Colm MacCárthaigh‏ @colmmacc 28 Aug 2018

      It's no good sharing everything if a single "noisy neighbor" can cause everyone to have a bad experience. We want the opposite! At AWS we are super into compartmentalization and isolation, and mature remediation procedures. Shuffle Sharding is one of our best techniques. O.k. ..

      10:36 AM - 28 Aug 2018
        2. Colm MacCárthaigh‏ @colmmacc 28 Aug 2018

          So to understand Shuffle Sharding, let's start with a traditional service. Here I have 8 instances, web servers or whatever. It's horizontally scalable, and fault tolerant in the sense that I have more than one instance. pic.twitter.com/TkCgkIQa5E

        3. Colm MacCárthaigh‏ @colmmacc 28 Aug 2018

          We put the servers behind a load balancer, and each server gets a portion of the incoming requests. LOTS of services are built like this. Super common pattern. You're probably yawning! pic.twitter.com/6KPDf5OXQg

        4. Colm MacCárthaigh‏ @colmmacc 28 Aug 2018

          And then one day we get a problem request. Maybe the request is very expensive, or maybe it even crashes a server, or maybe it's a run-away client that is retrying super aggressively. AND it takes out a server. This is bad. pic.twitter.com/cQ7sspawwu

        5. Colm MacCárthaigh‏ @colmmacc 28 Aug 2018

          What's even WORSE is that this can cascade. The load balancer takes the server out, and so the problem request goes to another server, and so on. Pretty soon, the whole service is down. OUCH. Everyone has a bad day, due to one problem request or customer. pic.twitter.com/pEyjWCInGC

        6. Colm MacCárthaigh‏ @colmmacc 28 Aug 2018

          At AWS we have a term for the scope of impact: we call it "Blast Radius". And here the blast radius is basically "All customers". That's about as bad as it gets. But this is really common! Lots of systems are built like this out in the world. pic.twitter.com/vn6xD2QyeB

        7. Colm MacCárthaigh‏ @colmmacc 28 Aug 2018

          O.k., so what can we do? Well, traditional sharding is something that we can do! We can divide the capacity into shards or cells. Here we go with four shards of size two. This change makes things much better! pic.twitter.com/9Oa3T43SNW

        8. Colm MacCárthaigh‏ @colmmacc 28 Aug 2018

          Now if we repeat the same event, the impact is much smaller, just 25% of what it was. Or more formally ... pic.twitter.com/8bcvECcNnu

        9. Colm MacCárthaigh‏ @colmmacc 28 Aug 2018

          The blast radius is now the number of customers divided by the number of shards. It's a big improvement. Do this! At AWS we call this cellularization, and many of our systems are internally cellular. Our isolated Regions and Availability Zones are a big, famous macro example too. pic.twitter.com/PmolDBpl7W

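To make the "customers divided by shards" arithmetic concrete, here is a minimal Python sketch. The 1,000 customers, the modulo placement, and the function names are assumptions for illustration only; the thread does not say how customers are pinned to cells.

```python
# Hypothetical numbers matching the slides: 8 nodes, 4 shards of 2 nodes each.
def traditional_shards(nodes, shard_size):
    """Split the node list into fixed, non-overlapping shards."""
    return [nodes[i:i + shard_size] for i in range(0, len(nodes), shard_size)]

def shard_for(customer_id, shards):
    """Pin a customer to exactly one shard (simple modulo placement, assumed)."""
    return customer_id % len(shards)

nodes = list(range(1, 9))
shards = traditional_shards(nodes, 2)
customers = range(1000)

# A problem customer takes down its whole shard; everyone pinned to the
# same shard shares its fate, everyone else is untouched.
bad_customer = 42
bad_shard = shard_for(bad_customer, shards)
impacted = [c for c in customers if shard_for(c, shards) == bad_shard]
blast_radius = len(impacted) / len(customers)
print(f"{len(shards)} shards, blast radius = {blast_radius:.0%}")
```

With four equally loaded shards, a quarter of the customers land on the failing shard, which is the 25% figure quoted above.
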
        10. Colm MacCárthaigh‏ @colmmacc 28 Aug 2018

          THIS is one reason why the personal health dashboard is so much better than the status page. The status for your cell might be very different than someone else's!

        11. Colm MacCárthaigh‏ @colmmacc 28 Aug 2018

          ... anyway, back to Shuffle Sharding. With Shuffle Sharding we can do much better again than traditional sharding. It's deceptively simple. All we do is assign each customer to two servers, pretty much at random. pic.twitter.com/0SISi2nsRL

        12. Colm MacCárthaigh‏ @colmmacc 28 Aug 2018

          So for example, the ❤️ customer is assigned to nodes 1 and 4. Suppose we get a problem request from ❤️? What happens? pic.twitter.com/5FZyFXIbJ4

        13. Colm MacCárthaigh‏ @colmmacc 28 Aug 2018

          Well ... it could take out nodes 1 and 4. So ❤️ is having a bad experience now. Amazing devops teams are on it, etc., but it's still not great for them. Not much we can do about that. But what about everyone else? pic.twitter.com/1J6oorA510

        14. Colm MacCárthaigh‏ @colmmacc 28 Aug 2018

          Well, if we look at ❤️'s neighbors, they're still fine! As long as their client is fault tolerant, which can be as simple as using retries, they can still get service. 😀 gets service from node 2 for example. pic.twitter.com/xJRpdK6Fgn

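The scenario in the last few tweets can be simulated in a few lines of Python. This is only a sketch under assumptions: the customer names, the count of 1,000 customers, and the fixed seed are made up for illustration, and shards are drawn uniformly at random with no attempt to avoid collisions.

```python
import random

def assign_shuffle_shards(customers, nodes, shard_size, rng):
    """Give every customer its own random combination of nodes."""
    return {c: frozenset(rng.sample(nodes, shard_size)) for c in customers}

rng = random.Random(2018)          # fixed seed so runs are repeatable
nodes = list(range(1, 9))          # 8 nodes, as in the slides
customers = [f"customer-{i}" for i in range(1000)]   # hypothetical customers
shards = assign_shuffle_shards(customers, nodes, 2, rng)

bad = customers[0]
dead_nodes = shards[bad]           # the problem customer takes out its nodes

# Customers whose whole shard is dead share the bad customer's fate.
fully_impacted = [c for c in customers
                  if c != bad and shards[c] <= dead_nodes]
# Customers who lost one node still have another to retry against.
partially_impacted = [c for c in customers
                      if c != bad and shards[c] & dead_nodes
                      and not shards[c] <= dead_nodes]

print(f"fully impacted: {len(fully_impacted) / (len(customers) - 1):.1%}")
print(f"lost one node but still served: {len(partially_impacted)}")
```

With many more customers than there are node pairs, some customers inevitably draw the same pair as the problem customer, so the fully impacted share is small but not zero.
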
        15. Colm MacCárthaigh‏ @colmmacc 28 Aug 2018

          O.k. let's PAUSE for a second and appreciate that. Same number of nodes. Same number of nodes for each customer. Same number of customers. Just by using MATH, we've reduced the blast radius to 1 customer! That's INSANE.

        16. Colm MacCárthaigh‏ @colmmacc 28 Aug 2018

          The blast radius ends up getting really small. It's roughly proportional to the factorial of the shard size (small) divided by the factorial of the number of nodes (which is big) ... so it can get really really small. pic.twitter.com/LmaffLA3tR

        17. Colm MacCárthaigh‏ @colmmacc 28 Aug 2018

          Let's look at an example. So for 8 nodes and a shard size of 2, as in these slides, the blast radius ends up being just 3.6%. What that means is that if one customer triggers an issue, only 3.6% of other customers will be impacted. Much better than the 25% we saw earlier. pic.twitter.com/BuDe9pCMEa

        18. Colm MacCárthaigh‏ @colmmacc 28 Aug 2018

          But that's still way too high for us. For AWS HyperPlane, the system that powers VPC NAT Gateway, Network Load Balancer, PrivateLink, etc., we design for a hundred nodes and a shard size of five. Let's look at those numbers ...

        19. Colm MacCárthaigh‏ @colmmacc 28 Aug 2018

          O.k. now things get really really small. About 0.0000013% of other customers would share fate in this case. It's so small that because we have fewer than a million customers per cell anyway, there can be zero full overlap. pic.twitter.com/xmZHjKMIy0

        20. Colm MacCárthaigh‏ @colmmacc 28 Aug 2018

          Again, think about it: we can build a huge big multi-tenant system with lots of customers on it, and still guarantee that there is *no* full overlap between those customers. Just using math. This still blows my mind.

        21. Colm MacCárthaigh‏ @colmmacc 28 Aug 2018

          If you want to try some numbers out for yourself, here's a Python script that calculates the blast radius: https://gist.github.com/colmmacc/4a39a6416d2a58b6c70bc73027bea4dc … . Try it for Route 53's numbers. There are 2048 Route 53 virtual name servers, and each hosted zone is assigned to 4. So n = 2048, and m = 4.

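For readers who cannot open the gist, the figures quoted in this thread can be reproduced with a short stand-in. This is not the linked script, just a sketch that assumes the blast radius for full overlap is 1 / C(n, m), with n nodes and m nodes per shard chosen uniformly at random; the function name is made up.

```python
from math import comb

def blast_radius(n, m):
    """Chance that another customer's shard fully overlaps a given one.

    Assumes each customer gets m distinct nodes chosen uniformly from n.
    """
    return 1 / comb(n, m)

# The slide example, the HyperPlane design point, and Route 53's numbers.
for n, m in [(8, 2), (100, 5), (2048, 4)]:
    fraction = blast_radius(n, m)
    print(f"n={n:5d} m={m}: {fraction:.2e} ({fraction * 100:.7f}%)")
```

Under that assumption, n = 8 and m = 2 gives 1/28, which matches the 3.6% quoted earlier, and n = 100 with m = 5 gives 1/75,287,520, which matches the 0.0000013% figure.
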
        22. Colm MacCárthaigh‏ @colmmacc 28 Aug 2018

          If you want to make your own Shuffle Shard patterns, and make guarantees about non-overlap, we open sourced our approach years ago. It's at: https://github.com/awslabs/route53-infima …

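One simple way to get a guarantee about non-overlap is to reject any candidate shard that shares too many nodes with a shard already handed out. The Python sketch below illustrates that idea only; it is not the linked library's API or algorithm, and the function name, parameters, and numbers are all hypothetical.

```python
import random

def assign_with_max_overlap(customers, nodes, shard_size, max_overlap, rng,
                            max_tries=1000):
    """Rejection-sample shards so that no two customers share more than
    max_overlap nodes. Raises if no acceptable shard is found in max_tries."""
    assigned = {}
    for customer in customers:
        for _ in range(max_tries):
            candidate = frozenset(rng.sample(nodes, shard_size))
            if all(len(candidate & other) <= max_overlap
                   for other in assigned.values()):
                assigned[customer] = candidate
                break
        else:
            raise RuntimeError(f"no acceptable shard left for {customer}")
    return assigned

rng = random.Random(53)                     # fixed seed, repeatable runs
nodes = list(range(100))                    # assumed: 100 nodes
shards = assign_with_max_overlap(range(50), nodes, shard_size=5,
                                 max_overlap=2, rng=rng)
worst = max(len(a & b) for a in shards.values() for b in shards.values()
            if a is not b)
print(f"{len(shards)} customers placed, worst pairwise overlap = {worst}")
```

The pairwise check is quadratic in the number of customers, so this is only practical for small examples; it is here to show what the guarantee means rather than how to compute it at scale.
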
        23. Colm MacCárthaigh‏ @colmmacc 28 Aug 2018

          Shuffle Sharding is amazing! It's just an application of combinatorics, but it decreases blast radiuses by huge factorial factors. So what does it take to use it in practice?

        24. Colm MacCárthaigh‏ @colmmacc 28 Aug 2018

          Well, the client has to be fault-tolerant. That's easy, nearly all are. The technique works for servers, queues, and even things like storage. So that's easy too. The big gotcha is that you need a routing mechanism. pic.twitter.com/1berEp8FO9

        25. Colm MacCárthaigh‏ @colmmacc 28 Aug 2018

          You either give each customer resource a DNS name, like we do for S3, CloudFront, Route53, and handle it at the DNS layer, or you need a content-aware router that can do Shuffle Sharding. Of course, at our scale this makes sense, but it may not for everyone.

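As a rough picture of the content-aware router option, the Python sketch below derives each customer's shard deterministically from its ID, so every router instance computes the same answer without shared state. The node names, the hashing scheme, and both function names are assumptions for illustration, not a description of how AWS routes requests.

```python
import hashlib
import random

NODES = [f"node-{i}" for i in range(1, 9)]   # hypothetical node names
SHARD_SIZE = 2

def shard_for(customer_id, nodes=NODES, shard_size=SHARD_SIZE):
    """Derive a stable pseudo-random shard from the customer ID."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    # Seeding a private generator keeps the choice repeatable per customer.
    rng = random.Random(int.from_bytes(digest, "big"))
    return rng.sample(nodes, shard_size)

def route(customer_id, healthy):
    """Return the first healthy node in the customer's shard, or None."""
    for node in shard_for(customer_id):
        if node in healthy:
            return node
    return None

healthy = set(NODES)
bad_shard = shard_for("heart")
healthy -= set(bad_shard)                    # the problem customer's nodes die
print(route("heart", healthy))               # None: its whole shard is down
print(route("smiley", healthy))              # a surviving node, if it has one
```

A production router would want a mapping that stays stable across language versions and node changes; seeding Python's `random` module is used here only to keep the example short.
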
        26. Colm MacCárthaigh‏ @colmmacc 28 Aug 2018

          O.k., bonus MATH content!! I want to convince you that that rough approximation from earlier is correct, because with more insight we can make smarter decisions. pic.twitter.com/rsgJrYQu63

        27. Colm MacCárthaigh‏ @colmmacc 28 Aug 2018

          Shuffle Sharding is just like a lottery. Think about your nodes like the numbers in a lottery, and each customer gets a ticket with |shardsize| count of numbers. You want to measure the probability that two or more tickets match.

        28. Colm MacCárthaigh‏ @colmmacc 28 Aug 2018

          First, we have to define some shorthand. N is the number of nodes. S is the shard size. O is the potential overlap between two tickets/customers. https://en.wikipedia.org/wiki/Lottery_mathematics … has good background on how we then come to this equation ... pic.twitter.com/RMPYMIBtx4

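The slide with the equation is an image and is not reproduced in this text. Assuming it follows the standard lottery result, the probability that two randomly drawn shards of size S, taken from N nodes, share exactly O nodes would be:

```latex
P(\text{overlap} = O) \;=\; \frac{\binom{S}{O}\,\binom{N-S}{S-O}}{\binom{N}{S}}
```

Here the first factor counts the ways to pick the O shared nodes from one customer's shard, the second counts the ways to pick the remaining S - O nodes from the N - S nodes outside it, and the denominator counts all possible shards.
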
        29. Colm MacCárthaigh‏ @colmmacc 28 Aug 2018

          Now, let's take the special case of "full overlap". That's the case we care the most about; the problem request eats all of the nodes it can reach. How many other customers are impacted? Since O=S in this case, we end up with ... pic.twitter.com/hIb2VR7RPq

        30. Colm MacCárthaigh‏ @colmmacc 28 Aug 2018

          The bracket notation is short for "choose", and since x choose x is 1, and x choose 0 is 1, we can replace everything above the line with 1. pic.twitter.com/X6evtSZMsv

        31. Colm MacCárthaigh‏ @colmmacc 28 Aug 2018

          Now let's expand the choose operator into its factorials ... pic.twitter.com/f0L2ZBGZYw

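The slides for these last steps are images and are not reproduced in this text. Written out with the same shorthand (N nodes, shard size S, overlap O), and assuming the standard lottery formula as the starting point, the full-overlap case described above works out as:

```latex
P(O = S)
  \;=\; \frac{\binom{S}{S}\,\binom{N-S}{0}}{\binom{N}{S}}
  \;=\; \frac{1}{\binom{N}{S}}
  \;=\; \frac{S!\,(N-S)!}{N!}
```

As a check against the earlier example, N = 8 and S = 2 gives

```latex
\frac{2!\,6!}{8!} \;=\; \frac{2}{8 \cdot 7} \;=\; \frac{1}{28} \;\approx\; 3.6\%
```

which agrees with the blast radius quoted for 8 nodes and a shard size of 2.
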