Avoid emergency modes that behave differently from normal operation, or anything that can suddenly alter what the system is doing. Think about your system in terms of state space, or code branches. How many can you get rid of?
Branches and state spaces are evil because they grow exponentially. Past the point where you can test or predict behaviour, that behaviour becomes emergent instead. A simple example here is relational databases. pic.twitter.com/jiHIm1zGNk
I'm not knocking offerings like RDS or Aurora, relational DBs are great for versatile business queries, but they are terrible for control planes. We essentially ban them for that purpose at AWS. Why?
RDBMSs have built-in fancy Query Plan Optimizers that can suddenly change which indices are being used, or how tables are being scanned. That can have a disastrous effect on performance or behaviour. Another problem is that they are very accessible and tempting ...
... an operator, product manager, business analyst might all think it's safe to run a one-time read-only query, but a simple SQL typo can choke up the system! Bad bad. So what's the fix?
Use NoSQL and do things the "dumb" way every time. Because the perf characteristics are much more obvious to the programmer and designer, now you can just do a full join, or a full table scan every time for every query. Much more stable!
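To make the "dumb" way concrete, here is a minimal sketch, assuming a small in-memory table of dict records. The function name and record shape are hypothetical, chosen for illustration only, and this is not a description of how any AWS system is implemented. The point is that the lookup walks every record on every call, with no index and no early exit, so the work done does not depend on which key is asked for.

```python
from typing import Any, Dict, Iterable, Optional, Tuple

Record = Dict[str, Any]  # hypothetical record shape, e.g. {"id": 7, "state": "ok"}


def find_by_id(records: Iterable[Record], wanted_id: Any) -> Tuple[Optional[Record], int]:
    """Return (first matching record or None, number of records scanned).

    Deliberately scans the whole table every time: no index, no early
    return, so cost is the same for a hit, a miss, or any key.
    """
    match: Optional[Record] = None
    scanned = 0
    for record in records:
        scanned += 1
        if match is None and record["id"] == wanted_id:
            match = record  # remember the hit, but keep scanning
    return match, scanned
```

A hit on the first row and a miss both report the same scan count, which is the predictable cost the tweet is arguing for.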
I've tweet stormed about this before, but now we're getting into the "constant work" pattern. The most stable control systems do the same work all of the time, with no change that is dependent on the data, or even the volume of change. pic.twitter.com/Gp0eD5emZi
Suppose you need to get some config to your data plane. What if the data plane just fetched the config from S3 every 10 seconds, whether it changed or not? And reloaded the configuration, every time, whether it changed or not?
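The loop described above could be sketched roughly like this. It is an illustration under stated assumptions, not the actual AWS implementation: `ConfigPoller`, `fetch`, and `apply` are hypothetical names, the config is assumed to be JSON, and `fetch` stands in for whatever performs the S3 GET in a real deployment. Note there is no "has it changed?" check anywhere: every tick fetches and every successful fetch reloads.

```python
import json
import time
from typing import Callable, Optional


class ConfigPoller:
    """Fetch and re-apply configuration on a fixed interval, changed or not."""

    def __init__(
        self,
        fetch: Callable[[], bytes],      # hypothetical: returns the raw config object
        apply: Callable[[dict], None],   # hypothetical: installs the parsed config
        interval_seconds: float = 10.0,  # the 10 second period from the thread
    ) -> None:
        self.fetch = fetch
        self.apply = apply
        self.interval_seconds = interval_seconds

    def poll_once(self) -> None:
        # Same work every tick: fetch, parse, apply. No diffing, no ETag check.
        self.apply(json.loads(self.fetch()))

    def run(self, ticks: Optional[int] = None, sleep: Callable[[float], None] = time.sleep) -> None:
        """Poll forever, or for a fixed number of ticks (useful in tests)."""
        done = 0
        while ticks is None or done < ticks:
            try:
                self.poll_once()
            except Exception:
                # Assumption: on any failure keep the last applied config
                # and simply try again on the next tick.
                pass
            sleep(self.interval_seconds)
            done += 1
```

Because the next tick repeats the full fetch and reload anyway, a bad read or a transient outage needs no separate recovery path: the following successful tick is the recovery.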
This simple, simple design is rarely seen in the wild, but I don't know why. It's very, very reliable ... incredibly resilient, and will recover from all sorts of issues. It's not even expensive! We're talking hundreds of dollars per year. Not even a few days of SDE time. pic.twitter.com/6ZBaxiamwP
That's the pattern we use for our most critical systems. The network health check statuses that allow AWS to instantly handle an Availability Zone power issue? Those are always flowing, all the time, 0 or 1, whether they change or not.
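As a rough illustration of "always flowing, 0 or 1", the sketch below builds a status message that has one character per target in a fixed order, so the message is the same size whether every target is healthy, none are, or nothing has changed since the last tick. The function name, the target list, and the ASCII encoding are all assumptions made for this example; the thread does not describe the real wire format.

```python
from typing import Sequence, Set


def build_status_message(targets: Sequence[str], healthy: Set[str]) -> bytes:
    """Encode the health of every target as b'1' or b'0', in fixed order.

    The full vector is produced on every call. Nothing is omitted for
    unchanged targets, so payload size depends only on len(targets).
    """
    return "".join("1" if target in healthy else "0" for target in targets).encode("ascii")
```

A sender would call this on every tick and transmit the result unconditionally, so a large-scale failure flips bits in the payload without changing the volume of work.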
We have these and so many more patterns, and ... we've been building them into API Gateway and Lambda behind the scenes too! So consider building your control planes on those! pic.twitter.com/DgzdZAyNNC
Thank you for listening to my talk! Always, always feel free to AMA. This is the last tweet in the thread for now, and I won't even promote my Soundcloud!