As a company evolves, different and new challenges arise. And people are called to face them, even if they never had to before. I have recently been wearing the operator hat for part of our systems, both as maintainer and member of the oncall team. [1/]
The system itself is probably nowhere near the complexity of what @joyent is running, but I can relate to what @bcantrill says in this talk from a couple of years ago: https://youtu.be/30jNsCVLpAE [/2]
Automation rules out a whole class of possible errors, but it also often reduces the number of people who have a sufficiently holistic understanding of the whole system. [/3]
And when an incident strikes, because of all those safeguards, it's usually much more difficult to pin down. The root cause, if any, is subtler. Hidden behind layers of stuff that "used to just work". [/4]
And all the monitoring, the logs, the alerts... A deluge of data, useless without a hypothesis to glue the pieces together. Cultivating a structured approach to investigation is something I never thought about explicitly, but it makes so much sense! [/5]
And it's probably one of the reasons I find some of these challenges to be so interesting. Because you get the complexity of real natural systems, distilled in an environment where you control the knobs, the variables, the setting. [/6]
It speaks of a god complex, but it's also so much fun. [/7]
Writing about stuff to learn how it works, mostly in Rust.
Lead Engineer at