SLAs already do this. If you're in danger of missing an SLA ... more work on stability and resilience is evidently needed. Typically that means fewer features ... but naturally. I think having to gate features is unhealthy and a sign of organizational dysfunction.
Replying to @colmmacc @copyconstruct
I think that "strong contracts empower distributed ownership". Thus, an explicit error budget -- with e.g. common visibility of its current value -- empowers teams to make the right decisions and have clear expectations about what happens when you miss your SLA.
Replying to @jhscott @copyconstruct
Measuring SLAs and how you're doing, down to each customer's experience, and having transparency about that ... that's all good. But that's just using an SLA IMO. Error budgets are an extra concept layered on top, and my point is I don't think it's a good one.
Even the name implies that there's a budget for errors. It's a bad framing. It leads to behaviors like being less cautious when less of the budget has been spent. Or worse ... that you can plan to spend it proactively to help cut some corners.
Replying to @colmmacc @copyconstruct
ah but I think that's one of the best parts! Like there is always a balance between velocity and reliability, right? So setting an *appropriate* SLA and error budget makes it clear to e.g. product managers that you are moving slowly b/c you need five nines.
vs moving faster if you can get away with three nines. If your business use case can't take the corner cutting, tighten your SLA!
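To put rough numbers on the three-nines vs. five-nines gap, here is a minimal sketch of the usual error-budget arithmetic. The 30-day window and the function names are illustrative assumptions, not anything specified in this thread.

```python
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(availability: float, window_days: int = 30) -> float:
    """Downtime implied by an availability target over a window.

    The 30-day default window is an assumption made for illustration.
    """
    return (1.0 - availability) * window_days * MINUTES_PER_DAY


def budget_remaining_fraction(
    availability: float, downtime_so_far_minutes: float, window_days: int = 30
) -> float:
    """Fraction of the error budget still unspent (negative once the SLA is blown)."""
    budget = allowed_downtime_minutes(availability, window_days)
    return 1.0 - downtime_so_far_minutes / budget


if __name__ == "__main__":
    for label, target in [("three nines", 0.999), ("five nines", 0.99999)]:
        minutes = allowed_downtime_minutes(target)
        print(f"{label}: about {minutes:.2f} minutes of downtime per 30 days")
```

Under these assumptions, three nines leaves roughly 43 minutes of downtime per 30 days, while five nines leaves under half a minute, which is the size of the gap being argued over here.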
Replying to @jhscott @copyconstruct
*shudder*. That is the exact unhealthy organizational dysfunction I'm talking about! First - we don't need to trade off velocity and reliability. Investments in test and deployment automation, and in compartmentalization, usually deliver both. Second ...
if you're in a PHB "product managers are in charge but not listening" situation ... fix that :) The best model is comprehensive team ownership of business, development, and operations, with mutual understanding.
Replying to @colmmacc @copyconstruct
I don't disagree and/or I "want to believe", but the same way @copyconstruct alludes to error budgets being rarely deployed in practice, I believe that especially for hypergrowth companies, "investments in test and deployment automation" don't yield...
enough of "both" to offset rapid growth in engineering headcount, features, service<>service interactions, etc. See e.g. https://medium.com/@mattklein123/the-human-scalability-of-devops-e36c37d3db6a …
We're getting so much better as an industry! I remember going through a 3 day global outage at Skype, 13 years ago now. Can't imagine the same today if we're building on modern tooling, cloud, etc. Probably a 10x improvement in that time.
I have hope that serverless can deliver a similar improvement, over a similar time-frame. Hopefully in 2030, 4-5 nines is very low effort even for a boutique one or two person team full of naivety.
Replying to @colmmacc @copyconstruct
ah, sigh, I'm kind of a serverless skeptic. I want to believe! But not clear how it adds nines (point me to resources if you have them!). For your potential two hour outages, maybe ask *why* developers want them and why the automated tooling hasn't de-risked their changes?