A more detailed description of last weekend’s massive GCP outage is on the blog - there’s so much I winced at I can’t even highlight specific interesting lines. https://status.cloud.google.com/incident/cloud-networking/19009 … Configuration change. Automation making things worse. And the postmortem isn’t even fully done
-
-
Why does automation even have access to multiple regions? Why add more logic, which could have its own bugs and problems, instead of enforcing more robust separation; compartmentalization with their own local fail-safes. 3/n
-
it reads like the deep reaction is "We just need to be a bit smarter, just add another case for this, we can defeat reality", rather than humility in the face of the unpredictable, and embracing simplicity. 4/n
- 2 more replies
New conversation -
Loading seems to be taking a while.
Twitter may be over capacity or experiencing a momentary hiccup. Try again or visit Twitter Status for more information.