it has nice graphics
-
I just assumed someone forgot to change the Octocat's litter box, it got pissed, and went on a rampage yanking cables with all 8 arms....
-
(and I am mildly amused by the fact that the failure cascade basically started with a network disconnect à la octocat-cable-yanking.)
-
enjoyed reading this, now if only there was one for YouTube going down
-
I'm particularly impressed by how it only took 20 minutes from the start of the incident to diagnose and agree on a path forward (which I agree with FWIW - prioritize user data). The cause of the issue was also p much what one would've guessed. Thankful I've never dealt w/ it.
-
Wow, that was a good read. I love when teams are allowed to tell the world about what happened and how they worked through to recovery.
-
I've been doing dev & sysadmin stuff for 20 years now and I think I'd have made each and every one of those same decisions given what was known at the time. I hope your engineers aren't blaming themselves!
-
I think it's tragic/hilarious that I've had similar things happen before with vanilla multi-master replication, right down to crawling through binary replication logs to see which queries need to be re-run. That's scary for one server, can't imagine it at GitHub's scale!
-
thanks for this, it was a great read.
#hugops