I still remember one of the first: one of our early services had a subtle memory leak. The team did a great job and built a system to take machines out of service gracefully every few hours, restart the service, and bring them back. No customer impact.
Maybe that sounds simple and obvious now, but it wasn't to me at the time: I'd have focused on finding the source of the memory leak and fixing it, but that's not the first priority.