How to lose data:
1. A problem process eats disk space.
2. Your email alert threshold is at 10% free.
3. Your paging (wake me up) threshold is at 5% free.
4. The ext4 reserved blocks are at the default 5%.
Woke up to the FS at 5.01% free, a pile of <10% alert emails, and lost data.
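Rough sketch of why the page never fired, assuming the monitor measured free space including the root-reserved blocks (`f_bfree` from `statvfs`) rather than what unprivileged processes can actually use (`f_bavail`). With a 5% reserve, the first number bottoms out around 5% while the second is already at zero. The helper names and thresholds here are made up for illustration; this is not the actual monitoring setup.

```python
import os


def free_fractions(f_blocks, f_bfree, f_bavail):
    """Return (free including reserved, free for unprivileged users) as fractions."""
    if f_blocks == 0:
        return 0.0, 0.0
    return f_bfree / f_blocks, f_bavail / f_blocks


def level_for(free_fraction, email_at=0.10, page_at=0.05):
    """Map a free-space fraction to a hypothetical alert level."""
    if free_fraction <= page_at:
        return "page"
    if free_fraction <= email_at:
        return "email"
    return "ok"


def check(path="/"):
    """Alert on f_bavail, the space a non-root process can really write to."""
    st = os.statvfs(path)
    with_reserved, usable = free_fractions(st.f_blocks, st.f_bfree, st.f_bavail)
    return {
        "free_incl_reserved": with_reserved,
        "free_usable": usable,
        "level": level_for(usable),
    }


if __name__ == "__main__":
    print(check("/"))
```

Fed the numbers from the story, `level_for(0.0501)` only ever says `"email"`, while the usable fraction at that point is about 0.0001 and would have paged.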
Also, the metadata cleanup seems to have triggered some races, so data wasn't properly staged for the cronjob that moves it to another FS. Basically this whole thing's a mess and I hate it, but running out of disk space unnoticed at the end was my fault.
-
-
Basically this thing has bitten me in the ass several times, and I already have several wake-me-up-level alerts for stuff going wrong. It just found *another* way to fail undetected :(