I haven't yet started automatically posting it anywhere, but the countdown has begun
I wonder where the funniest place to post this would be. obviously a twitter account is inherently hilarious but I'm already dangerously close to being banned. mastodon, maybe?
okay it's now updating every 5 minutes. thanks to some laziness in how I implemented it, if there's a problem with the computer (or it prompts for an update), it'll pop up a dialog box that'll appear over the lettuce. I know, it's lazy, but if you want real SRE, pay me.
and it's saving the pictures locally. so I can eventually at least make a fun time-lapse
although the disk will fill up eventually. hopefully someone will be around to periodically clear out those files, and they haven't been fired
Heh. My host emailed me saying "your bandwidth alerts are set at a sustained 5 megabytes a second. Over the last 3 hours, your host has been hitting a sustained 54.7 megabytes a second"
I fixed it. Apparently the machine rebooted? I'm not sure why. Again, if you want real SRE, pay me.
It's running on a machine I bought for $15 at a now-defunct junk shop and threw a spare hard drive into. It might be overheating: that garage isn't too cool and now there's an LED light near it running 24/7. Or maybe it's allergic to lettuce?
Anyway the weird thing is how it didn't just fail to upload, but instead uploaded corrupted jpegs, as you can see here:
Quote Tweet
Well…this one looks like the lettuce is planning on taking Twitter down itself. Should someone check on @Foone?
I can write a check to see if that's happened and send me an alert if so, but that sounds dangerously close to actual SRE work.
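For illustration only, a check like that could start as a cheap structural test. A JPEG file begins with the start-of-image marker (FF D8) and ends with the end-of-image marker (FF D9). The function names, the `*.jpg` naming, and the folder layout below are assumptions and are not from the actual lettuce setup. This would catch truncated or empty uploads, but not an image that is framed correctly and garbled in the middle, and the alerting half is left out.

```python
from pathlib import Path

JPEG_START = b"\xff\xd8"  # start-of-image (SOI) marker
JPEG_END = b"\xff\xd9"    # end-of-image (EOI) marker


def looks_like_complete_jpeg(data: bytes) -> bool:
    """Cheap structural check: the bytes start with SOI and end with EOI."""
    return (
        len(data) >= 4
        and data.startswith(JPEG_START)
        and data.endswith(JPEG_END)
    )


def find_broken_images(folder: Path) -> list:
    """Return the *.jpg files in `folder` (hypothetical layout) that fail the check."""
    return sorted(
        path
        for path in folder.glob("*.jpg")
        if not looks_like_complete_jpeg(path.read_bytes())
    )
```

Some cameras pad files after the end marker, so a real version might need to be more forgiving about trailing bytes.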
Clearly I should have two new computers without potential cooling issues, and separate redundant cameras, doing automatic failover in case one breaks or during updates. And this should be well tested so that we know it works.
The alert should also go to pagerduty, with a defined fallback on-call tech in case I'm asleep or out of contact at the time it fails over.
I should also maybe set up a round-robin DNS and mirror the webcam site in multiple locations in case the server or whole data center goes down.
There's also the issue of redundant power supplies (I need a generator, not just a UPS). And we should do failure tests to confirm the setup works. It'd be a huge shame to only find out during a disaster that the PC is on a UPS, but the light isn't.
We also need a backup internet connection. Even if the server and the webcam box are up and running, if xfinity goes out there's no path to upload.
And I should have a dev/test environment. A secondary lettuce and webcam pc, for developing and testing new updates before they hit prod. Otherwise a bug in those could lower the Lettuce Uptime
All of these things are the kind of thing that people like me are paid to know about and do and test and build and WHOOPS most of them at Twitter got fired.
Gonna check the Lettuce Logs when I get back home. Honestly if I'm on call I should have my work laptop with me at all times, even if I'm just going out to grab a coffee, but apparently I'm not getting paid enough for that level of service.
BTW this is the kind of thing I was talking about with "this site will melt" and why you pay SREs to be on call: You can have all the automation in the world for potential problems you know about. This disk fills up? A script will clean it. That machine crashes? Reboot it.
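As a rough sketch of the "known problem, known script" case, the disk-cleanup automation might look something like this. The function name, the `*.jpg` pattern, and the age-based policy are all assumptions for illustration. The thread does not say how the files are named or how long they should be kept.

```python
import time
from pathlib import Path


def prune_old_images(folder: Path, max_age_seconds: float, now: float = None) -> list:
    """Delete *.jpg files older than max_age_seconds and return what was removed.

    `now` can be passed in for testing; it defaults to the current time.
    """
    now = time.time() if now is None else now
    removed = []
    for path in sorted(folder.glob("*.jpg")):
        age = now - path.stat().st_mtime  # age based on last-modified time
        if age > max_age_seconds:
            path.unlink()
            removed.append(path)
    return removed
```

Something like `prune_old_images(Path("captures"), max_age_seconds=30 * 24 * 3600)` run on a schedule would keep roughly a month of frames. It would also throw away old frames that a time-lapse might want, so a real policy would probably thin them out instead of deleting them all.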
You have on-call SREs for all the problems you DON'T know about. The weird "the machine spontaneously rebooted (after the incident, do root cause analysis) but still kept uploading broken images so it didn't set off the 'there's no images in Xty minutes!' alert" problems.
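To make that failure mode concrete, here is a small sketch of a staleness alert that only counts uploads passing some validity check, so a stream of broken images no longer looks healthy. Everything here is hypothetical: the function names, the `(timestamp, data)` shape of an upload, and the 15-minute default are assumptions, and the validity check is passed in instead of being defined.

```python
from typing import Callable, Iterable, Optional, Tuple

Upload = Tuple[float, bytes]  # (unix timestamp, file contents); assumed shape


def seconds_since_last_good_upload(
    uploads: Iterable[Upload],
    now: float,
    is_good: Callable[[bytes], bool],
) -> Optional[float]:
    """Age of the newest upload that passes is_good, or None if none pass."""
    good_times = [ts for ts, data in uploads if is_good(data)]
    if not good_times:
        return None
    return now - max(good_times)


def should_alert(
    uploads: Iterable[Upload],
    now: float,
    is_good: Callable[[bytes], bool],
    max_gap_seconds: float = 15 * 60,
) -> bool:
    """Alert when there has been no *good* image within max_gap_seconds."""
    age = seconds_since_last_good_upload(uploads, now, is_good)
    return age is None or age > max_gap_seconds
```

With `is_good=lambda data: True` this degrades to the plain "no images in X minutes" alert, which is the version that stays quiet while broken files keep arriving.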
The ones you didn't know could or would happen until after they did at 6am on a random Saturday and some sleepy person has to get out of bed and fix them.
And I'm going to write a script to detect when this happens and automatically fix it or at least alert someone (me) so it can be manually fixed, and the service will be more robust because of it...
But that won't help tomorrow when the problem is "someone left the garage door open and a rabbit got inside and ate the lettuce"