http://HotCRP.com had an outage this morning. I’m very sorry about that! Here is what happened.
This line caused the outage 7 days after deployment: https://github.com/kohler/hotcrp/commit/6b61620f2c1df11434bab4139282257ff5742040#diff-e2508263ec2a4b7dc9f6f0c9e1f0bfb4R1 … This commit solves this (class of) outage: https://github.com/kohler/hotcrp/commit/14b0895bc64ff4a74b93e906148ac1be8c3c7bce …
HotCRP uses a Perl script called `banal`, originally by Geoff Voelker, to analyze PDFs for compliance. I rewrote this script about a week ago because it had bugs, especially around leading detection, and lacked features such as reference/appendix detection.
Among the “good” software engineering practices I added: `use warnings`, aka `perl -w`, which causes Perl to emit warnings on dangerous old-school operations like comparing 0 with undefined. I removed a ton of warnings, but missed some.
The PHP interpreter has an amazing feature: if a page takes too much time to render, the interpreter exits cleanly. This eliminates whole classes of failure. But some functions, such as reading a file’s contents, don’t count against the timeout.
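One plausible way a blocking read can escape a time limit is when the limit tracks time the process spends computing rather than elapsed wall-clock time. The sketch below is an analogy in Python, not PHP's actual mechanism: `measure_blocked_call` is a hypothetical helper, and `time.sleep` stands in for a read that blocks on a pipe.

```python
import time


def measure_blocked_call(seconds=0.2):
    """Return (wall_seconds, cpu_seconds) spent in a call that only waits."""
    wall_start = time.perf_counter()   # elapsed real time
    cpu_start = time.process_time()    # CPU time used by this process
    time.sleep(seconds)                # stand-in for a blocking read on a pipe
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


if __name__ == "__main__":
    wall, cpu = measure_blocked_call()
    print(f"wall clock: {wall:.3f}s, CPU time: {cpu:.3f}s")
```

A limit based on the second number would barely notice a process that is stuck waiting, however long the wait lasts.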
We are ready for the outage. 1. There is a funky PDF at e-Energy '19. Analyzing this PDF caused banal to generate lots of warnings. 2. The Perl interpreter wrote these warnings to a pipe back to the PHP interpreter. The Perl interpreter then blocked because the pipe filled up.
3. Meanwhile, the PHP interpreter blocked reading from a *different* pipe: Perl’s *output*, rather than its *error output*. Deadlock.
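Steps 1 to 3 can be reproduced in miniature. This is a sketch in Python rather than the real PHP and Perl code: `CHILD` is a hypothetical stand-in for banal that writes several megabytes of warnings to its error output before a short result, and `stdout_first_deadlocks` is a hypothetical parent that reads the regular output first. A watchdog timeout is added only so the demonstration terminates.

```python
import subprocess
import sys
import threading

# Hypothetical stand-in for the analysis script: lots of warnings on
# stderr (far more than a pipe buffer holds), then the real result.
CHILD = r"""
import sys
sys.stderr.write("warning: something odd\n" * 200_000)
sys.stderr.flush()
sys.stdout.write("result\n")
"""


def stdout_first_deadlocks(timeout=2.0):
    """Read the child's stdout first; return True if the read is still stuck."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    result = {}

    def read_stdout():
        # Blocks until the child closes stdout, which it never reaches:
        # the child is itself blocked writing to the full stderr pipe.
        result["out"] = proc.stdout.read()

    reader = threading.Thread(target=read_stdout, daemon=True)
    reader.start()
    reader.join(timeout)
    stuck = reader.is_alive()

    proc.kill()          # break the deadlock so the demo can finish
    reader.join()
    proc.stdout.close()
    proc.stderr.close()
    proc.wait()
    return stuck


if __name__ == "__main__":
    print("deadlocked:", stdout_first_deadlocks())
```

Neither side is doing anything wrong in isolation; the hang comes from each process waiting on a pipe the other is not servicing.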
Solution: read from the error output first, because banal generates its actual output last. Other workarounds: don’t generate warnings; add a timeout to the Perl interpreter; use different PHP functions that don’t skip the timeout check (?); …
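The read-order fix can be sketched as follows, again in Python rather than PHP, with a hypothetical `CHILD` script standing in for banal. `run_stderr_first` mirrors the solution described here and relies on the child producing its short result last; `run_drain_both` is a more general alternative (not something the thread proposes) that lets `subprocess.communicate` drain both pipes at once.

```python
import subprocess
import sys

# Hypothetical stand-in for the analysis script: many warnings on
# stderr first, then a short result on stdout at the very end.
CHILD = r"""
import sys
sys.stderr.write("warning: something odd\n" * 100_000)
sys.stderr.flush()
sys.stdout.write("result\n")
"""


def run_stderr_first(cmd):
    """Drain stderr to EOF, then read stdout. Safe only while the
    child's stdout output is small enough to fit in the pipe buffer."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    err = proc.stderr.read()   # the child can never block on warnings now
    out = proc.stdout.read()   # short result, written last
    proc.stdout.close()
    proc.stderr.close()
    proc.wait()
    return out, err


def run_drain_both(cmd):
    """Drain both pipes concurrently, so write order does not matter."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    return out, err


if __name__ == "__main__":
    out, err = run_stderr_first([sys.executable, "-c", CHILD])
    print(out.strip(), err.count(b"warning"))
```

Reading the error output first is the smallest change, but it trades one ordering assumption for another: a child that wrote a large result before its warnings would hang it in the opposite direction.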
A better abstraction would be for PHP to never skip the timeout check! Or for it to express reading asynchronous-style. As a systems academic it always amazes me that anything works at all. Again, apologies.
As a systems builder it always amazes me that anything works at all
Great story - thanks for publishing it!
Twitter at the speed of parenting