As of today, we have about eighteen years to go until the Y2038 problem occurs.
But the Y2038 problem will be giving us headaches long, long before 2038 arrives.
I'd like to tell you a story about this.
The program was not that big, maybe a few hundred lines. But it was fairly impenetrable — written in a style that favored computational efficiency over human readability. And of course, there were zero tests.
As luck would have it, a change in the orchestration of the scripts that ran in this environment had been pushed the day before. This was believed to be the culprit. Engineering rolled things back to the previous release. Unfortunately, this made the problem worse.
You see, the program's purpose was to compute certain contribution rates for certain kinds of pension funds. It did this by writing out a big CSV file. The contents of that file were inputs to other programs, which ran at various times each day.
Another program, the benefits distributor, was supposed to alert people when contributions weren't enough for projections. It hadn't run yet when the initial problem occurred. But it did now.
Noticing that there was no output from the first program since it had crashed, it treated this case as "all contributions are 0". This, of course, was not what it should do. But no one knew it behaved this way since, again, the first program had never crashed.
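A minimal sketch of the safer behavior, assuming a downstream reader written in Python: when the upstream file is missing or empty, stop with an error instead of reading it as "all contributions are 0". The column names `fund_id` and `contribution_rate`, the `load_contributions` function, and the `MissingUpstreamOutput` exception are hypothetical; the real file layout isn't described here.

```python
import csv
from pathlib import Path


class MissingUpstreamOutput(RuntimeError):
    """Raised when the upstream CSV is absent or contains no data rows."""


def load_contributions(csv_path):
    """Read contribution rates, refusing to treat missing data as zeros.

    Column names are hypothetical placeholders for illustration.
    """
    path = Path(csv_path)
    # A missing or zero-byte file means the producer failed; it does not
    # mean every contribution is 0.
    if not path.is_file() or path.stat().st_size == 0:
        raise MissingUpstreamOutput(f"no upstream output at {path}")

    with path.open(newline="") as handle:
        rows = list(csv.DictReader(handle))

    # A header with no rows is also treated as a failed upstream run.
    if not rows:
        raise MissingUpstreamOutput(f"upstream output at {path} has no rows")

    return {row["fund_id"]: float(row["contribution_rate"]) for row in rows}
```

With this shape, a crash in the producer surfaces as one clear failure in the consumer rather than a flood of misleading alerts.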
This immediately caused a massive cascade of alert emails to the internal pension fund managers. They promptly started flipping out, because one reason contributions might show up as insufficient is if projections think the economy is about to tank.
The firm had recently moved to the cloud and I had been retained to architect the transition and make the migration go smoothly. They'd completed the work months before. I got an unexpected text from the CIO: pic.twitter.com/OyHX4nk5nH
S1X is their term for "worse than severity 1 because it's cascading to *other*, unrelated parts of the business". There had only been one other S1X in twelve months.
I got onsite late that night. We eventually diagnosed the issue by firing up an environment and isolating the script so that only it was running. The problem immediately became more obvious; there was a helpful error message that pointed to the problematic part.
We were able to resolve the issue by hotpatching the script. But by then, substantive damage had already been done because contributions hadn't been processed that day. It cost about $1.7M to manually catch up over the next two weeks.
The moral of the story is that Y2038 isn't "coming". It's *already here*. Fix your stuff.
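A rough illustration of why "already here" can be literal (this is not the code from the incident, which isn't shown): a signed 32-bit count of seconds since 1970 tops out on 19 January 2038, so anything that computes a date beyond that point overflows today. The 20-year horizon, the "today" value, and the `fits_in_int32_time` helper are assumptions made for the example.

```python
import struct
from datetime import datetime, timezone

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit time_t can hold


def fits_in_int32_time(moment):
    """Return True if the Unix timestamp of `moment` fits in a signed 32-bit int."""
    seconds = int(moment.timestamp())
    try:
        # struct.pack raises struct.error when the value is out of range.
        struct.pack("<i", seconds)
        return True
    except struct.error:
        return False


# The last instant a signed 32-bit timestamp can represent.
last_ok = datetime.fromtimestamp(INT32_MAX, tz=timezone.utc)

# Hypothetical "today", roughly eighteen years before the limit.
today = datetime(2020, 1, 19, tzinfo=timezone.utc)

# Hypothetical 20-year projection: it already lands past the limit.
projection = today.replace(year=today.year + 20)

print(last_ok.isoformat())             # 2038-01-19T03:14:07+00:00
print(fits_in_int32_time(today))       # True
print(fits_in_int32_time(projection))  # False
```

Any system that stores or computes far-future dates this way can fail as soon as its horizon crosses that boundary, long before the calendar does.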
Postscript: there's lots more that I think would be interesting to say on this matter that won't fit in a tweet. If you're looking for speakers at your next conference on this topic, I'd be glad to expound further. I don't want to be cleaning up more Y2038 messes!