Sad to see yet another case of transaction ID wraparound causing a very long outage: status.coveralls.io
It's important to monitor the horizons for both XID and multixact ID (check whether you monitor the latter), have reliable alerts, and fight long-running transactions.
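A minimal sketch of the alerting side, in Python. It assumes the per-database ages come from a query such as `SELECT datname, age(datfrozenxid), mxid_age(datminmxid) FROM pg_database;`. The 50%/75% thresholds and the `DatabaseHorizons`/`alert_level` names are hypothetical. The point is to alert on whichever horizon is older, not just the XID one. Roughly 2^31 IDs are usable before wraparound.

```python
from dataclasses import dataclass

# Roughly 2^31 XIDs (and, separately, multixact IDs) are usable before wraparound.
WRAPAROUND_LIMIT = 2**31

# Hypothetical alert thresholds, as a fraction of the wraparound limit.
WARN_FRACTION = 0.50
CRIT_FRACTION = 0.75


@dataclass
class DatabaseHorizons:
    datname: str
    xid_age: int   # e.g. age(datfrozenxid) from pg_database
    mxid_age: int  # e.g. mxid_age(datminmxid) from pg_database


def alert_level(h: DatabaseHorizons) -> str:
    """Classify a database by whichever horizon is closer to wraparound."""
    worst = max(h.xid_age, h.mxid_age) / WRAPAROUND_LIMIT
    if worst >= CRIT_FRACTION:
        return "critical"
    if worst >= WARN_FRACTION:
        return "warning"
    return "ok"


if __name__ == "__main__":
    samples = [
        DatabaseHorizons("app", xid_age=150_000_000, mxid_age=20_000_000),
        # XID age looks healthy here, but the multixact horizon does not.
        DatabaseHorizons("reports", xid_age=200_000_000, mxid_age=1_700_000_000),
    ]
    for s in samples:
        print(s.datname, alert_level(s))
```

An XID-only check would report "reports" as healthy, which is exactly the blind spot the tweet warns about.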
They're on RDS, and it's stated directly in their report that they can't.
On PG13, autovacuum is also triggered by inserts, so append-only tables get vacuumed naturally and wraparound explosions should not happen, in theory.
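A rough Python model of the PG13 insert-triggered autovacuum check. It uses the PG13 defaults `autovacuum_vacuum_insert_threshold = 1000` and `autovacuum_vacuum_insert_scale_factor = 0.2`. Autovacuum fires once the number of inserts since the last vacuum exceeds `threshold + scale_factor * reltuples`. This is a sketch, not the server's code, and the function name is made up. Per-table storage parameters can override both settings, and a threshold of -1 disables the check.

```python
# PostgreSQL 13 defaults; per-table storage parameters can override them.
AUTOVACUUM_VACUUM_INSERT_THRESHOLD = 1000
AUTOVACUUM_VACUUM_INSERT_SCALE_FACTOR = 0.2


def insert_vacuum_due(inserts_since_vacuum: int, reltuples: float,
                      base: int = AUTOVACUUM_VACUUM_INSERT_THRESHOLD,
                      scale: float = AUTOVACUUM_VACUUM_INSERT_SCALE_FACTOR) -> bool:
    """Hypothetical helper: would inserts alone trigger an autovacuum?"""
    if base < 0:
        # -1 disables insert-triggered autovacuum.
        return False
    threshold = base + scale * reltuples
    return inserts_since_vacuum > threshold


if __name__ == "__main__":
    # A 1M-row append-only table is vacuumed after ~201k new inserts.
    print(insert_vacuum_due(250_000, 1_000_000))
    print(insert_vacuum_due(100_000, 1_000_000))
```

Before PG13, an insert-only table produced no dead tuples, so it was typically left alone until an anti-wraparound vacuum had to freeze everything at once.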
Postgres 14's failsafe mechanism would very likely have saved the day here, without any DBA intervention. VACUUM now stops caring about indexes and stops applying cost delays when a table's relfrozenxid starts to look dangerously old (per the vacuum_failsafe_age GUC).
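A simplified Python model of when the PG14 failsafe kicks in. It uses the default settings: `vacuum_failsafe_age` and `vacuum_multixact_failsafe_age` are both 1.6 billion, `autovacuum_freeze_max_age` is 200 million, and `autovacuum_multixact_freeze_max_age` is 400 million. Per the docs, each failsafe age is silently treated as no less than 105% of the matching freeze_max_age. The class and function names are hypothetical, and the real check lives inside VACUUM itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailsafeSettings:
    # PostgreSQL 14 defaults.
    vacuum_failsafe_age: int = 1_600_000_000
    vacuum_multixact_failsafe_age: int = 1_600_000_000
    autovacuum_freeze_max_age: int = 200_000_000
    autovacuum_multixact_freeze_max_age: int = 400_000_000


def failsafe_triggered(relfrozenxid_age: int, relminmxid_age: int,
                       s: FailsafeSettings = FailsafeSettings()) -> bool:
    """Would VACUUM switch to failsafe mode (skip index vacuuming, no cost delay)?"""
    # Effective limits are clamped to at least 105% of the freeze_max_age settings.
    xid_limit = max(s.vacuum_failsafe_age,
                    int(s.autovacuum_freeze_max_age * 1.05))
    mxid_limit = max(s.vacuum_multixact_failsafe_age,
                     int(s.autovacuum_multixact_freeze_max_age * 1.05))
    return relfrozenxid_age > xid_limit or relminmxid_age > mxid_limit


if __name__ == "__main__":
    print(failsafe_triggered(1_500_000_000, 0))  # below the 1.6B default
    print(failsafe_triggered(1_650_000_000, 0))  # past it: failsafe
```

Raising vacuum_failsafe_age, as suggested further down the thread, simply pushes `xid_limit` closer to the ~2.1 billion hard limit.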
I wonder how many installs will "explode" with vacuum right after the upgrade.
I suspect there won't be that many, but who knows? In any case, the failsafe log output is very likely useful information: an early warning system. If it isn't useful (e.g., for an advanced user), it's easy to avoid by increasing vacuum_failsafe_age.
It's really more like an "early timebomb triggerer": exploding more often, and hopefully less painfully, than wraparound. Big installs on Amazon explode painfully, because IOPS budgeting does not let you read/write a lot at once without grinding the instance to a halt.
Right. It's a less painful, more correctable failure, but still a failure. There is always going to be something like your AWS IOPS example that nobody thought of (until we get rid of freezing). The idea here is to let the DBA fix the problem *before* it becomes ruinously expensive.


