The default settings for autovacuum_*_scale_factor and autovacuum_*_threshold aren't great for large tables. I like to set them to 0 and 1m respectively for tables over 20m rows or so. #PostgreSQL
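Per-table overrides like these can be set as storage parameters; a sketch, assuming a large table named `events` (the table name and the exact 1-million threshold are illustrative):

```sql
-- Trigger autovacuum/autoanalyze after a fixed number of changed rows,
-- rather than a fraction of the table (defaults: scale_factor 0.2/0.1,
-- threshold 50). With scale_factor = 0, only the threshold matters.
ALTER TABLE events SET (
    autovacuum_vacuum_scale_factor  = 0,
    autovacuum_vacuum_threshold     = 1000000,
    autovacuum_analyze_scale_factor = 0,
    autovacuum_analyze_threshold    = 1000000
);
```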
I don't think that's a good idea. The cost of vacuuming keeps increasing on larger tables due to index vacuums, and the gain from vacuuming 1 million dead rows in a 10 billion row table isn't meaningful.
I think they're ok enough defaults given the current autovacuum logic, but it heavily depends on the workload.
Instead of different defaults, we need more types of autovacuum triggers.
E.g. a more aggressive heap vacuum trigger to set VM bits, separate from the triggers for index vacuums.
But for that we should make heap vacuums more efficient, to prevent regressions.
Some recent work, including Sawada-san's, goes some steps toward that.
IMV the right approach is to decouple *everything* in VACUUM and let it figure out what matters by noticing per-index/per-table problems as they happen. Bloat can be harmful or benign (often both at once). Top-down scheduling seems too complex due to non-linear behaviors. Bottom-up works.
Generally agreed.
Having some pressure to do such work close together also has benefits, though: a considerably higher likelihood of cache hits and of re-modifying pages that are already dirty before they're written out, and a lower likelihood of FPIs.
And I'm not sure I believe that it's a good idea to schedule such things in a "local" manner. E.g. when scheduling vacuums due to getting closer to the xid horizon, it makes sense to first schedule several heap vacuums in "age" order, then index vacuums separately.
Fwiw, I've multiple times seen pathological systems where some minor index corruption caused vacuum to crash. So every minute it tried to vacuum the same table and failed, and autovacuum stopped making any progress at all.
An error from an index ought not to prevent VACUUM from completing its remaining work (excluding the second heap pass). The Postgres 14 failsafe mechanism should at least prevent a wraparound outage, though. Index vacuuming is optional in principle now, so a comprehensive fix is quite possible.
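Both mechanisms mentioned here are exposed as knobs today; a sketch (the table name is illustrative):

```sql
-- Skip index vacuuming for a table with a known-broken index
-- (INDEX_CLEANUP is available since PostgreSQL 12). Dead TIDs
-- accumulate, but heap pruning, freezing, and relfrozenxid
-- advancement can still complete.
VACUUM (INDEX_CLEANUP OFF, VERBOSE) broken_table;

-- The PostgreSQL 14 failsafe: once a table's oldest XID exceeds
-- this age, VACUUM skips index vacuuming and cost-based delays
-- on its own to head off a wraparound outage.
SET vacuum_failsafe_age = 1600000000;  -- the default
```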
I'm more concerned about a busy database not getting any vacuums for months. I'm thinking autovacuum should randomize the schedule a bit.
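One way to spot the "no vacuums for months" case is to check the vacuum timestamps in the cumulative statistics views; a sketch of such a query (the dead-tuple cutoff is illustrative):

```sql
-- Tables with many dead rows that autovacuum hasn't visited recently;
-- NULLS FIRST surfaces tables that have never been autovacuumed.
SELECT relname,
       n_dead_tup,
       last_autovacuum,
       last_vacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 100000
ORDER BY last_autovacuum NULLS FIRST;
```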
A related idea that I considered recently is to freeze some blocks earlier and at random, to spread out the cost of freezing blocks that are already ready to be frozen. The generic xid based cutoff isn't particularly natural.
Consider also systems where some clever sod has a regular job that takes a lock on the same table, conflicting with vacuum. The autovacuum will keep canceling itself every time and restarting at the next chance it gets.
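Those cancellations show up in the server log as "canceling autovacuum task" messages; making all autovacuum activity visible helps catch the pattern. A sketch (the GUC is real; logging everything is a deliberate, chatty choice):

```sql
-- Log every autovacuum action (a threshold of 0 means "log all").
-- Repeated cancellations for the same table point at a periodic
-- job taking a conflicting lock.
ALTER SYSTEM SET log_autovacuum_min_duration = 0;
SELECT pg_reload_conf();
```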