Regarding bloat and fragmentation: if it's ensured that the long-running tx can access only table A, can we safely vacuum table B even while that tx is running? #pgconfasia
Replying to @fujii_masao
In theory, yes, but we have no mechanism to prevent the long-running tx from accessing new tables. For example, suppose a PL/pgSQL function tricky(int) is called: when $1 < 10000000 it accesses table A, otherwise table B. So a new table could be accessed quite late in the transaction.
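A minimal PL/pgSQL sketch of such a function; the name tricky and the 10000000 threshold come from the example above, while the table names table_a and table_b and the count(*) bodies are assumed purely for illustration:

```sql
-- Hypothetical sketch: which table this touches depends on its argument,
-- so the set of tables a transaction will lock can't be known up front.
CREATE OR REPLACE FUNCTION tricky(int) RETURNS bigint AS $$
DECLARE
    result bigint;
BEGIN
    IF $1 < 10000000 THEN
        SELECT count(*) INTO result FROM table_a;  -- the common case
    ELSE
        SELECT count(*) INTO result FROM table_b;  -- reached only for large arguments
    END IF;
    RETURN result;
END;
$$ LANGUAGE plpgsql;
```

Until the function is first called with a large argument, the transaction has never touched table_b, so nothing observable distinguishes it from a transaction that only ever reads table A.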
Replying to @robertmhaas
Yes, and I was thinking of building such a mechanism when I was involved in a system that had a long-running batch transaction. It never happened, though, because the transaction was successfully split into several smaller ones.
Replying to @fujii_masao
I think it's quite a hard problem. I think you could have a PGPROC flag indicating that the transaction is prohibited from locking any new tables; if the flag is set, then query the lock manager to see which tables it already has locked. Might be expensive, though.
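The "query the lock manager" part has an SQL-visible counterpart today in the pg_locks view. A rough illustration (the backend pid 12345 is made up) of listing the relation locks a backend already holds:

```sql
-- Which tables does backend 12345 (hypothetical pid) already hold locks on?
-- Any table not in this list would be off-limits under the proposed flag.
SELECT relation::regclass AS locked_table, mode, granted
FROM   pg_locks
WHERE  pid = 12345
AND    locktype = 'relation';
```

The proposed PGPROC flag itself would live inside the server, not at the SQL level; this query only shows the per-table information the lock manager already tracks.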
Replying to @robertmhaas @fujii_masao
I think it'd be more promising to start vacuuming rows that are newer than the long-running transaction but already deleted. That wouldn't require restricting the old transaction from accessing new tables, which I don't see us exposing at the SQL level.
Replying to @AndresFreundTec @fujii_masao
The trick there is the bookkeeping involved. You have to store snapshots (or something) in shared memory so you can identify what can be vacuumed, which requires a variable amount of storage, and makes testing each tuple a slower process.
Replying to @robertmhaas @fujii_masao
It's not that much additional information. We already have xmin; if we additionally track xmax, that'd be sufficient. Why isn't that sufficient / why do we need variable amounts of storage?
And sure, testing each tuple gets more expensive, but it'd be quite reasonable to add heuristics about how old a tuple has to be before the more expensive methodology is used, in which case it'd be much more likely to be worth the resources.
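As a rough way to see the class of tuples this proposal targets, the xmin/xmax system columns can be compared against the oldest backend_xmin in pg_stat_activity. This is a diagnostic sketch only, not the proposed implementation: the table name table_a is assumed, the text-to-bigint xid casts ignore wraparound, a nonzero xmax may be just a row lock rather than a delete, and a plain scan can only surface rows whose deleter hasn't yet committed from our own snapshot's point of view:

```sql
-- Candidate rows: created after the oldest snapshot still running, and
-- already stamped by a deleting/updating transaction. Once that deleter
-- commits, such rows are dead to every snapshot, including the old one.
WITH oldest AS (
    SELECT min(backend_xmin::text::bigint) AS xmin_horizon
    FROM   pg_stat_activity
    WHERE  backend_xmin IS NOT NULL
)
SELECT t.ctid, t.xmin, t.xmax
FROM   table_a AS t, oldest AS o
WHERE  t.xmax::text::bigint <> 0               -- a deleter has stamped xmax
AND    t.xmin::text::bigint > o.xmin_horizon;  -- born after the old snapshot
```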