PostgreSQL 13 B-Trees will be able to use deduplication, which I committed today. It often make indexes far smaller. Also helps VACUUM with controlling index bloat (even in unique indexes). I wrote some documentations on the internals, for advanced users: postgresql.org/docs/devel/btr
Conversation
Replying to
This is nice. I know of it via the name RID-list.
* step 1 - support this for index entries
* step 2- add lightweight compression for RID-lists
* step 3 - support efficient and/or/count on compressed RID-list during query execution
1
1
3
Replying to
The overhead is low enough for the deduplication feature to work well with OLTP or mixed workloads. It will probably remain enabled by default in the next stable Postgres release. Lazy approach has significant upsides.
1
4
Replying to
I agree with you. But I expect this to have a big impact on the analytics side if work is done to take advantage of the RID-lists.
1
Replying to
Postgres' GIN indexes already use TID compression, but that's for JSON/FTS indexing. For standard B-Trees, I like an approach that gracefully adapts from tuples to RID lists, and finally to bitmap index style pages. ISTM that only large RID lists are worth compressing anyway.
1
1
So bitmap index style pages for low cardinality values are the missing piece at this point. I agree that the representation should be convenient to query processing itself. This paper seems like it might have some insight on how to approach the next phase: citeseerx.ist.psu.edu/viewdoc/downlo
Replying to
Another paper I need to read. I worked on bitmap indexes in Oracle for a few years, but that ended in 2005. I might need to get current.

