Pushshift is recompressing all Reddit monthly dumps into a common compression format (zst). This should be completed within a few days. Some files were compressed as xz, bz2, etc., which was a bit annoying. Going forward I will use zstandard for all compression.
That's great, zstd is so much faster to decompress! Bit late, but would you be open to storing N concatenated zst files, where each holds a single subreddit's entries? People could continue using it as one big file, or could build an index to fetch just the subreddits they care about.
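A minimal sketch of the layout proposed in this reply, under stated assumptions. It is not how Pushshift stores its dumps. To stay runnable with only the standard library, it swaps in Python's `lzma` (xz) module for zstd; xz streams, like zstd frames, can be concatenated and still decoded as one file. The function names, the newline-delimited JSON layout, and the in-memory index of `(offset, length)` pairs are all hypothetical choices made for illustration.

```python
import io
import json
import lzma  # stdlib stand-in for zstd (assumption); both allow concatenated streams


def build_archive(entries_by_subreddit):
    """Compress each subreddit's entries separately and concatenate the results.

    Returns (archive_bytes, index), where index maps subreddit -> (offset, length).
    """
    archive = io.BytesIO()
    index = {}
    for subreddit, entries in sorted(entries_by_subreddit.items()):
        # One newline-delimited JSON block per subreddit (hypothetical layout).
        raw = "".join(json.dumps(e) + "\n" for e in entries).encode("utf-8")
        blob = lzma.compress(raw)
        # Record where this subreddit's compressed stream starts and how long it is.
        index[subreddit] = (archive.tell(), len(blob))
        archive.write(blob)
    return archive.getvalue(), index


def read_subreddit(archive_bytes, index, subreddit):
    """Decompress only the byte range that belongs to one subreddit."""
    offset, length = index[subreddit]
    raw = lzma.decompress(archive_bytes[offset:offset + length])
    return [json.loads(line) for line in raw.decode("utf-8").splitlines()]


def read_all(archive_bytes):
    """Treat the archive as one big file; concatenated streams decode in order."""
    raw = lzma.decompress(archive_bytes)
    return [json.loads(line) for line in raw.decode("utf-8").splitlines()]
```

With a real dump, the index would presumably be stored alongside the archive, and a reader would seek to the offset in the file rather than slice bytes in memory.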