First test on a 96-core server, using zstd to compress 25,881,600 tweets.
time zstd -qcd tweets.zst | zstd -v -19 --long=31 -T96 > test.zst
5.07% (238960762511 => 12123745253 bytes)
real 34m31.667s
user 1599m22.198s
For some reason, zstd wasn't using all available threads. I'm not sure whether it was I/O-bound or a CPU-cache issue; I need to do more testing. The load average hovered around ~53 for the entire compression.
It was able to reduce 239 GB of data down to 12.1 GB in under 35 minutes.
The speed is about ~7 GB/min, which translates into >100 MB/s. Some I/O setups could be limited to this max speed (close to 1 Gb Ethernet).
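As a quick sanity check on that figure, here is the arithmetic from the run above (input size and wall time taken directly from the zstd output; the rounding of the wall time is mine):

```shell
# Back-of-the-envelope throughput check, using the figures from the run above:
# 238,960,762,511 input bytes consumed in 34m31.667s of wall time.
input_bytes=238960762511
wall_seconds=2072                        # 34*60 + 32, rounded up
mbs=$(( input_bytes / wall_seconds / 1000000 ))
echo "~${mbs} MB/s of input consumed"    # ~115 MB/s, i.e. roughly 0.9 Gb/s
```

At ~115 MB/s of input, a 1 Gb Ethernet link (~125 MB/s theoretical) would indeed be close to saturated, which is why the I/O explanation is plausible.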
Alternatively, the `--long` mode match finder is single-threaded and serialized; while reasonably fast, it could end up becoming the speed bottleneck.
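One practical consequence of `--long` worth noting: frames written with a large window may need the same `--long` setting (or a `--memory=...` override) at decompression time, because the decoder caps the window at 27 bits by default. A minimal round-trip sketch, assuming `zstd` is on the PATH and using throwaway paths:

```shell
# Round-trip demo of long mode on throwaway data (assumes zstd is on PATH;
# file paths and the sample content are placeholders, not from the run above).
printf 'hello zstd long mode\n' > /tmp/long_demo.txt
zstd -q -f -19 --long=30 /tmp/long_demo.txt -o /tmp/long_demo.zst
zstd -q -f -d --long=30 /tmp/long_demo.zst -o /tmp/long_demo.out
cmp -s /tmp/long_demo.txt /tmp/long_demo.out && echo "round-trip OK"
```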
When you say single-threaded and serialized, do you mean that the main thread is basically farming out work units in a serialized manner, so that it would be virtually impossible to keep all threads fed for the duration of the compression job? (At least, that's what I am seeing.)
(PS: It's super impressive; thanks for all your hard work on this. It's my go-to default compression scheme now.)