
The challenge: decompress and process 100k tweets a second. First stage is figuring out the main bottleneck. Baseline run: time zstd -qcd tweets.zst > /dev/null. Zstandard on a single file decompresses 11.47 GB (16,869,703 tweets) in 112.68 seconds (~101.7 MB/s, or 149,713 tweets/s).
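For anyone checking the numbers, here is a minimal sketch of the throughput arithmetic. It assumes "GB" means decimal gigabytes (10^9 bytes); the function name is just for illustration. Because the 11.47 GB figure is itself rounded, the MB/s result can differ from the quoted value in the last digit.

```python
def throughput(total_bytes, total_tweets, seconds):
    """Return (MB/s, tweets/s) for a timed decompression run."""
    mb_per_s = total_bytes / 1e6 / seconds   # decimal megabytes per second
    tweets_per_s = total_tweets / seconds
    return mb_per_s, tweets_per_s

# Figures from the run above; 11.47 GB taken as 11.47e9 bytes (assumption).
mb_s, tweets_s = throughput(11.47e9, 16_869_703, 112.68)
print(f"{mb_s:.1f} MB/s, {tweets_s:,.0f} tweets/s")
```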
Things to figure out: 1) Why is Zstandard decompressing this particular file so slowly (does long-distance matching slow down decompression)? 2) Why does reading through Python's sys.stdin add so much time? 3) Benchmark Go. Stay tuned.
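One way to start on question 2 is to compare line iteration over a text stream against the underlying binary stream. This is only a sketch of one hypothesis (that UTF-8 decoding in text mode accounts for part of the overhead), not a confirmed cause. The sample data is synthetic, and the in-memory streams are stand-ins for sys.stdin (text) and sys.stdin.buffer (binary).

```python
import io
import time

def count_lines(stream):
    """Count lines by iterating over a file-like object."""
    n = 0
    for _ in stream:
        n += 1
    return n

def timed(stream):
    """Return (line count, elapsed seconds) for one pass over the stream."""
    start = time.perf_counter()
    n = count_lines(stream)
    return n, time.perf_counter() - start

# Synthetic stand-in for the tweet stream: one small JSON object per line.
sample = b'{"id": 1, "text": "hello"}\n' * 200_000

# Binary stream yields bytes, as sys.stdin.buffer would.
binary_stream = io.BytesIO(sample)
# Text stream decodes UTF-8 and yields str, as sys.stdin would.
text_stream = io.TextIOWrapper(io.BytesIO(sample), encoding="utf-8")

n_bin, t_bin = timed(binary_stream)
n_txt, t_txt = timed(text_stream)
print(f"binary: {n_bin} lines in {t_bin:.3f}s")
print(f"text:   {n_txt} lines in {t_txt:.3f}s")
```

For a real measurement, pass sys.stdin or sys.stdin.buffer to timed() and pipe the zstd output in. Timings on synthetic in-memory data will not match the pipeline numbers.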