time zstd -qcd FH_2020-05-28_20.ndjson.zst | wc -l
Piping into wc, we get: real 2m52.810s, user 2m6.414s, sys 1m46.441s (97,619 tweets per second). Any real processing of the tweets probably can't keep up with this rate, so the target might have to be 75k a second.
A Python script that just reads from sys.stdin and does nothing with the data shows a big slowdown:
for line in sys.stdin:
    pass
real 6m20.229s, user 8m2.420s, sys 1m51.007s (44,367 tweets a second)
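For reference, the arithmetic behind the two rates can be sketched as below. The total line count is not quoted in the thread, so `implied_lines` is back-computed from the stated 97,619 tweets per second and should be read as an assumption; `parse_time` is a hypothetical helper for the `XmY.YYYs` values printed by `time`.

```python
import re


def parse_time(value):
    """Convert a `time` value such as '2m52.810s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognized time value: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Wall-clock ("real") times quoted in the thread.
wc_real = parse_time("2m52.810s")   # zstd -qcd ... | wc -l
py_real = parse_time("6m20.229s")   # zstd -qcd ... | python loop over sys.stdin

# Assumption: line count back-computed from the quoted rate, not measured here.
implied_lines = 97_619 * wc_real

print(f"implied line count: {implied_lines:,.0f}")
print(f"python rate: {implied_lines / py_real:,.0f} lines/s")
print(f"slowdown vs wc: {py_real / wc_real:.2f}x")
```

The same helper works for the user and sys values if CPU time rather than wall-clock time is the number of interest.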
Things to figure out: 1) Why is zstandard decompressing so slowly for this particular file (does long-distance matching slow down decompression?) 2) Why the large increase in time with Python's sys.stdin? 3) Benchmark Go. Stay tuned.
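One way to probe question 2 is to take text decoding and per-line iteration out of the picture and see how much of the gap remains. The sketch below is an experiment to try, not an explanation confirmed in the thread: it counts newline bytes from a binary stream in fixed-size chunks, roughly what wc -l does. `count_lines` and the 1 MiB chunk size are illustrative choices.

```python
import io


def count_lines(stream, chunk_size=1 << 20):
    """Count newline bytes in a binary stream, reading fixed-size chunks."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:          # empty bytes object means end of stream
            break
        total += chunk.count(b"\n")
    return total


# Small in-memory demo; real input would come from the zstd pipe.
sample = io.BytesIO(b'{"id": 1}\n{"id": 2}\n{"id": 3}\n')
print(count_lines(sample))
```

To time it against the real file, pass `sys.stdin.buffer` (the binary stream underneath `sys.stdin`) instead of the in-memory sample and pipe zstd -qcd into the script as before. If this runs close to the wc -l time, the extra cost is likely in text-mode decoding or line splitting; if not, the pipe or the interpreter itself deserves a closer look.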