I've created a Python module that makes reading the Reddit monthly file dumps easier for researchers. It opens .zst files and reads them line by line without loading the entire file into memory.
github.com/pushshift/zrea
#bigdata #datasets
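For readers who want to see the general idea, here is a minimal sketch of streaming a .zst file line by line. It is not the linked module's code. The helper names `iter_lines` and `read_zst_lines` are hypothetical, and the sketch assumes the third-party `zstandard` package is installed and that the dumps need a large decompression window (`max_window_size=2**31`).

```python
import io
from typing import BinaryIO, Iterator


def iter_lines(stream: BinaryIO, chunk_size: int = 1 << 20) -> Iterator[str]:
    """Yield decoded lines from a binary stream, reading one chunk at a time."""
    pending = b""  # bytes of a line that has not seen its newline yet
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Everything before the last newline is complete; the tail is carried over.
        *complete, pending = pending.split(b"\n")
        for raw in complete:
            # Decode whole lines only, so multi-byte characters are never split.
            yield raw.decode("utf-8")
    if pending:
        # Final line without a trailing newline.
        yield pending.decode("utf-8")


def read_zst_lines(path: str) -> Iterator[str]:
    """Hypothetical helper: stream lines out of a zstd-compressed file."""
    import zstandard  # third-party; assumed installed (pip install zstandard)

    with open(path, "rb") as fh:
        # Assumption: the dumps were compressed with a long window.
        decompressor = zstandard.ZstdDecompressor(max_window_size=2**31)
        yield from iter_lines(decompressor.stream_reader(fh))
```

Only one chunk plus the current partial line is held in memory at any time, so the file size does not matter. The chunk size of 1 MiB is an arbitrary choice.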
It does -- but some people want to have the code encapsulated entirely in Python so this should help with that.
Yeah, it's better than making your code read from a pipe.
I guess it’s personal preference. I prefer stdio, it’s simple and flexible.
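For comparison, the stdio approach might look like the sketch below: the `zstd` command-line tool decompresses, and the Python side only reads text lines from a stream. The function names are hypothetical, and the sketch assumes each line of the dump is one JSON object.

```python
import json
import sys
from typing import Iterable, Iterator, TextIO


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Parse one JSON object per non-empty line (assumed dump format)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_records(stream: TextIO) -> int:
    """Count records on a text stream without keeping them in memory."""
    return sum(1 for _ in iter_records(stream))
```

Saved as a hypothetical `count.py` that ends with `print(count_records(sys.stdin))`, it could be run as `zstd -dc --long=31 dump.zst | python count.py`, where `dump.zst` is a placeholder file name. The trade-off is the one discussed above: the pipe keeps the Python code trivial, but the decompression step lives outside the script.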


