Pushshift is completely rewriting the Reddit ingest script to handle parallel workers. What does this mean for you? No more large delays in updating comment and submission data!
I'm hoping to have this completed in the next few days.
I don't know if now is the time, but it would be super useful to understand more about how the pipeline works, if you have a chance to write about that. I think it's especially important for research based on PushShift's data. P.S. I'd be happy to help if you want any.
Good point. I'll probably open-source the new ingest methodology so I can get input from other researchers and make sure the methods used are sound and efficient.
That would be great! The most complete description of a software system is its source code. One side effect of opening up source code is that you often need to document what the various pieces are. If open-sourcing the code isn't feasible, that description is still useful.

