Each day, Pushshift is now ingesting, indexing and storing over half a billion social media objects. The eventual goal is to process up to one billion objects daily so that researchers can better understand how social media platforms influence society.
The scale you're at is astounding. My system (using a small portion of Pushshift's dataset) is absolutely chugging at only 8 billion objects 😰
That's awesome! Really happy to hear people are benefiting from what we do.
Are there plans to update the bulk Reddit files at files.pushshift.io/reddit/ ? Looks like most/all of 2020 is missing and these are my main ingest source :)
Yes. To give some background, the last couple of months have been very difficult because we have been moving our infrastructure into the cloud. We are going to start maintaining the monthly dumps again and get caught up as soon as possible. Realistically, I would say that we should be completely caught up by late spring of this year and then maintain them regularly from that point forward.
We are also overhauling the ingest pipeline so that the API never falls behind by more than 30 seconds or so, and that will be in place very shortly.

