Big news! Today we (, , , , @ncri?) released a paper about the Pushshift Reddit Dataset! This is my first step toward making a more direct contribution to science, and I couldn’t be more excited to share it.
Paper: arxiv.org/pdf/2001.08435
1/ While the paper includes a DOI that points to a sample of the data, it’s important to note that we make billions of posts and comments available through a searchable API at pushshift.io. Just looking at the volume of data shows how much Reddit has grown over time!
2/ Releasing this paper is the culmination of 5 years of effort. I started Pushshift as a side project, and while I hoped that people would find it useful, I never imagined the kind of impact it would have. Over 100 peer-reviewed papers have relied on Pushshift for data!
3/ Pushshift is a community of researchers, data journalists, and citizen scientists. Over the years, we’ve built up an active subreddit and fostered a collaborative Slack workspace, even building tools that others can use in their own collaborative groups.
4/ . is currently coordinating a group of talented people building an easy-to-use app version of many of the tools we used in this academic research, which they are sharing in beta today. -media-analysis-toolkit.herokuapp.com
5/ Pushshift is not alone in its mission and service to the community. There are other great services that we took inspiration from, like , and they are invaluable resources for anyone that does data science.
6/ In the end, what started as a side project has grown to be much more. This paper demonstrates our commitment to the community, both in continuing to provide the services that have pushed forward the frontier of science and in our ongoing efforts to improve Pushshift!
uhh ... whoops! I got overly excited and meant in the first tweet. Sorry about that!