Major new functionality is now available in the Pushshift API for big data enthusiasts and academic researchers. The new API version is in beta and still undergoing testing. Please check this post for a list of all the parameters, including new ones that help detect bot activity.
reddit.com/r/pushshift/co This is the link to the (partial) documentation. There is a new endpoint available that allows you to query subreddit data directly! I have loaded 400,000 of the largest subreddits and I am currently ingesting all available subreddits. So far, there
are 3.5 million subreddits queued to be imported into the back-end Elasticsearch nodes. With the new subreddit API endpoint, you can search subreddits by description and sort by subscriber count. Also, for future dumps, the subscriber counts for subreddits will be
included in the submission dumps. The value is the subscriber count at the moment the object was retrieved, so by using the retrieved_on time for each object, you will be able to track the growth of subreddits over time. There are a lot of new parameters, so feel free to ask for clarification! #bigdata #datascience #reddit
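To make the growth-tracking idea concrete, here is a minimal Python sketch. It assumes each submission object from a dump has been parsed into a dict with `subreddit`, `retrieved_on` (epoch seconds), and a subscriber-count field that is called `subreddit_subscribers` here. That field name, the helper names, and the sample records are all illustrative assumptions, not confirmed details of the dumps.

```python
from collections import defaultdict
from datetime import datetime, timezone


def subscriber_history(submissions):
    """Group (retrieved_on, subscriber count) observations by subreddit."""
    history = defaultdict(list)
    for obj in submissions:
        name = obj.get("subreddit")
        ts = obj.get("retrieved_on")
        # Assumed field name for the subscriber count in each dump object.
        count = obj.get("subreddit_subscribers")
        if name is None or ts is None or count is None:
            continue  # skip objects that lack any of the three fields
        history[name].append((int(ts), int(count)))
    for points in history.values():
        points.sort()  # order observations chronologically
    return dict(history)


def growth(points):
    """Net subscriber change between the earliest and latest observation."""
    if len(points) < 2:
        return 0
    return points[-1][1] - points[0][1]


# Made-up sample records, for illustration only.
sample = [
    {"subreddit": "pushshift", "retrieved_on": 1546300800, "subreddit_subscribers": 1000},
    {"subreddit": "pushshift", "retrieved_on": 1548979200, "subreddit_subscribers": 1250},
    {"subreddit": "datasets", "retrieved_on": 1546300800, "subreddit_subscribers": 50000},
]

history = subscriber_history(sample)
for name, points in sorted(history.items()):
    first_seen = datetime.fromtimestamp(points[0][0], tz=timezone.utc).date()
    print(name, first_seen, growth(points))
```

Because every submission carries its own snapshot of the count, more submissions in a subreddit means a denser time series for it; quiet subreddits will have sparse observations.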
