Data scientists of Twitter -- would a daily Reddit dump for comments and submissions (up by 6am the following day) be helpful for your research / projects? If there is enough desire for this, I'll make it publicly available to everyone who would like access.
That would be amazing. It would lower the bar for people to use the dumps (300GB json files), and allow large-scale tracking by having 2 snapshots of the same document (daily and monthly dumps).
That said, I'm a bit worried by the potential for misuse: anyone could then compare the dumps to find and recover all content deleted in that timeframe in a way that is prevented today by API rate limits.
The new authors/new subreddits daily dumps don't have this issue though, and could spawn some really interesting projects (e.g. early detection of misinformation subreddits for real-time tracking).
Good points. I think for the daily dumps, we could do it as a subscription service for bona fide academic research or other non-profit / NGO type activity. That would prevent a lot of misuse down the road.
We're working on an account management system and once that is done, we can better vet users and make sure that we keep any abuse of the data down to a minimum (ideally keeping it non-existent -- but unfortunately the world we live in would make that next to impossible).
That would be great. The dumps allow a type of fully-sampled, large-scale research that is simply not possible with any other big social media platform. Looking forward to it =D

