Data scientists of Twitter -- would a daily Reddit dump for comments and submissions (up by 6am the following day) be helpful for your research / projects? If there is enough desire for this, I'll make it publicly available to everyone who would like access.
That would be amazing. It would lower the bar for people to use the dumps (300 GB JSON files), and allow large-scale tracking by having 2 snapshots of the same document (daily and monthly dumps).
That said, I'm a bit worried by the potential for misuse: anyone could then compare the dumps to find and recover all content deleted in that timeframe in a way that is prevented today by API rate limits.
The new authors/new subreddits daily dumps don't have this issue though, and could spawn some really interesting projects (e.g. early detection of misinformation subreddits for real-time tracking).
Good points. I think for the daily dumps, we could do it as a subscription service for bona fide academic research or other non-profit / NGO type activity. That would prevent a lot of misuse down the road. We're working on an account management system and once that is done, we
That would be great. The dumps allow a type of fully-sampled, large-scale research that is simply not possible with any other big social media platform. Looking forward to it =D