Conversation

Replying to
Basically, each new day you would see four new files: one with all comments from the previous day, the same for submissions, an author file with all the new authors seen the previous day (with their creation dates), and lastly a file for all new subreddits seen.
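If the daily files do land roughly as described, consuming them might look something like this minimal Python sketch. The file names (RC_/RS_/RA_/RSR_), the newline-delimited JSON layout, and the absence of compression are all assumptions for illustration, not the announced format.

```python
import json
from pathlib import Path

# Hypothetical daily file names -- the real naming scheme and compression
# used by the dumps may differ; adjust to whatever actually gets published.
DAY = "2023-01-15"
DAILY_FILES = {
    "comments":    Path(f"RC_{DAY}.json"),   # all comments from the previous day
    "submissions": Path(f"RS_{DAY}.json"),   # all submissions from the previous day
    "authors":     Path(f"RA_{DAY}.json"),   # new authors first seen that day (with creation date)
    "subreddits":  Path(f"RSR_{DAY}.json"),  # new subreddits first seen that day
}

def iter_records(path: Path):
    """Yield one JSON object per line (newline-delimited JSON assumed)."""
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)

if __name__ == "__main__":
    # Simple sanity check: count the records in each of the four daily files.
    for kind, path in DAILY_FILES.items():
        if not path.exists():
            print(f"{kind}: {path} not found, skipping")
            continue
        count = sum(1 for _ in iter_records(path))
        print(f"{kind}: {count} records in {path}")
```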
Replying to
Yes, this is an awesome dataset to keep track of & work on, thank you. I checked the previous dump! Often a subreddit is the starting point of memes or other internet trends. This Reddit dataset + Twitter dumps can do wonders for finding fraud (mostly crypto) and fake-news trends.
Replying to
Awesome! I'm always super excited to hear when others are finding these data dumps to be valuable for their research / project / testing / etc. I'll update everyone once the daily dumps are ready to run in production!
Replying to
That would be amazing. It would lower the bar for people to use the dumps (300 GB JSON files) and allow large-scale tracking by having two snapshots of the same document (daily and monthly dumps), as in the sketch below.
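As a rough illustration of the "two snapshots of the same document" idea, the comparison could be as simple as indexing both dumps by item id and diffing a few fields. The file names, the newline-delimited JSON layout, and the field names below are assumptions modeled on common Reddit API fields, not the actual dump schema.

```python
import json
from pathlib import Path

# Fields worth comparing between two snapshots of the same submission; these
# follow common Reddit API field names, which is an assumption about the dumps.
TRACKED_FIELDS = ("score", "num_comments", "selftext", "title")

def load_by_id(path: Path) -> dict:
    """Index a newline-delimited JSON dump by item id (schema assumed)."""
    index = {}
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                rec = json.loads(line)
                index[rec["id"]] = rec
    return index

def diff_snapshots(daily: dict, monthly: dict):
    """Yield (id, field, old_value, new_value) for items present in both dumps."""
    for item_id, old in daily.items():
        new = monthly.get(item_id)
        if new is None:
            continue
        for field in TRACKED_FIELDS:
            if old.get(field) != new.get(field):
                yield item_id, field, old.get(field), new.get(field)

if __name__ == "__main__":
    daily = load_by_id(Path("RS_2023-01-15.json"))  # hypothetical daily dump
    monthly = load_by_id(Path("RS_2023-01.json"))   # hypothetical monthly dump
    for item_id, field, old, new in diff_snapshots(daily, monthly):
        print(f"{item_id}: {field} changed from {old!r} to {new!r}")
```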
Replying to
That said, I'm a bit worried by the potential for misuse: anyone could then compare the dumps to find and recover all content deleted in that timeframe in a way that is prevented today by API rate limits.