Do you want to restrict to a specific timeframe? We could look at the last month's worth of posts and see what the top X subreddits were based on number of posts, put all the post ids into an array and then get X random out of them. Would be easier if the time-frame was ...
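A rough sketch of the approach described above, in case it helps to make it concrete. It assumes the month's posts are already loaded as dicts with `id` and `subreddit` keys; the function name `sample_post_ids` and its parameters are made up for illustration and are not part of any existing tool.

```python
import random
from collections import Counter


def sample_post_ids(posts, top_n, sample_size, seed=None):
    """Pick random post ids from the top_n subreddits by post count.

    Assumption: each post is a dict with 'id' and 'subreddit' keys.
    """
    posts = list(posts)
    # Count posts per subreddit and keep the top_n busiest ones.
    counts = Counter(p["subreddit"] for p in posts)
    top = {name for name, _ in counts.most_common(top_n)}
    # Put all post ids from those subreddits into one list.
    candidate_ids = [p["id"] for p in posts if p["subreddit"] in top]
    # Draw without replacement, capped at the number of candidates.
    rng = random.Random(seed)
    return rng.sample(candidate_ids, min(sample_size, len(candidate_ids)))
```

Passing a `seed` makes the draw repeatable, which is handy if the sample needs to be regenerated later.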
I'm about to create those dumps shortly -- they're almost done. Would Friday work for delivery?
Hey, if you’re happy with sampling from the existing Pushshift comment dumps, I should be able to help with this. Do you need entire comment threads or just n individual comments?
Thanks so much, appreciate the offer! I wouldn't want to burden you with this sampling / data wrangling task though... It's a bit of a fiddly/tricky problem. [1/2]
We want to get ~1000 posts from popular subreddits, but sample in such a way that the posts contain a mix of highly downvoted, highly upvoted, and 'controversial' comments. But perhaps if you're interested in a collaboration, let me know!
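One possible reading of that sampling requirement, sketched under several assumptions: each comment is a dict with `link_id` (the parent post), `score`, and a 0/1 `controversiality` flag, and a post qualifies only if it has at least one comment of each kind. The score thresholds and the names `classify` and `sample_mixed_posts` are placeholders chosen for illustration, not agreed values.

```python
import random
from collections import defaultdict


def classify(comment, hi=100, lo=-10):
    """Label a comment; hi/lo are placeholder score thresholds."""
    if comment.get("controversiality", 0) == 1:
        return "controversial"
    if comment["score"] >= hi:
        return "upvoted"
    if comment["score"] <= lo:
        return "downvoted"
    return None


def sample_mixed_posts(comments, n_posts=1000, seed=None):
    """Sample posts that have all three kinds of comment."""
    kinds_by_post = defaultdict(set)
    for c in comments:
        kind = classify(c)
        if kind is not None:
            kinds_by_post[c["link_id"]].add(kind)
    required = {"upvoted", "downvoted", "controversial"}
    # Keep only posts whose comments cover every required kind.
    eligible = sorted(
        post_id for post_id, kinds in kinds_by_post.items() if kinds >= required
    )
    rng = random.Random(seed)
    return rng.sample(eligible, min(n_posts, len(eligible)))
```

If the mix is meant to hold across the whole sample rather than within each post, the same `classify` labels could instead be used to draw a fixed quota per kind.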
Are you planning to use my monthly dump files?


