Yes. So to give some background, the last couple months have been very difficult since we have been moving our infrastructure into the cloud. We are going to start maintaining the monthly dumps again and get caught up as soon as possible. Realistically I would say that we should get completely caught up by late spring of this year and then maintain them regularly from that point forward.
Woah, that's a big change and I'm sure quite a lot of work!
Your work is greatly appreciated, as always.
Thanks! Feedback is always helpful -- especially knowing what parts of our service are the most used, etc.
My research partner and I are probably outliers compared to your usual users. We've made our own local database for the sake of throughput/weird queries and use the monthly dumps as our upstream data source on a 6+ month delay. We don't need 2020-21 data yet, so no rush!
Also, when we start filling in the missing monthly dumps, we'll probably work backwards because we especially want to capture (update) all the comment / post scores related to the insurrection in January, the election, etc.
Yes. The Reddit monthly dumps are currently in progress and we should start to have the missing months up within the next several weeks (it may take up to two months to fill in all the gaps but data will go up as it is collected).
Wonderful! I'm confused, though: don't you already have the data within Pushshift's database?
Yes, but the monthly dumps that are currently up are always rescanned so that we can include the most recent score data for all comments and submissions. As the API is (usually) near real-time, it doesn't have any real data on the final scores for comments (upvotes).
So the re-ingest of all the comments and submissions provides more accurate score data, etc.
Oh! I thought the scores were always wrong and was planning to re-fetch myself. That'll save me an incredible amount of time :D
I've DM'd you a schema question if you have a spare minute.

