If you are looking for Reddit data for all of 2020 and early 2021, look no further. I thought I had to re-ingest the previous months because of a server failure a while back, but I actually had a backup server still ingesting the data. So I was cleaning up today and found over one billion Reddit comments for all of 2020 and early 2021. The script was doing its job happily until the 4TB SSD ran out of space -- but the data is there!
I just need to compress 1.2+ billion comments and submissions and the data will be up within a week.
This will get the archives caught up to around March of 2021 -- the last couple of months will be ingested shortly. But hey, pandemic data!
Replying to
Will you be able to re-populate missing days from the production API? There are still a dozen or so scattered over the last year.
Replying to
Yes -- as soon as the dumps are up, I'm going to do a full scan on production, fill any gaps, and get the new ingest code in place so there aren't any more large delays in prod (beta has been keeping up to date with the new code).

