In response to a paper on missing data in my original Reddit data dumps, I have worked hard to fix the issue and to make the data dumps as complete as possible. I will be uploading the new data later tonight. More details are here:
A substantial number of submissions from before November 2007 were missing from the original data dumps. I will be working to fill in the gaps for comments as well, but this will take longer since there are far more comments than submissions.
I would like to thank and for their excellent work in pinpointing the deficiencies in the data -- their contributions were essential for helping to find the problematic areas.
Thank you Alex! It's a huge project and having people like and assist with it is a huge help. There's a lot more work left to do but this addition fixes some massive issues for submission analysis from Reddit's earlier days.


