One additional point about the new data -- you will notice scoring differences for submission objects between the old and new versions of the data. This is due to how Reddit scored submissions and comments: they previously applied a "fudge factor" to the scores. The new ...
data (RS_v2_yyyy-mm format) has a more accurate representation of the true scores. I am currently compressing the files and will have them uploaded later this evening (files.pushshift.io/reddit/submiss)
