Reddit March 2022 comments have been fixed and reuploaded. If you discover any more damaged uploads, please let me know!
Hey Jason, I think you uploaded the wrong file: RC_2022-03 is still the old corrupted version but now there is a new RC_2020-03.
I had to decode the latest files as iso-8859-1 as opposed to the usual utf-8. Is that expected?
I'm not sure if the previous dumps were ASCII safe or not -- I'll have to look at them. I wouldn't expect it to be much of an issue but it is probably best to keep the formatting consistent anyway. I'll take a look and if it is different, I'll process the files again over ...
... the next week or two. Usually parsers like Python's simplejson / ujson / orjson will automatically handle the encoding.
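For anyone hitting the decoding question above, here is a minimal sketch of one way to read a line defensively. It assumes the dump is newline-delimited JSON read as raw bytes; the helper name `parse_line` is hypothetical, and the standard-library `json` module stands in for the parsers mentioned. It is not how the dumps are actually produced or meant to be read.

```python
import json


def parse_line(raw: bytes) -> dict:
    """Parse one JSON line, trying UTF-8 first and falling back to ISO-8859-1.

    Hypothetical helper for illustration only.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a code point, so this decode cannot
        # fail, but it can produce mojibake if the bytes were really in some
        # other encoding.
        text = raw.decode("iso-8859-1")
    return json.loads(text)
```

If the writer escaped all non-ASCII characters as `\uXXXX`, the file is pure ASCII and both decodings give the same result, which may be why the difference often goes unnoticed.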


