psycopg2.DataError: unsupported Unicode escape sequence
LINE 1: ...rified":true}}'),(997288960132083715, 15266071...
^
DETAIL: \u0000 cannot be converted to text.
What kind of jerk sticks null bytes in their tweets?
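One way to sidestep this error is to strip U+0000 from every string before the JSON reaches PostgreSQL, since the `text` and `jsonb` types cannot store that character. The sketch below is only an illustration: `strip_nul` and the sample payload are hypothetical, not taken from the original loader, and it assumes the tweets are available as JSON strings.

```python
import json


def strip_nul(value):
    """Recursively remove U+0000 from every string in a decoded JSON value.

    Hypothetical helper, not part of psycopg2 or of the original pipeline.
    """
    if isinstance(value, str):
        return value.replace("\x00", "")
    if isinstance(value, list):
        return [strip_nul(item) for item in value]
    if isinstance(value, dict):
        # Keys are strings too, so clean them as well as the values.
        return {strip_nul(k): strip_nul(v) for k, v in value.items()}
    # Numbers, booleans and None pass through unchanged.
    return value


# Made-up payload with an escaped null byte in the tweet text.
raw = '{"text": "hello\\u0000world", "user": {"verified": true}}'
clean = json.loads(raw)
clean = json.dumps(strip_nul(clean))
```

The cleaned string can then be passed to the insert as an ordinary query parameter. Dropping the character silently is a design choice. Replacing it with a visible placeholder may be preferable if the nulls need to be auditable later.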
and some of them tweets don't even follow English grammar :) But I hear you. I once spent a couple of hours debugging a Reddit dataset (thank you for the raw data, btw) that took ages to process.
Turns out one Reddit comment had many thousands of Unicode emoticons, and this completely brought the NLP package I used to its knees. It tried all the various ways to tokenize that thing.
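A cheap guard against this kind of input is to cap long runs of symbol characters before the text reaches the tokenizer. The sketch below is a rough illustration under assumptions: the thread does not name the NLP package or say whether the emoticons were consecutive, and `cap_symbol_runs` and its `max_run` limit are hypothetical.

```python
import unicodedata
from itertools import groupby


def is_symbol(ch):
    """True for characters in a Unicode symbol category (Sm, Sc, Sk, So)."""
    return unicodedata.category(ch).startswith("S")


def cap_symbol_runs(text, max_run=10):
    """Truncate each consecutive run of symbol characters to max_run characters.

    Hypothetical pre-tokenization guard; the limit of 10 is arbitrary.
    """
    pieces = []
    for symbol_run, group in groupby(text, key=is_symbol):
        chars = list(group)
        pieces.append("".join(chars[:max_run] if symbol_run else chars))
    return "".join(pieces)


# Made-up comment with 5000 copies of one emoji (U+1F600, category So).
comment = "nice " + "\U0001F600" * 5000 + " post"
capped = cap_symbol_runs(comment)
```

Emoji built from several code points (joiners, variation selectors, skin-tone modifiers) include characters outside the symbol categories, so they would split a run. A more careful version would segment by grapheme cluster first.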
Yes! Reddit is the perfect dataset for when you think you have all your bases covered because there is always something somewhere in the data that will force you to rethink NLP, etc.

