psycopg2.DataError: unsupported Unicode escape sequence
LINE 1: ...rified":true}}'),(997288960132083715, 15266071...
^
DETAIL: \u0000 cannot be converted to text.
What kind of jerk sticks null bytes in their tweets?
I had to put in some extra logic to remove them. PostgreSQL doesn't allow null bytes at all, so it throws an error when it encounters one. I will say that if you plan for a dozen ways that something can fail, it will find a thirteenth!
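For anyone hitting the same error, here is a minimal sketch of what that kind of cleanup could look like. It is not the actual logic used here. It assumes each tweet arrives as a raw JSON string, and the names `strip_nul` and `clean_json` are made up for illustration. The idea is to parse the JSON, drop NUL characters from every string, and re-serialize before the value goes into the INSERT.

```python
import json

NUL = "\x00"  # the character that a \u0000 escape decodes to


def strip_nul(value):
    """Recursively remove NUL characters from every string in a parsed JSON value."""
    if isinstance(value, str):
        return value.replace(NUL, "")
    if isinstance(value, list):
        return [strip_nul(item) for item in value]
    if isinstance(value, dict):
        # Keys are strings too, so clean them as well as the values.
        return {strip_nul(key): strip_nul(item) for key, item in value.items()}
    return value  # numbers, booleans, and None pass through unchanged


def clean_json(raw):
    """Hypothetical helper: return the JSON text with all NUL characters removed."""
    return json.dumps(strip_nul(json.loads(raw)))


# A made-up tweet payload containing an escaped null byte.
raw_tweet = '{"id": 1, "text": "hello\\u0000world", "user": {"verified": true}}'
cleaned = clean_json(raw_tweet)
```

Parsing first, instead of doing a plain text replace on `\u0000`, avoids mangling a tweet that contains a literal backslash followed by `u0000`.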
and some of them tweets don't even follow English grammar :) But I hear you. I once spent a couple of hours debugging a Reddit dataset (thank you for the raw data, btw) that took ages to process.
It's a lot of data! Reddit now has over 4 billion comments in total. That's A LOT of data.



