I should have the #kavanaugh Twitter dataset ready for download on Friday. I am doing some sanity checks on the data and prepping it for the dump. I'll have a more accurate count of the tweets in a day or two.
#datasets #bigdata #datascience
The current size of the dataset uncompressed is around 375 GB of tweets. That should be over 100 million tweets.
Will there be handles or bios attached to each tweet? I was thinking I would use keywords like maga or resist, etc. in those as a kind of negative/positive labeling method.
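A rough sketch of what that bio-keyword labeling could look like, in Python. Everything here is an assumption for illustration: the function name `label_from_bio`, the keyword-to-label mapping (which keyword counts as "positive" is arbitrary), and the idea that the bio arrives as a plain string. The actual layout of the dump isn't known yet.

```python
import re

# Hypothetical mapping: which keyword gets which label is an assumption.
KEYWORD_LABELS = {
    "maga": "positive",
    "resist": "negative",
}

def label_from_bio(bio):
    """Return 'positive', 'negative', or None for a user bio string.

    None means no keyword matched, or keywords from both sides matched,
    so the bio gives no usable weak label.
    """
    # Lowercase and split into alphanumeric tokens so '#MAGA' matches 'maga'.
    tokens = set(re.findall(r"[a-z0-9]+", (bio or "").lower()))
    labels = {label for word, label in KEYWORD_LABELS.items() if word in tokens}
    if len(labels) != 1:
        return None
    return labels.pop()

if __name__ == "__main__":
    for bio in ["Proud dad. #MAGA", "Nurse, #Resist", "Coffee and code"]:
        print(repr(bio), "->", label_from_bio(bio))
```

This matches whole tokens only, so "resistance" would not count as "resist"; whether that is the right call depends on how noisy the bios turn out to be.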
The user bios will be included in this dump as part of every tweet!
I have a tweet classification neural network I built as a side project that's basically ready to go for it. Except it was built for 140-character tweets, so I'll need to make a few adjustments.