I should have the #kavanaugh Twitter dataset ready for download on Friday. I am doing some sanity checks on the data and prepping it for the dump. I'll have a more accurate tweet count in a day or two.
#datasets #bigdata #datascience
Uncompressed, the dataset is currently around 375 GB, which should be over 100 million tweets.
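As a rough sanity check on those two figures, here is the implied average size per tweet. This is only a back-of-the-envelope sketch: it assumes decimal gigabytes and treats "over 100 million" as a lower bound on the count, so the result is an upper bound on the average.

```python
# Figures from the thread; both are approximate.
size_bytes = 375 * 10**9    # ~375 GB uncompressed (decimal GB assumed)
tweet_count = 100 * 10**6   # "over 100 million", used here as a lower bound

# More tweets than this would only lower the average, so this is a ceiling.
avg_bytes = size_bytes / tweet_count
print(f"at most ~{avg_bytes:.0f} bytes per tweet")
```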
Replying to
Will handles or bios be attached to each tweet? I was thinking of using keywords in those, like maga or resist, as a kind of negative/positive labeling method.
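A minimal sketch of that keyword idea, in case it helps. It assumes each record is a tweet object with a `user` dict carrying `screen_name` and `description`, as in the standard Twitter API JSON; whether this dump includes those fields is exactly the open question. `KEYWORD_GROUPS` and `weak_label` are hypothetical names, and which group counts as positive or negative is left to whoever uses it.

```python
import re

# Keyword groups: "maga" and "resist" come from the thread; extend as needed.
# Mapping a group to a positive/negative label is left to the caller.
KEYWORD_GROUPS = {
    "maga": {"maga"},
    "resist": {"resist"},
}

# Split handles and bios into lowercase alphanumeric tokens.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def weak_label(tweet):
    """Return the single keyword group found in the handle or bio, else None."""
    user = tweet.get("user") or {}
    # Assumed field names; either one may be missing or null.
    handle = user.get("screen_name") or ""
    bio = user.get("description") or ""
    tokens = set(TOKEN_RE.findall(f"{handle} {bio}".lower()))
    matched = [name for name, words in KEYWORD_GROUPS.items() if tokens & words]
    # Leave accounts matching no group, or more than one, unlabeled.
    return matched[0] if len(matched) == 1 else None
```

Matching on whole tokens keeps the labels conservative: `#MAGA` in a bio matches, but `resistance` does not match `resist` unless it is added to the set. Labels produced this way are noisy, so it is worth spot-checking a sample by hand before training on them.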


