I should have the #kavanaugh Twitter dataset ready for download on Friday. I am running some sanity checks on the data and prepping it for the dump. I'll have a more accurate count of the tweets in a day or two.
#datasets #bigdata #datascience
Uncompressed, the dataset is currently around 375 GB of tweets, which should amount to over 100 million tweets.
Will there be handles or bios attached to each tweet? I was thinking of using keywords like "maga" or "resist" in those fields as a kind of negative/positive labeling method.
Yes indeed! I hope you are willing to share any insights you discover within this massive data dump!

