I should have the #kavanaugh Twitter dataset ready for download on Friday. I'm running some sanity checks on the data and prepping it for the dump. I'll have a more accurate tweet count in a day or two.
#datasets #bigdata #datascience
The dataset is currently around 375 GB uncompressed, which should come to over 100 million tweets.
Also, each tweet I ingested includes the timestamp of when it was retrieved. This should help people interested in analyzing retweets and likes in the dataset.
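
A rough sketch of how the retrieval timestamp could be used: retweet and like counts are snapshots taken at retrieval time, so dividing them by the tweet's age at that moment gives a rate that is comparable across tweets. The field names are assumptions, not the confirmed schema of the dump: `created_at`, `retweet_count`, and `favorite_count` follow the standard Twitter API v1.1 tweet object, while `retrieved_on` (epoch seconds, UTC) is a hypothetical name for the retrieval timestamp.

```python
from datetime import datetime, timezone

# Format of the `created_at` field in the standard Twitter API v1.1 tweet object.
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def engagement_rate(tweet):
    """Return retweets and likes per hour, measured at retrieval time.

    Assumes `retrieved_on` holds epoch seconds in UTC (hypothetical field name).
    Returns None when the tweet has no positive age at retrieval.
    """
    created = datetime.strptime(tweet["created_at"], TWITTER_TIME_FORMAT)
    retrieved = datetime.fromtimestamp(tweet["retrieved_on"], tz=timezone.utc)

    # Age of the tweet, in hours, at the moment it was retrieved.
    age_hours = (retrieved - created).total_seconds() / 3600
    if age_hours <= 0:
        return None

    return {
        "age_hours": age_hours,
        "retweets_per_hour": tweet["retweet_count"] / age_hours,
        "likes_per_hour": tweet["favorite_count"] / age_hours,
    }
```

Without the retrieval time, a tweet captured minutes after posting and one captured days later would look equally unpopular if both showed low counts.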
