Sometimes we just need more than a 1% sample. This #dataset is a >33% firehose sample covering the US/NK summit, from 7:30pm ET until 12:30am ET. It includes A TON of good news sources (user objects). That works out to approximately 1,200 tweets per second over the five-hour period.
files.pushshift.io/twitter/TF_USN
#datascience
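As a rough sanity check on the volume, the quoted rate and time window can be multiplied out. This is a back-of-the-envelope sketch that assumes the ~1,200 tweets/second rate held roughly constant for all five hours; the function name is hypothetical.

```python
# Back-of-the-envelope volume estimate from the figures quoted in the post.
# Assumption: the ~1,200 tweets/second rate held roughly constant over the window.
TWEETS_PER_SECOND = 1_200
HOURS = 5  # 7:30pm ET to 12:30am ET

def estimated_tweet_count(rate_per_sec: float, hours: float) -> float:
    """Approximate number of tweets captured during the window (hypothetical helper)."""
    return rate_per_sec * hours * 3600  # 3600 seconds per hour

captured = estimated_tweet_count(TWEETS_PER_SECOND, HOURS)
print(f"~{captured:,.0f} tweets in the sample")
```

At a constant rate this comes to roughly 21.6 million tweets; the real count will differ if traffic spiked or dipped during the summit.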
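This file is 8,415,064,076 bytes (about 8.4 GB) compressed and contains slightly more than 33.3% of all tweets made during the time period. Since retweets embed the original tweet, a tweet that was retweeted just once has two chances to be captured, so there is roughly a 50% chance of getting it (5/9, or about 56%, if the original and the retweet are each sampled independently at 1/3).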
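The retweet effect generalizes: an original tweet is missing from the dump only if it and every one of its retweets were skipped. The sketch below assumes each tweet is sampled independently with probability p; the actual sampling method behind the firehose is not stated, so treat independence as an assumption. The function name is hypothetical.

```python
# Chance that an original tweet appears in the dump, either directly or
# embedded inside at least one sampled retweet.
# Assumption: every tweet (original or retweet) is sampled independently with probability p.
def capture_probability(p: float, retweets: int) -> float:
    """Hypothetical helper: P(original is recoverable) given its number of retweets."""
    # Missed only if the original AND all of its retweets are skipped.
    return 1 - (1 - p) ** (retweets + 1)

p = 1 / 3
for k in (0, 1, 2, 5):
    print(k, round(capture_probability(p, k), 3))
```

With p = 1/3 this gives about 0.333 for a tweet with no retweets, 0.556 for one retweet, 0.704 for two and 0.912 for five, so widely retweeted content is very likely to be recoverable from the sample.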
