Currently ingesting 5,000 tweets per minute (7,200,000 per day). I'll let this run for four weeks to collect ~200,000,000 politically related tweets. Then we'll see just how many bots are messing up Twitter.
#datascience #bigdata #datasets
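A quick back-of-envelope check of the numbers in that tweet (a minimal sketch; the per-minute rate and the four-week window come from the tweet, the rest is plain arithmetic):

```python
# Sanity-check the ingestion math quoted above.
TWEETS_PER_MINUTE = 5_000
MINUTES_PER_DAY = 60 * 24
DAYS = 28  # four weeks

tweets_per_day = TWEETS_PER_MINUTE * MINUTES_PER_DAY   # 7,200,000
total_tweets = tweets_per_day * DAYS                    # 201,600,000 (~200M)
print(f"{tweets_per_day:,} tweets/day, {total_tweets:,} tweets over {DAYS} days")
```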
Can you estimate the size of the dataset? Is it something you could make publicly available? Host it on for example? :)
I can make it available for "academic use," so just say it's for academic study. Once it's complete, I'll make an announcement with a link to the data. Also, if you know of a source where I can get a list of the Twitter accounts for all senators, reps, and governors, that ...
Oh, I didn't answer your original question. My guess is that the dataset would be between 20 and 40 gigs compressed. Somewhere around 250 gigs uncompressed?
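A rough way to sanity-check that estimate (the per-tweet size and the compression ratio below are assumptions, not measurements):

```python
# Rough size estimate for ~200M tweets.
# Assumptions: a full tweet JSON object is on the order of 1-2 KB raw,
# and JSON text typically compresses several-fold with gzip.
TOTAL_TWEETS = 200_000_000
BYTES_PER_TWEET_RAW = 1_250   # assumed average raw JSON size
COMPRESSION_RATIO = 8         # assumed gzip ratio for JSON text

raw_gb = TOTAL_TWEETS * BYTES_PER_TWEET_RAW / 1e9
compressed_gb = raw_gb / COMPRESSION_RATIO
print(f"~{raw_gb:.0f} GB raw, ~{compressed_gb:.0f} GB compressed")
```

With those assumptions this lands at roughly 250 GB raw and ~30 GB compressed, which is consistent with the 20-40 gigs guess above.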
That's slightly bigger than our current limit (10 Gigs), but you could either split it into two datasets or hit me up and I can help you set up an organization that can upload bigger datasets.
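If splitting is the route taken, one simple option for a line-delimited dump (one JSON object per line) is to cut it at a line boundary; a sketch, with placeholder filenames:

```python
# Split a line-delimited, gzipped tweet dump into two halves at a line
# boundary. Filenames are placeholders.
import gzip

SRC = "tweets.jsonl.gz"

with gzip.open(SRC, "rt") as src:
    total_lines = sum(1 for _ in src)  # first pass: count lines

with gzip.open(SRC, "rt") as src, \
     gzip.open("tweets_part1.jsonl.gz", "wt") as part1, \
     gzip.open("tweets_part2.jsonl.gz", "wt") as part2:
    for i, line in enumerate(src):
        (part1 if i < total_lines // 2 else part2).write(line)
```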
You and I should get in touch -- I have some really awesome datasets coming up in about two weeks, along with the monthly Reddit dumps.