Currently ingesting 5,000 tweets per minute (7,200,000 per day). I'll let this run for four weeks to collect ~200,000,000 politically related tweets. Then we'll see just how many bots are messing up Twitter.
#datascience #bigdata #datasets
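A quick check of the volume figures, assuming a steady 5,000 tweets per minute around the clock and four 7-day weeks (28 days):

```python
# Assumes a constant ingest rate with no gaps or downtime.
TWEETS_PER_MINUTE = 5_000
MINUTES_PER_DAY = 24 * 60
DAYS = 4 * 7  # four weeks

per_day = TWEETS_PER_MINUTE * MINUTES_PER_DAY  # 7,200,000
total = per_day * DAYS                          # 201,600,000, i.e. roughly 200 million
print(f"{per_day:,} per day, {total:,} over {DAYS} days")
```

Any rate-limit pauses or collector downtime would bring the real total below this ceiling.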
Can you estimate the size of the dataset? Is it something you could make publicly available? Host it on for example? :)
I can make it available for "academic use," so just say it's for academic study. Once it's complete, I'll make an announcement with a link to the data. Also, if you know of a source where I can get a list of all Twitter accounts for all senators, reps and governors, that ...
I'll have to read up on that. One method I have used with Reddit data is tracking each author's average response time to comments. Take a look at this API call, which aggregates all authors with a response time of 10 seconds or less. They're all bots:
beta.pushshift.io/reddit/comment
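Here is a rough local sketch of the same heuristic, not the Pushshift query itself (that URL is truncated above). It assumes you already have comment records with an id, author, parent id and Unix timestamp; the `Comment` record and `flag_fast_responders` function are hypothetical names. The 10-second threshold comes from the tweet.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Comment:
    # Hypothetical record shape, not a real API schema.
    id: str
    author: str
    parent_id: Optional[str]  # None for top-level comments
    created_utc: int          # Unix timestamp in seconds


def flag_fast_responders(comments, threshold_s=10.0):
    """Return {author: average reply delay} for authors at or under threshold_s."""
    by_id = {c.id: c for c in comments}
    delays = defaultdict(list)
    for c in comments:
        parent = by_id.get(c.parent_id) if c.parent_id else None
        if parent is None:
            continue  # top-level comment, or parent not in this sample
        delays[c.author].append(c.created_utc - parent.created_utc)
    averages = {author: mean(ds) for author, ds in delays.items()}
    return {a: avg for a, avg in averages.items() if avg <= threshold_s}


comments = [
    Comment("a", "alice", None, 1000),
    Comment("b", "quickbot", "a", 1003),  # replied 3 s after "a"
    Comment("c", "bob", "a", 1600),       # replied 600 s after "a"
    Comment("d", "quickbot", "c", 1608),  # replied 8 s after "c"
]
print(flag_fast_responders(comments))
```

Averaging per author, instead of flagging single fast replies, keeps an occasional quick human reply from tripping the filter.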




