If you have recommendations for accounts that you would like added to the database so that they are constantly monitored, please let me know!
Thank you!
#bigdata #datascience #datasets
Non-US politicians? Could lead to interesting cross-national comparisons. Requires updating each election cycle though.
Also, working on a project with @carly_r_knight where we look at the images tweeted by US congresspeople. We've got the data for that, but for the future it would be great if Pushshift archived images from tweets too (if they aren't already).
That's a great idea -- archiving the media attached to tweets. Definitely worth it. I'll need to see how much additional space that would require but it should be doable.
If it looks to be too big, perhaps limiting to just still images (rather than video) or just things hosted on Twitter (I think they resize big uploads) would remove the really big stuff and make it manageable?
Yeah -- images should definitely be manageable. Once I'm done ingesting the last 3,200 tweets for every verified account, I'll take a look and see what percentage of those tweets contain images / media and run the numbers to see how much space would be needed. Also, I could...
dump them in Google Drive where I have "unlimited" storage. That's always an option for long-term archival of data that isn't needed immediately but can still be made available at some point.
In our experience, only a tiny fraction of tweets have images. Some have up to 4 images, but most have only one. One benefit of real-time(ish) monitoring is that you mostly avoid the problem we had, where some image links went dead before we got around to saving them.
That fraction will probably be very different for media people vs "influencers" vs politicians (our sample) vs something like .