In the next couple of days, I will be dumping a massive Twitter collection of all tweets related to Kavanaugh over the past 4 weeks. This data dump will contain tens of millions of tweets that span the confirmation hearings. Total size: > 350 GB.
#bigdata #datascience #kavanaugh
My hope is that this data can be used for research, and that future historians will have a very large and diverse dataset of social media material covering the entire confirmation process. If you are interested in a copy of the data, please let me know.
Nice work - that sounds quite valuable as a dataset! :) Out of curiosity, how was it acquired? Traditional web scraping of Twitter, or via their API, or ...?
Great question! I have been using the Twitter search API and specifically requesting tweets that match any of these:
q='Kavanaugh OR #Kavanaugh OR "Supreme Court" OR #KavanaughHearings OR #KavanaughHearing OR #kavanaughNomination'
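For anyone wanting to reproduce a query like this, here is a minimal sketch of how the `q` string above could be assembled and URL-encoded. The endpoint is an assumption (the v1.1 standard search URL; the thread only says "search API"), and `build_query`, `build_search_url`, and the `count` value are hypothetical names and settings chosen for illustration. Authentication and paging are not shown.

```python
from urllib.parse import urlencode

# Search terms quoted in the thread; a tweet matching any one of them is returned.
TERMS = [
    "Kavanaugh",
    "#Kavanaugh",
    '"Supreme Court"',
    "#KavanaughHearings",
    "#KavanaughHearing",
    "#kavanaughNomination",
]

# Assumption: the v1.1 standard search endpoint. The thread does not say
# which version or tier of the search API was actually used.
SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"


def build_query(terms):
    """Join the terms with the OR operator, as in the q string above."""
    return " OR ".join(terms)


def build_search_url(terms, count=100):
    """Return a URL-encoded search URL (hypothetical helper; no auth handling)."""
    params = {"q": build_query(terms), "count": count}
    return f"{SEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_query(TERMS))
    print(build_search_url(TERMS))
```

The encoding step matters because `#` and `"` are not safe in a query string: `urlencode` turns them into `%23` and `%22`, so the hashtags and the quoted phrase reach the API intact.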
Are you removing those which are only using the hashtags for visibility? There have been quite a few - especially today, it seems - that are completely unrelated but still use the hashtags mentioned.


