Pushshift has now reached a new milestone. The Coronavirus social media database now has over half a petabyte of data! Over 550 terabytes across 30 billion social media objects.
Ah, super interesting -- may be relevant to chat about ProjectDomino.org as part of putting it to good use! We've been ingesting COVID Twitter, and were talking about COVID Reddit even today, but had deferred it so we can put that time into analytics + interventions instead!
Awesome! I can't wait to see the latest offerings! I've been so busy that I haven't played with data in my free time as much as I used to :(
Wow, that's a big ElasticSearch cluster? Who is going to use it, and how?
All those objects aren't indexed (I wish!). Only a small portion is currently indexed in Elasticsearch. Most of the data is sitting in compressed hourly / daily dumps.