Pushshift has now reached a new milestone. The Coronavirus social media database now has over half a petabyte of data! Over 550 terabytes across 30 billion social media objects.
Wow, that's a big Elasticsearch cluster? Who is going to use it, and how?
Not all of those objects are indexed (I wish!). Only a small portion is currently indexed in Elasticsearch. Most of the data is sitting in compressed hourly / daily dumps.
At some point (given enough funding) indexing everything would be technically feasible. That would probably require a rack of server equipment with a few rows of blades. It could probably be done with a quarter million dollars.

