Pushshift currently has over 350 terabytes of social media data relating to the coronavirus pandemic. We're adding approximately 5-10 terabytes per day.
#coronavirus #bigdata
Right now, the main emphasis is simply on collecting the data as thoroughly as possible. For the data that is indexed, we're using Postgres and Elasticsearch clusters.
Currently, I don't have enough compute resources to index all of the data, but that's the eventual plan.
