Very exciting times for researchers focused on better understanding how social media influences our culture, politics and society in general. At the current rate that Pushshift is ingesting data, we'll hit the one trillion social media objects mark somewhere around September of
this year. The total space for all this data is currently around 3 petabytes, highly compressed. Once we reach the one trillion milestone, it will be around 5 petabytes compressed and around 75-100 petabytes uncompressed. (Zstandard compression is amazing.)
With Zstandard compression, our data averages around 5-7% of its original size, so each highly compressed petabyte represents around 15-20 petabytes uncompressed.
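As a rough check on the arithmetic, here is a minimal Python sketch of the conversion. It assumes only the 5-7% compressed-to-original ratio quoted in the thread. The function name `uncompressed_range_pb` is hypothetical, and real ratios will vary with the data and the Zstandard level used.

```python
def uncompressed_range_pb(
    compressed_pb: float, ratio_low: float = 0.05, ratio_high: float = 0.07
) -> tuple[float, float]:
    """Estimate the uncompressed size (in petabytes) of data occupying
    `compressed_pb` petabytes, assuming the compressed output is between
    `ratio_low` and `ratio_high` of the original size (hypothetical helper)."""
    if not (0 < ratio_low <= ratio_high < 1):
        raise ValueError("ratios must satisfy 0 < ratio_low <= ratio_high < 1")
    # A smaller ratio means more original data per compressed byte,
    # so dividing by the high ratio gives the low end of the range.
    return compressed_pb / ratio_high, compressed_pb / ratio_low


# 1 PB is the per-petabyte figure; 5 PB is the projected size at one trillion objects.
for pb in (1, 5):
    low, high = uncompressed_range_pb(pb)
    print(f"{pb} PB compressed ~ {low:.0f}-{high:.0f} PB uncompressed")
```

With these assumed ratios, 1 PB works out to roughly 14-20 PB and 5 PB to roughly 71-100 PB uncompressed, in line with the "around 15-20" and "75-100" estimates in the thread.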
