An idea I've been thinking about implementing: designing endpoints that let other data engineers send data to Pushshift for permanent archival and indexing. The goal is to help crowdsource data ingest projects and create a centralized repo for data.
This is a great idea, actually. But for it to work, you'd need to build trust that Pushshift can store and manage the data correctly. Data is a delicate thing to handle.
Yep! I agree 100%. Basically, the system should be designed so that when someone posts an array of data to the endpoint, once they receive that 200 response back from the API, the data is already temporarily backed up with redundancy.
That's a very important component.
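To make the "200 means it's already backed up" idea concrete, here's a rough Python sketch. It is not Pushshift's actual API; `ingest`, `write_durably`, `replica_dirs`, and `min_copies` are all hypothetical names, and local directories stand in for whatever redundant storage the real system would use. The point is only the ordering: write and fsync every copy first, then return 200.

```python
import json
import os
import uuid
from pathlib import Path


def write_durably(directory: Path, batch_id: str, records: list) -> Path:
    """Write one copy of the batch and force it to disk before returning."""
    directory.mkdir(parents=True, exist_ok=True)
    final_path = directory / f"{batch_id}.json"
    tmp_path = directory / f"{batch_id}.json.tmp"
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump(records, fh)
        fh.flush()
        os.fsync(fh.fileno())  # make sure the bytes have left the OS cache
    os.replace(tmp_path, final_path)  # atomic rename: no half-written batches
    return final_path


def ingest(records, replica_dirs, min_copies=2):
    """Return (status_code, body); 200 only once min_copies copies exist."""
    if not isinstance(records, list) or not records:
        return 400, {"error": "expected a non-empty array of records"}

    batch_id = uuid.uuid4().hex  # hypothetical batch identifier
    written = []
    for directory in replica_dirs:
        try:
            written.append(str(write_durably(Path(directory), batch_id, records)))
        except OSError:
            continue  # this replica is unavailable; try the next one

    if len(written) < min_copies:
        # Not enough redundancy, so the sender must not assume the data is safe.
        return 503, {"error": "not enough redundant copies", "copies": len(written)}
    return 200, {"batch_id": batch_id, "copies": len(written)}
```

A sender that gets anything other than 200 would keep its own copy and retry; a real service would also need auth, size limits, and cleanup of partial copies, none of which are shown here.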
Or maybe it could work as a service layer (not sure about the terminology), so that the data is ultimately saved on a third-party server? E.g., in NZ we strongly prefer data to be physically stored here.


