An idea that I've been thinking about implementing is designing endpoints for other data engineers that would allow them to send data to Pushshift for permanent archival and indexing. The goal is to help crowd-source data-ingest projects and create a centralized repository for data.
Other data engineers could use these endpoints to POST the data they have collected in bulk to Pushshift, where it can then be indexed and distributed to other researchers.
This would allow rapid prototyping of new projects that involve big data.
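A bulk-ingest endpoint like this would need a well-defined record schema on the client side. Here is a minimal sketch of what preparing a batch for such a POST might look like; the endpoint URL, the field names, and the NDJSON format are all assumptions for illustration, since no such Pushshift API has been published.

```python
import json

# Hypothetical endpoint name; not a real Pushshift API.
INGEST_URL = "https://api.pushshift.io/ingest/bulk"

# Assumed minimal envelope: a stable id, the origin of the data,
# when it was collected, and the payload itself.
REQUIRED_FIELDS = {"id", "source", "retrieved_utc", "data"}

def make_batch(records):
    """Validate records and serialize them as NDJSON for a bulk POST body."""
    lines = []
    for rec in records:
        missing = REQUIRED_FIELDS - rec.keys()
        if missing:
            raise ValueError(f"record missing fields: {sorted(missing)}")
        lines.append(json.dumps(rec, sort_keys=True))
    return "\n".join(lines)

# Example: one collected record, ready to send in the POST body.
batch = make_batch([
    {"id": "abc123", "source": "reddit", "retrieved_utc": 1614556800,
     "data": {"title": "example post"}},
])
```

Validating before sending keeps malformed records from ever reaching the archive, which matters when many independent engineers are contributing data.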
This is a great idea, actually. But to do this, you should build trust that Pushshift can store and manage the data correctly, because data is such a delicate good to handle.
Yep! I agree 100%. Basically, the system should be designed so that when someone POSTs an array of data to the endpoint, the 200 response from the API is only returned once the data has already been backed up with redundancy.
That's a very important component.
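The "200 means already replicated" guarantee above can be sketched as a handler that writes to several replicas and only acknowledges once enough of them confirm. The `ReplicaStore` class and `min_copies` threshold here are hypothetical stand-ins for whatever real storage backend Pushshift would use.

```python
import hashlib

class ReplicaStore:
    """Toy in-memory replica; stands in for a real storage node."""
    def __init__(self):
        self.objects = {}

    def put(self, key, blob):
        self.objects[key] = blob
        return True  # a real node would confirm a durable write

def ingest(payload: bytes, replicas, min_copies=2):
    """Write the payload to every replica and return an HTTP-style status.

    A 200 status is only returned once at least `min_copies` replicas
    have confirmed the write, so receiving 200 means the data is
    already redundantly backed up.
    """
    key = hashlib.sha256(payload).hexdigest()
    confirmed = sum(1 for r in replicas if r.put(key, payload))
    if confirmed >= min_copies:
        return 200, key
    return 503, key  # not enough copies: tell the client to retry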
1
Show replies

