I have combined all of my ingests into a standard pipeline and now have a new endpoint: beta.pushshift.io/url/search
This endpoint gives every URL seen from Twitter, Reddit and Gab -- profile pics, external links, etc. Between 500 and 1,500 links are added every second.
#bigdata #datascience
This tracks the top 10,000 Twitter accounts and also uses two live streams to gather URLs mentioned on Twitter. Every Reddit post and comment is also scanned for links, as are all Gab posts. This currently runs within Redis, but the goal is to make it permanent.
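
For anyone curious what "scanned for links" might look like in practice, here is a minimal sketch in Python. It is not the actual Pushshift ingest code: the regex, the `extract_urls` name, and the punctuation-trimming rule are all assumptions made for illustration, and a real pipeline would likely handle more edge cases (markdown links, shortened URLs, and so on).

```python
import re

# Assumed pattern (not from the original pipeline): match http(s) URLs up to
# the next whitespace character or common closing delimiter.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_urls(text):
    """Return the URLs found in `text`, in order of appearance.

    Trailing sentence punctuation is stripped, since posts often end a
    sentence right after a link.
    """
    urls = []
    for match in URL_PATTERN.findall(text):
        urls.append(match.rstrip(".,;:!?"))
    return urls


if __name__ == "__main__":
    # Hypothetical post body used only to exercise the function.
    sample = "Check https://example.com/page, and (http://example.org/a?b=1)."
    for url in extract_urls(sample):
        print(url)
```

Each extracted URL could then be pushed into whatever store backs the endpoint; that step is left out here because the storage layout was not described.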

