Update on Pushshift Reddit API ingest: today an issue caused approximately 15 hours of data to be skipped. The ingest code was written specifically to avoid data gaps, so something else may have gone wrong deeper in the system (Elasticsearch, etc.).
That said, tomorrow I will be spending most of the day working to finally move the new ingest code into production as well as filling any gaps in the data from today. I sincerely appreciate the reports I received today alerting me to this issue.
The goal by tomorrow evening will be to move the new and improved ingest logic into production and to add monitoring logic that prevents this bug from recurring. I greatly appreciate the feedback and your patience as I work to fix these issues tomorrow.
Thank you again!
