If any software engineers who enjoy a little challenge have some time, we'd like to ingest the entire forum history of officer_dot_com. I've started by including some basic code examples to get going. If you would like to help tackle this, I'd be happy to jump on a call with you.
Here is the public repo with a very basic example of grabbing metadata for each forum (there is more metadata that needs to be added, but this should get you started).
github.com/pushshift/offi
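For anyone new to this kind of scraping, here is a rough standalone sketch of parsing forum metadata out of an index page with only the Python standard library. It is not the code in the repo. The markup it expects (each forum as an `<a>` tag with a `forum-title` class) is a hypothetical placeholder, and so is the function name `parse_forum_list`; inspect the real pages in Chrome DevTools and adjust the selector and fields to match.

```python
from html.parser import HTMLParser


class ForumListParser(HTMLParser):
    """Collect forum links and titles from a forum index page.

    Assumption: each forum is rendered as <a class="forum-title" href="...">.
    This class name is hypothetical; replace it with the real one.
    """

    def __init__(self):
        super().__init__()
        self.forums = []
        self._current = None  # dict being filled while inside a matching <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "forum-title" in classes:
            self._current = {"url": attrs.get("href"), "title": ""}

    def handle_data(self, data):
        # Accumulate text, including text inside nested tags like <b>.
        if self._current is not None:
            self._current["title"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["title"] = self._current["title"].strip()
            self.forums.append(self._current)
            self._current = None


def parse_forum_list(html):
    """Return a list of {"url", "title"} dicts (hypothetical helper)."""
    parser = ForumListParser()
    parser.feed(html)
    parser.close()
    return parser.forums


if __name__ == "__main__":
    sample = '<div><a class="forum-title" href="/forum/1">Patrol <b>Tactics</b></a></div>'
    print(parse_forum_list(sample))
    # prints [{'url': '/forum/1', 'title': 'Patrol Tactics'}]
```

Each extra metadata field you find (post counts, descriptions, last-post dates) would become another key filled in from its own tag or attribute in the same way.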
The first step is to parse all the available metadata for each forum (this is already started in the script under the method "get_forum_data"). Once we have the metadata for all forums, we iterate through each forum to get all of its threads. Then we iterate through the threads to get the actual forum posts. So to recap:
1) Finish the get_forum_data method to add all available metadata (Chrome console is your friend here).
2) Create get_thread_data method.
3) Create get_post_data method.
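The forum → thread → post traversal could look roughly like the sketch below. It assumes, purely for illustration, that `get_thread_data` and `get_post_data` each take a URL and a 1-based page number and return a list of dicts (empty once past the last page), and that every forum and thread dict has a `url` key. The real methods in the repo may look nothing like this. The fetchers are passed in as arguments so the loop can be tested without hitting the site.

```python
from typing import Callable, Dict, Iterator, List

# Hypothetical fetcher signature: (url, page_number) -> items on that page.
# A real version would wrap an HTTP request plus HTML parsing.
PageFetcher = Callable[[str, int], List[Dict]]


def paginate(fetch_page: PageFetcher, url: str, max_pages: int = 10_000) -> Iterator[Dict]:
    """Yield items from pages 1, 2, 3, ... until a page comes back empty."""
    for page in range(1, max_pages + 1):
        items = fetch_page(url, page)
        if not items:
            return
        yield from items


def crawl(forums: List[Dict],
          get_thread_data: PageFetcher,
          get_post_data: PageFetcher) -> Iterator[Dict]:
    """Walk every forum, then every thread, then every post.

    Each yielded post is tagged with the forum and thread it came from.
    """
    for forum in forums:
        for thread in paginate(get_thread_data, forum["url"]):
            for post in paginate(get_post_data, thread["url"]):
                yield {**post, "forum_url": forum["url"], "thread_url": thread["url"]}
```

Using generators keeps memory flat even over the full forum history, and posts can be written to disk as they stream out. Stopping at the first empty page is itself an assumption; if the site exposes a page count in its pagination links, reading that is more robust.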
My estimate is that if you have experience with scraping, this can probably get knocked out in 2-4 hours. I can supply whatever resources you need.
