If any software engineers who enjoy a little challenge have some time, we'd like to ingest the entire forum history of officer_dot_com. I've started by including some basic code examples to get going. If you would like to help tackle this, I'd be happy to jump on a call with you.
Here is the public repo with a very basic example of grabbing metadata for each forum (there is more metadata that needs to be added but this should get you started).
github.com/pushshift/offi
The first step is to parse all the available metadata for each forum (this is already started in the script under the method "get_forum_data"). Once we have the metadata for all forums, we iterate through each forum to get all of its threads, then iterate through the threads to get the actual forum posts. So to recap:
1) Finish the get_forum_data method to add all available metadata (Chrome console is your friend here).
2) Create get_thread_data method.
3) Create get_post_data method.
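For anyone picking this up, here is a rough sketch of how the three steps could chain together. It is not the code in the repo: the signatures, the `fetch` callable, the `crawl` helper, and the field names (`url`, `name`, `title`) are all assumptions made for illustration, and the real parsing still has to be written against whatever the site actually returns.

```python
from typing import Callable, Dict, Iterator, List

# Hypothetical fetcher: takes a URL and returns already-parsed records.
# In practice this would wrap the HTTP request and HTML/JSON parsing.
Fetcher = Callable[[str], List[Dict]]


def get_thread_data(forum: Dict, fetch: Fetcher) -> List[Dict]:
    """Return the thread records for one forum (the 'url' key is assumed)."""
    return fetch(forum["url"])


def get_post_data(thread: Dict, fetch: Fetcher) -> List[Dict]:
    """Return the post records for one thread (the 'url' key is assumed)."""
    return fetch(thread["url"])


def crawl(forums: List[Dict], fetch: Fetcher) -> Iterator[Dict]:
    """Walk forums -> threads -> posts, yielding one flat record per post."""
    for forum in forums:
        for thread in get_thread_data(forum, fetch):
            for post in get_post_data(thread, fetch):
                # Tag each post with where it came from.
                yield {"forum": forum["name"], "thread": thread["title"], **post}
```

Passing `fetch` in as a parameter means the loop can be tried against canned data before pointing it at the live site, and rate limiting or retries can be added in one place later.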
My estimate is that if you have experience with scraping, this can probably get knocked out in 2-4 hours. I can supply whatever resources you need.
