Pushshift will be adding a full 4chan API in the next several weeks, as well as creating daily and monthly dumps of 4chan data for researchers to use.
Since 4chan has a lot of controversial posts, we cannot provide media dumps along with the textual data. If you are doing ...
... research that necessitates the inclusion of media, please reach out to me for some options. The biggest issue is the legal exposure from storing or disseminating media that could contain illegal content.
4plebs maintains a list of known-bad MD5 hashes you can use to pre-empt collection of CSAM, though it's an imperfect solution: it only covers files that have already been identified, so it's only as good as prior experience.
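
A minimal sketch of the hash-check idea, in Python, for anyone wiring this into a collector. Everything here is an assumption for illustration: the function names are hypothetical, the blocklist is assumed to be plain text with one hex MD5 per line (the actual format of the 4plebs list is not confirmed here), and the per-file hash is assumed to arrive base64-encoded, so it is converted to hex before the lookup. The point is only that the check happens before any media is fetched.

```python
import base64
import hashlib


def load_blocklist(lines):
    """Build a set of lowercase hex MD5 digests from an iterable of text lines.

    Assumed format: one hex digest per line; blank lines and '#' comments are skipped.
    """
    digests = set()
    for line in lines:
        entry = line.strip().lower()
        if entry and not entry.startswith("#"):
            digests.add(entry)
    return digests


def b64_md5_to_hex(md5_b64):
    """Convert a base64-encoded MD5 digest to its hex form."""
    return base64.b64decode(md5_b64).hex()


def md5_hex_of_bytes(data):
    """Hex MD5 of raw bytes, for files whose hash is not supplied in metadata."""
    return hashlib.md5(data).hexdigest()


def should_skip(md5_hex, blocklist):
    """True if the digest is on the blocklist, i.e. the file must not be collected."""
    return md5_hex.lower() in blocklist
```

Usage would be to call `should_skip(b64_md5_to_hex(post_md5), blocklist)` on each post's metadata and only download when it returns False. As noted above, a miss does not mean a file is safe, only that it is not on the list.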


