Occasionally, when you run some analysis on data, you get back unexpected results. For instance, I wanted to find the subreddits with the lowest average comment score. What I got back were subreddits that were all spam. Data: April 1-10, 2019.
#dataviz #datascience
If we add a filter to exclude subreddits with fewer than 10 distinct contributing authors, we get out of spam territory. There appear to be a lot of porn-related subreddits among the 20 lowest-scoring subreddits by average comment score.
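A rough sketch of that filter, not the query actually used for the chart. It assumes each comment is a dict with hypothetical `subreddit`, `author`, and `score` keys, and the function name and defaults are made up for illustration:

```python
from collections import defaultdict


def lowest_avg_subreddits(comments, min_authors=10, top_n=20):
    """Rank subreddits by average comment score, lowest first.

    Subreddits with fewer than `min_authors` distinct authors are dropped
    before ranking. Field names are assumptions, not a confirmed schema.
    """
    scores = defaultdict(list)   # subreddit -> list of comment scores
    authors = defaultdict(set)   # subreddit -> set of distinct authors

    for c in comments:
        scores[c["subreddit"]].append(c["score"])
        authors[c["subreddit"]].add(c["author"])

    ranked = [
        (sub, sum(vals) / len(vals))
        for sub, vals in scores.items()
        if len(authors[sub]) >= min_authors
    ]
    ranked.sort(key=lambda pair: pair[1])  # lowest average first
    return ranked[:top_n]
```

Setting `min_authors=1` should reproduce the unfiltered ranking, which is where the spam subreddits showed up.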
Reddit could probably build a simple spam filter by flagging subreddits where the previous 100 comments have very low author cardinality (usually fewer than 5 distinct authors for spam subreddits).
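One way that heuristic might look, as a sketch only. It assumes comments arrive oldest to newest as dicts with hypothetical `subreddit` and `author` keys. Skipping subreddits that have not yet reached 100 comments is my own added assumption, so tiny subreddits are not flagged just for being small:

```python
from collections import defaultdict, deque


def flag_low_cardinality(comments, window=100, max_authors=5):
    """Return subreddits whose last `window` comments have fewer than
    `max_authors` distinct authors.

    `comments` must be ordered oldest to newest. Subreddits with fewer
    than `window` comments are never flagged (an assumption, see above).
    """
    # Each deque keeps only the most recent `window` authors per subreddit.
    recent = defaultdict(lambda: deque(maxlen=window))
    for c in comments:
        recent[c["subreddit"]].append(c["author"])

    return {
        sub
        for sub, recent_authors in recent.items()
        if len(recent_authors) == window
        and len(set(recent_authors)) < max_authors
    }
```

The thresholds come straight from the observation above (100 comments, fewer than 5 authors) and would need tuning against real data before being trusted.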
