Occasionally, when you run some analysis on data, you get back unexpected results. For instance, I wanted to find the subreddits with the lowest average comment score for April 1-10, 2019. What I got back were subreddits that were all spam.
#dataviz #datascience
These subreddits all had at least 10,000 comments. The problem is that, within each subreddit, most of the comments come from the same user. So to get out of spam territory, it might be necessary to add another filter, such as a minimum author cardinality (number of distinct authors).
If we add a filter that excludes subreddits with fewer than 10 distinct contributing authors, we get out of spam territory. There appear to be a lot of porn-related subreddits among the 20 subreddits with the lowest average score.
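A minimal sketch of that query in plain Python, assuming the comments are available as `(subreddit, author, score)` tuples. The function name, the tuple layout, and the parameter names are illustrative assumptions, not the code behind the original analysis.

```python
from collections import defaultdict


def lowest_avg_score_subreddits(comments, min_comments=10_000,
                                min_authors=10, top_n=20):
    """Rank subreddits by average comment score, lowest first.

    comments: iterable of (subreddit, author, score) tuples
              (hypothetical layout, assumed for this sketch).
    """
    totals = defaultdict(lambda: [0, 0])  # subreddit -> [score sum, comment count]
    authors = defaultdict(set)            # subreddit -> distinct authors seen

    for subreddit, author, score in comments:
        totals[subreddit][0] += score
        totals[subreddit][1] += 1
        authors[subreddit].add(author)

    rows = [
        (sub, score_sum / count)
        for sub, (score_sum, count) in totals.items()
        # Keep only busy subreddits with enough distinct authors.
        if count >= min_comments and len(authors[sub]) >= min_authors
    ]
    return sorted(rows, key=lambda row: row[1])[:top_n]
```

Setting `min_authors=1` reproduces the unfiltered query, which makes it easy to compare the two result lists side by side.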
Reddit could probably create a spam filtering system simply by flagging subreddits where the previous 100 comments have a very low author cardinality (typically fewer than 5 for spam subreddits).
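A rough sketch of what such a flag could look like, assuming comments arrive as a stream of `(subreddit, author)` pairs. The class name, method names, and defaults are hypothetical, and this is not how Reddit actually filters spam.

```python
from collections import defaultdict, deque


class AuthorCardinalityFlagger:
    """Track the most recent comments per subreddit and flag low author diversity."""

    def __init__(self, window=100, min_authors=5):
        self.window = window            # how many recent comments to inspect
        self.min_authors = min_authors  # below this many distinct authors, flag it
        # subreddit -> authors of its last `window` comments (oldest dropped first)
        self._recent = defaultdict(lambda: deque(maxlen=window))

    def add_comment(self, subreddit, author):
        self._recent[subreddit].append(author)

    def is_suspicious(self, subreddit):
        recent = self._recent.get(subreddit)
        # Only judge a subreddit once a full window of comments has been seen.
        if recent is None or len(recent) < self.window:
            return False
        return len(set(recent)) < self.min_authors
```

Waiting for a full window avoids flagging brand-new subreddits that simply have not had many commenters yet. The thresholds would need tuning against real data.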
