As always, I really love the work that you are doing. I wanted to ask you a question. I have a high enough rate limit for the Perspective API to do real-time scoring on the data as it comes in. I am writing the code now for scoring Reddit, but I'm unsure how to pre-process the Reddit comments before scoring the text. I was thinking the two main things would be to remove links from the text and to make sure that any quoted text is removed (if a child comment's author is quoting the parent comment).
Any suggestions on this?
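A minimal sketch of those two pre-processing steps in Python. It assumes the comment body comes from Reddit's JSON, where `>` may arrive HTML-escaped as `&gt;` (unless the data was requested with `raw_json=1`), and that quotes follow Reddit's markdown convention of a leading `>`. `preprocess_comment` and the two regexes are hypothetical, not part of any Reddit or Perspective library.

```python
import html
import re

# Markdown links like [label](https://example.com): keep only the label text.
MD_LINK = re.compile(r"\[([^\]]*)\]\((?:https?://|www\.)[^)\s]*\)")
# Bare URLs starting with http(s):// or www.
BARE_URL = re.compile(r"(?:https?://|www\.)\S+")


def preprocess_comment(body: str) -> str:
    """Hypothetical helper: drop quoted lines and links from a Reddit comment body."""
    # Undo HTML escaping (&gt; -> >, &amp; -> &) that Reddit's JSON may apply.
    text = html.unescape(body)
    # Reddit markdown marks quoted text with a leading '>'; skip those lines.
    kept = [line for line in text.splitlines() if not line.lstrip().startswith(">")]
    text = "\n".join(kept)
    text = MD_LINK.sub(r"\1", text)  # [label](url) -> label
    text = BARE_URL.sub("", text)    # remove any remaining bare URLs
    return text.strip()


print(preprocess_comment("&gt; quoted parent\nI disagree, see [this](https://example.com) and https://foo.bar/x"))
# prints: I disagree, see this and
```

Keeping the link label rather than deleting the whole markdown link preserves the words the commenter actually wrote; URLs with parentheses in them (e.g. some Wikipedia links) will not be handled cleanly by this simple pattern.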
Also, removing quoted text is a neat idea, but I wonder whether such text contains semantic information relevant to the Perspective API. E.g., if someone uses racial hate speech and another user quotes it (and agrees with it), then they are spreading the hate speech...
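If quoted text might carry that kind of signal, one option is to separate it rather than throw it away, so the reply and the quote can be scored or logged independently. A minimal sketch, assuming Reddit-style `>` quote markers that may be HTML-escaped as `&gt;`; `split_quotes` is a hypothetical name.

```python
import html


def split_quotes(body: str) -> tuple[str, str]:
    """Hypothetical helper: return (own_text, quoted_text) for a Reddit comment body."""
    own, quoted = [], []
    for line in html.unescape(body).splitlines():  # &gt; -> > if the body is escaped
        stripped = line.lstrip()
        if stripped.startswith(">"):
            # Strip the '>' markers (including nested '>>') but keep the quoted words.
            quoted.append(stripped.lstrip("> \t"))
        else:
            own.append(line)
    return "\n".join(own).strip(), "\n".join(quoted).strip()


own_text, quoted_text = split_quotes("> you are all idiots\nExactly this.")
```

The quote a replier agrees with can then be scored on its own or sent alongside the reply instead of being dropped; how to weight the two is a judgment call.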
Great response! I really appreciate your input. I think I will reach out to their dev team and ask them how best to handle these types of situations. As you said, Perspective may account for URLs and do something special with them.
No probs at all - happy to chip in with some ideas. Keen to hear how it goes!
The API has a field for "context," where one can put the quoted text or the comment being replied to. A year ago it was ignored, but maybe that has changed? Also, there was a 3k-character limit on comments: quotes and links could eat up much of that. Removing double whitespace and non-printing characters (NPCs) helps there.
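A rough sketch of that cleanup plus a request body that passes quoted or parent text as context. The field names (`comment.text`, `context.entries`, `requestedAttributes`) follow my understanding of Perspective's AnalyzeComment request, and the API may still ignore context, so check the current API reference before relying on it. `clean_text` and `build_request` are hypothetical helpers; the code only builds the JSON body and does not send it.

```python
import re
import unicodedata
from typing import Optional


def clean_text(text: str) -> str:
    """Remove non-printing characters and collapse runs of whitespace."""
    # Unicode categories starting with "C" are control, format (e.g. zero-width
    # space U+200B), surrogate, private-use and unassigned code points.
    # Whitespace is kept here so word boundaries survive the next step.
    chars = [c for c in text if c.isspace() or not unicodedata.category(c).startswith("C")]
    # Collapse spaces, tabs and newlines into single spaces.
    return re.sub(r"\s+", " ", "".join(chars)).strip()


def build_request(text: str, context: Optional[str] = None) -> dict:
    """Build a JSON-serialisable AnalyzeComment body (not sent anywhere here)."""
    body = {
        "comment": {"text": clean_text(text)},
        "requestedAttributes": {"TOXICITY": {}},
        "languages": ["en"],  # assumption: comments are English
    }
    if context:
        # Quoted or parent text goes in as a context entry instead of being discarded.
        body["context"] = {"entries": [{"text": clean_text(context), "type": "PLAIN_TEXT"}]}
    return body
```

Cleaning both the comment and the context the same way keeps every character of the length budget for actual words.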
Thank you! I'm getting ready to score Reddit data, and I wasn't aware of the length restrictions. How would you handle that? We could either not score the comment or truncate it from the beginning point.
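Both options are easy to express in code. A minimal sketch, assuming the 3k-character limit mentioned earlier in the thread (the current limit may differ, and may not be counted in characters); `fit_to_limit` and `MAX_CHARS` are hypothetical.

```python
from typing import Optional

# Assumed limit, taken from the thread; check the current Perspective API docs.
MAX_CHARS = 3000


def fit_to_limit(text: str, limit: int = MAX_CHARS, skip_long: bool = False) -> Optional[str]:
    """Return text that fits in `limit` characters, or None if it should not be scored."""
    if len(text) <= limit:
        return text
    if skip_long:
        return None  # option 1: don't score over-long comments at all
    # Option 2: keep the beginning, cutting back to the last space so no word is split.
    cut = text[:limit]
    space = cut.rfind(" ")
    return cut[:space] if space > 0 else cut
```

Skipping keeps every score faithful to a whole comment but leaves gaps in the data; truncating scores every comment but can miss toxic content past the cut-off. Running the cleanup (links, extra whitespace) first means fewer comments hit the limit at all.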


