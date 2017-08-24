Took me too long to find the source research and datasets for Perspective but now hungrily reading. https://conversationai.github.io/
I suspect that with a better source corpus their process could be more useful.
Yeah. I don’t know from Crowdflower, but we already know how Wikipedia’s editors skew. A weird side effect of where you can get “free” data.
Yes! I use Wikipedia as an example of how CC-licensed works can be appealing as training data but can bias AI: https://papers.ssrn.com/abstract_id=3024938 ….
Of course! And I'd love to hear any thoughts or comments you have - I'm so glad other folks are thinking about this too.
At the Internet Archive, I was concerned how much only giving people access to public domain stuff also gives us 1920s mentality stuff
Meaning, of course, that PD representations of women and people of color and disabled people are often horrible & gay ppl non-existent.
Trying it with some visible/invisible disabilities. The man/woman division is concerning. https://www.perspectiveapi.com/ pic.twitter.com/6zVb8v8b4O
Thanks to @crispycrise for the nudge.
Please tell us more. What are your methods? What's the aim of the research?
Just a librarian nerding about with a free tool. Was poking around b/c of a Wired article & being defensive about VT http://digital.vpr.net/post/making-sense-data-news-are-vermont-commenters-actually-most-toxic …
The larger project I was curious about is a legit research thing and worth looking at. Main page here. https://conversationai.github.io/
Is this the tool you used for the test you posted earlier? Interesting to see how ratings were created.
Yes. It all grew out of a Wired article that "rated" the toxicity of comments by state using this very beta tool.
Don't get me wrong, I don't support it - but doesn't it reflect the current perceived average opinion? So it's not right, but it seems true
Totally, and that's what makes it so pernicious: it feeds into confirmation bias, so actual assumptions aren't as tested as they could be.
I have no idea about the inner workings, but couldn’t it also be that “gay” and “black” are more often used pejoratively than “man”?
I'm sure that's true, but if you know that, there should be a way to correct for it so people can talk about themselves w/o the toxic label
Absolutely agreed! It's a flaw in the software. But it's also very early days (so says the website) so the jury is out on what it can do.
Website w/ API on it is a lot more realistic about what it is (and is not) good for. Wired was the one who jumped the gun & drew conclusions
