I just deleted my re-tweet of some claims about GPT and reddit. @jackclarkSF just informed me that it was misleading.
It's a complex challenge: training systems to understand human language without contaminating them with some of the uglier parts of humanity.
-
So on reddit we did some pre-filtering to filter out a bunch of subreddits. Then we also did some analysis on the dataset - I think some of the ways reddit shows up are in loads of D&D domains getting ingested, etc. Top1000 domains in gpt2 here: https://github.com/openai/gpt-2/blob/master/domains.txt …
-
With @OpenAI's effectively bottomless pockets, it seems you could do better. Why didn't you?