I'm starting to think loss is harmful. Our loss has been flat at 2.2 for the past five days of training GPT-2 1.5B. Yet according to human testing, it's been getting noticeably better every day. It's now good enough to amuse /r/dota2: https://www.reddit.com/r/DotA2/comments/ei505v/ive_been_working_on_an_rdota2_simulator_its_like/ …
-
-
The kind of ad hoc analysis you'd do with DeOldify/GPT-2 is (a) not feasible for the general problem space and (b) impossible to scale without reference to _some_ kind of metric. Absolutely agree that we should be wary of our metrics, but they seem like the only game in town!
-
DeOldify and GPT-2 are two examples of the general problem space. They’re typical, not special. As for reference to some metric, I think people get confused because loss is how a model improves. Loss isn’t quality, though. That can’t be measured except with human tests.
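A toy sketch of why a single loss number underdetermines model behavior. The per-token probabilities below are hypothetical, not from the actual run: two models assign very different probabilities to the correct next token, yet their average cross-entropy (the reported "loss") is identical.

```python
import math

# Hypothetical probabilities each model assigns to the correct next token
# over a tiny four-token evaluation set.
probs_a = [0.5, 0.5, 0.5, 0.5]    # uniformly mediocre
probs_b = [0.25, 1.0, 0.25, 1.0]  # nails some tokens, flubs others

def avg_cross_entropy(probs):
    """Average of -log p over the evaluation tokens (the training loss)."""
    return sum(-math.log(p) for p in probs) / len(probs)

loss_a = avg_cross_entropy(probs_a)
loss_b = avg_cross_entropy(probs_b)
# Both come out to about 0.693, even though the models behave differently
# token by token -- the scalar loss hides the difference.
```

So two checkpoints can report the same loss while producing qualitatively different samples, which is consistent with a flat loss curve alongside improving human ratings.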