Probably one of the more important DL papers of the last 5 years: it shows that the DL community has been good at rushing to flag-plant by creating flashy new models, but terrible at evaluating them by training good baselines. Can you trust YOUR model's results in <insert task>? https://twitter.com/GaborMelis/status/887565474317381632
If this is truly the source of the problem, then we should expect to see a different style of research on GANs, where the baselines are more subjective.