Lots of disclaimers: this is just one metric, and it has plenty of shortcomings. We're very aware of that and discuss it in the essay: https://www.theatlantic.com/science/archive/2018/11/diminishing-returns-science/575665/
TD-Gammon (1992) played at expert level using unsupervised reinforcement learning, and the DeepMind team have of course acknowledged that forthrightly. DM do a nice thing, integrating TD-Gammon's approach with Monte-Carlo Tree Search (also an old idea, which DM acknowledge).
I'd also argue that this 2014 paper already had the essence and key ideas of AlphaZero: https://papers.nips.cc/paper/5495-learning-to-search-in-branch-and-bound-algorithms