Many people have tried to defend pure DRL with the Nature Machine Intelligence article, which actually describes a hybrid model. The YouTube pointer below IS to a pure convnet, but read the fine print: “network works decently well for any position less than 6 moves away from solved”
#symbolphobia https://twitter.com/AlexRoseGames/status/1186571935611850752
It's a way to try, but it's notable that those who have tried either fail or use MCTS, and sometimes A* as well, presumably because RL on its own fails.
-
-
Every RL algorithm must include an exploration algorithm. MCTS is such an exploration algorithm. A* is not, because it requires a heuristic function, which is not one of the givens for RL problems.
-
There is no such thing as "RL on its own" without also specifying an exploration method.
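To make that point concrete, here is a minimal sketch, assuming tabular Q-learning on a hypothetical 4-state chain where the only reward sits at the right end. The environment, the function names (`step`, `greedy`, `epsilon_greedy`, `q_learning`) and the hyperparameters are all illustrative assumptions, not anything from the papers discussed in this thread. The learning rule is the same in both runs; only the exploration method passed in changes.

```python
import random

# Hypothetical toy problem (an assumption for illustration only):
# states 0..3 in a row, start at 0, reward 1.0 on reaching state 3.
N_STATES = 4
ACTIONS = (0, 1)  # 0 = left, 1 = right


def step(state, action):
    """Deterministic chain dynamics; moving left from state 0 stays at 0."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def greedy(q, state, rng):
    """No exploration: pick the current best action (ties go to action 0)."""
    return max(ACTIONS, key=lambda a: q[state][a])


def epsilon_greedy(q, state, rng, eps=0.3):
    """With probability eps pick a uniformly random action, else act greedily."""
    if rng.random() < eps:
        return rng.choice(ACTIONS)
    return greedy(q, state, rng)


def q_learning(explore, episodes=500, max_steps=50, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning; `explore` is the pluggable exploration method."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            a = explore(q, s, rng)
            s2, r, done = step(s, a)
            # Standard one-step Q-learning target; no bootstrap at terminal.
            target = r + (0.0 if done else gamma * max(q[s2]))
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    return q


if __name__ == "__main__":
    print("greedy:        ", q_learning(greedy))
    print("epsilon-greedy:", q_learning(epsilon_greedy))
```

In this sketch the greedy run never leaves state 0, so it never sees a reward and its Q-table stays all zeros, while the epsilon-greedy run stumbles onto the reward and ends up preferring "right" in every non-terminal state. The update rule did nothing different; the exploration method did.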