The reason I ask this question is that the highest scores in HATETRIS were obtained by exploiting the fact that the HATETRIS AI is a fixed target and completely deterministic.
I suppose the highest possible score in chess is 1 for a win, so that changes things a bit.
AlphaZero hasn't lost once against Stockfish though.
Draw, then
The CNN deterministically produces values for each possible move; if your move selection is 'max', then it plays deterministically. The tree search used in playing (not MCTS) would similarly be deterministic. The training games use MCTS, so that doesn't transfer to regular play.
So if you played against the normal CNN+tree-search, you could replay a losing game easily. If you replayed an MCTS game, you'd need to fix the random seed, and in practice I imagine their parallelized implementation is not 100% deterministic anyway, so your replay would diverge quickly.
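A toy sketch of that distinction, not AlphaZero's actual code: `move_values` is a made-up stand-in for the network (any pure function of the position would do), and `play_greedy` / `play_sampled` are hypothetical names. Picking the max-valued move replays identically every time, while sampled play only replays identically if you fix the seed.

```python
import random

MOVES = (0, 1, 2, 3)  # hypothetical move set for the toy game


def move_values(position):
    """Stand-in for the network: a pure function of the position (assumption)."""
    total = sum(position)
    return {m: (31 * total + 17 * m + 7 * len(position)) % 11 + 1 for m in MOVES}


def play_greedy(n_moves):
    """Always take the max-valued move: no randomness, so every run is identical."""
    position = ()
    for _ in range(n_moves):
        values = move_values(position)
        best = max(values, key=values.get)
        position += (best,)
    return position


def play_sampled(n_moves, seed):
    """Sample moves in proportion to their values: repeatable only for a fixed seed."""
    rng = random.Random(seed)
    position = ()
    for _ in range(n_moves):
        values = move_values(position)
        move = rng.choices(MOVES, weights=[values[m] for m in MOVES])[0]
        position += (move,)
    return position
```

This is single-threaded, so a fixed seed is enough here; a parallelized search can still diverge from scheduling order alone, which is the caveat above.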
I’d guess it learns from every game it plays as well.