This is the main idea of the new AlphaGo paper:pic.twitter.com/a8QmDALslL
You can add location information to your Tweets, such as your city or precise location, from the web and via third-party applications. You always have the option to delete your Tweet location history. Learn more
I do remember wondering why the value & policy net system was required and assuming that it factored the problem in a helpful/necessary way.
But I thought the result of that 2-net factoring was morally equivalent to this, modulo some actual quantities replaced with predicted ones.
They used distinct networks in AlphaGo v1 so that the policy network could be initialized via supervised ...
learning on human expert moves. Policy gradient updates then used on this initialized network.
Then the policy and value networks were used to limit depth/breadth of MCTS.
In AlphaGo Zero policy network is not initialized with SL, so no need to seperate networks.
Twitter may be over capacity or experiencing a momentary hiccup. Try again or visit Twitter Status for more information.