"Deep RL doesn't work yet" has an excellent collection of reward hacking examples - see section "Reward function design is difficult".https://www.alexirpan.com/2018/02/14/rl-hard.html …
-
Here is the master list spreadsheet: https://docs.google.com/spreadsheets/d/e/2PACX-1vRPiprOaC3HsCf5Tuum8bRfzYUiKLRqJmbOoC-32JorNdfyTiRRsR7Ea5eWtvsWzuxo8bjOxCG84dAg/pubhtml, the Google form for submitting suggestions: https://docs.google.com/forms/d/e/1FAIpQLSeQEguZg4JfvpTywgZa3j-1J-4urrnjBVeoAO7JHIH53nrBTA/viewform, and a short accompanying blog post: https://vkrakovna.wordpress.com/2018/04/02/specification-gaming-examples-in-ai/