A new paper from MIRI's Vanessa Kosoy: “Delegative Reinforcement Learning: Learning to Avoid Traps with a Little Help.” https://intelligence.org/2019/04/24/delegative-reinforcement-learning/ …
From the abstract: "Most known regret bounds for reinforcement learning are either episodic or assume an environment without traps. We derive a regret bound without making either assumption, by allowing the algorithm to occasionally delegate an action to an external advisor."
5:43 PM - 24 Apr 2019