To the left, you see a trained agent playing a level of a game. To the right, you see the same playthrough from an agent-centric perspective: cropped, translated, and rotated with the agent in the center. Which perspective is the best input for the agent? https://arxiv.org/abs/2001.09908 pic.twitter.com/7bCtBp8xUG
-
In our new paper, "Rotation, Translation, and Cropping for Zero-Shot Generalization", @yooceii, @Amidos2006, @FilipoGiovanni, and I show that the agent-centric perspective is better in the sense that the agent learns policies that generalize better. https://arxiv.org/abs/2001.09908 pic.twitter.com/q7jLjBRjLL
-
We use a standard deep network architecture and reinforcement learning algorithm (A2C). When agents are trained using the top-down view, which is the game's "native" view (to the left in the initial gif), the trained networks play unseen levels very badly.
-
Simply cropping the level to see only what's around the agent, or rotating it so that the agent points up, or translating it so that the agent is always in the center of the image, has little effect on its own. But all three together drastically increase generalization!
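The three transformations above can be sketched as a single observation preprocessing step. This is a minimal illustration with numpy, not the paper's actual implementation; the function name, direction encoding, and crop radius are all assumptions for the sake of the example.

```python
import numpy as np

def agent_centric(grid, agent_pos, agent_dir, radius=5, pad_value=0):
    """Crop, translate, and rotate a top-down grid observation so the
    agent sits at the center of the view, facing up.

    grid:      2D array (H, W) of tile ids.
    agent_pos: (row, col) of the agent.
    agent_dir: facing direction, 0=up, 1=right, 2=down, 3=left.
    radius:    half-size of the square crop around the agent.
    """
    r, c = agent_pos
    # Pad so crops near the border stay in bounds. After padding by
    # `radius`, the agent sits at (r + radius, c + radius), so the
    # slice below both crops and translates the agent to the center.
    padded = np.pad(grid, radius, constant_values=pad_value)
    crop = padded[r:r + 2 * radius + 1, c:c + 2 * radius + 1]
    # Rotate counterclockwise so the agent's facing direction points up
    # (e.g. facing right, k=1, turns "right" in the image into "up").
    return np.rot90(crop, k=agent_dir)
```

Applying this wrapper to every frame gives the agent-centric view on the right of the gif; skipping the pad/slice or the `rot90` recovers the partial conditions compared in the paper.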
-
Note that in all six conditions, the algorithm is able to learn policies that work on the specific level(s) it is trained on. What really differs is the generalization capacity.
-
-
This paper builds on earlier work, where we showed that standard deep RL algorithms learn policies that generalize very badly: they are barely able to play any levels they were not trained on. https://twitter.com/togelius/status/1012726654261702658
Quoting Julian Togelius (@togelius): Deep reinforcement learning overfits. Often, a trained network can only play the particular level(s) you trained it on! In our new paper, we show how to train more general networks with procedural level generation, generating progressively harder levels. https://arxiv.org/abs/1806.10729 pic.twitter.com/JDNYPZuAHV
-
It is well known from "non-deep" agent learning research (say, evolutionary robotics work and game AI work from the early-to-mid 2000s) that the sensor representation is extremely important to the agent's ability to learn. http://julian.togelius.com/VanHoorn2009Hierarchical.pdf pic.twitter.com/Ugj9IDsdDR
-
The promise/premise of deep learning is that we don't have to worry about these representations, because the network will figure out the input representation itself. But, really, will it? Will a network of a few layers really learn to rotate and translate to focus on the agent?
-
Also, can it? It is hard to imagine that a neural network of just a few layers could actually implement the transformations necessary to even understand where things are relative to the agent, so that the policy can be location-independent.
-
It is possible that, within the standard paradigm of a neural network with a handful of layers learning to master, say, Atari games from a static third-person view, general play is actually impossible. That is, the network doesn't learn any general playing skills; it learns some kind of stimulus-response table.
-
In any case, even if this is possible in principle, it seems that the way we represent the input makes a lot of difference for the generality of skills that can be learned in practice.
-
Really appreciate the summary in tweets here about this effect! Quickly digestible and something I think I'll bring into my AI class as well. Nice work!