I've always felt that rewards/RL oversimplify behaviour. Maybe now there's a shift back to "Planning by Probabilistic Inference" (AISTATS, 2003): http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.13.9135&rep=rep1&type=pdf It presents the simple idea that we condition actions on goals, with a desired return as one possible goal.
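To make that idea concrete, here is a minimal sketch of a command-conditioned policy in the spirit of the paper and of upside-down RL. This is my illustration, not code from either source; every name, dimension, and command value below is a hypothetical assumption.

import torch
import torch.nn as nn

class GoalConditionedPolicy(nn.Module):
    """Maps (observation, command) -> action logits, where the command is a
    hypothetical 2-d goal: (desired return, remaining horizon)."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + 2, hidden),  # +2 inputs for the command
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs, desired_return, horizon):
        # The goal enters as an input to the policy,
        # not as a scalar to be maximised.
        command = torch.stack([desired_return, horizon], dim=-1)
        return self.net(torch.cat([obs, command], dim=-1))

# Acting: ask for a return of 200 over the next 150 steps (made-up numbers).
policy = GoalConditionedPolicy(obs_dim=4, n_actions=2)
obs = torch.randn(1, 4)
logits = policy(obs,
                desired_return=torch.tensor([200.0]),
                horizon=torch.tensor([150.0]))
action = torch.distributions.Categorical(logits=logits).sample()

# Training collapses to supervised learning: replay logged episodes and, at
# each step, predict the action actually taken, conditioned on the
# return-to-go and steps-to-go that were actually achieved from there.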
-
It's a beautiful framework for thinking about decision making in general, and you can still keep your POMDPs and rewards for structure. But if the community measures progress by how well agents game reward functions, how can research in this area gain acceptance?
-
We shouldn't be afraid to develop benchmarks based on goals. That still lets us explore, as in disentanglement research, the full spectrum from fully supervised to unsupervised discovery of goals. Particularly inspiring are NLP models conditioned to perform very different tasks.
-
As to why I feel the timing is ripe: it's been a common thread in work from @dileeplearning @vicariousai, it's part of the fresh "upside-down RL" from @rupspace @SchmidhuberAI @nnaisense, and it has very recently appeared in work from @svlevine.