Reinforcement learning is a paradigm that will eventually be superseded. We just haven't figured out what the new, more generally useful, paradigm is yet. When we do, there's going to be a revolution. It will be very interesting.
You must have a more general (!) notion of generalization in mind. AlphaGo certainly can generalize to Go positions that it has never seen during training. That is the form of generalization we have been trying to achieve for many years.
I don't think task-specific skills are useless. I would love to have an RL system that could make decisions for long-term wildfire management. It is easy to formulate as a (mostly stationary) MDP, but still not practical to solve because the simulator is too expensive.
I'm not anywhere near qualified to butt into this conversation, but wouldn't expecting RL to achieve the generalization that you're implying be analogous to expecting, say, a history professor to also be a physics expert?
I don’t think anything in the RL formulation forces you to have a static reward function or action space, and the definition of “agent” is up to you too; none of it needs to be static. I agree that the algorithms are inadequate, but I don’t see how the problem formulation is limiting.
For example, if you defined a meta-learning problem over a huge range of possible environments with fluid agent definitions, that would still be an RL problem. Perhaps not a very useful benchmark for current algorithms, but totally valid.
Novelty-seeking, explainable, model-based RL ~ science:
model ~ theory
novelty ~ experiment
Problems:
1. Find ways to perceive unknown reality, qualitatively: new dimensions
2. Be less conformist, challenge and simplify model: "God does not play dice"
https://twitter.com/SchmidhuberAI/status/1255755680855793665 …
Couldn't an ensemble of goals and rewards converge to a generalizable solution? As the number of goals grows so does the "awareness" of the environment.