Exploration in standard MDPs is well studied, but what about contextual MDPs (CMDPs) where the environment changes each episode? This general framework captures scenarios such as procgen video games or embodied AI tasks where the agent must generalize across physical spaces. [2/N]
While exploration in CMDPs has recently started receiving attention, we show that existing methods critically rely on an episodic count-based bonus, and fail if this bonus is removed. This also means they fail in complex envs where each state is seen at most once. [3/N]
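To make the failure mode concrete, here is a minimal sketch of an episodic count-based bonus. The thread does not give the exact form of the bonus, so the 1/sqrt(N) shape, the `EpisodicCountBonus` class, and the toy string states are all assumptions made for illustration.

```python
import math
from collections import Counter


class EpisodicCountBonus:
    """Hypothetical helper: bonus = 1 / sqrt(N_e(s)), where N_e(s) is the
    number of times state s has been seen in the current episode."""

    def __init__(self):
        self.counts = Counter()

    def reset(self):
        # Counts are episodic: wipe them at the start of every episode.
        self.counts.clear()

    def bonus(self, state):
        # Count the current visit, then reward rarely seen states more.
        self.counts[state] += 1
        return 1.0 / math.sqrt(self.counts[state])


tracker = EpisodicCountBonus()

# Episode 1: the same state is revisited, so the bonus decays.
revisit = [tracker.bonus(s) for s in ["a", "a", "a", "a"]]

# Episode 2: every state is new, so the bonus is the same constant each step.
tracker.reset()
unique = [tracker.bonus(s) for s in ["a", "b", "c", "d"]]
```

In the second episode every state is seen exactly once, so the bonus is identical at every step and gives no signal about which states are actually novel.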
An alternative idea could be to count handcrafted features extracted from states, but this relies heavily on prior knowledge. We show that while this can be effective in some cases, it is difficult to design a feature extractor which works well across many tasks. [4/N]
To address this limitation, we propose Exploration via Elliptical Episodic Bonuses (E3B). E3B uses an elliptical episodic bonus, which generalizes count-based episodic bonuses to continuous state spaces, paired with a feature extractor learned with an inverse dynamics model. [5/N]
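A rough sketch of how an elliptical bonus can generalize counts, under stated assumptions: the bonus is taken to be phi^T C^{-1} phi, where C is a ridge term times the identity plus the sum of outer products of the features seen so far in the episode. That formula, the `EllipticalEpisodicBonus` class, and the `ridge` parameter are assumptions, not details given in the thread. The learned feature extractor is replaced here by hand-supplied feature vectors.

```python
class EllipticalEpisodicBonus:
    """Hypothetical helper: bonus = phi^T C^{-1} phi, with
    C = ridge * I + sum of phi_i phi_i^T over the current episode."""

    def __init__(self, dim, ridge=1.0):
        self.dim = dim
        self.ridge = ridge
        self.reset()

    def reset(self):
        # At episode start C = ridge * I, so C^{-1} = I / ridge.
        self.c_inv = [
            [(1.0 / self.ridge if i == j else 0.0) for j in range(self.dim)]
            for i in range(self.dim)
        ]

    def bonus(self, phi):
        n = self.dim
        # u = C^{-1} phi
        u = [sum(self.c_inv[i][j] * phi[j] for j in range(n)) for i in range(n)]
        # b = phi^T C^{-1} phi, computed before phi is added to C.
        b = sum(phi[i] * u[i] for i in range(n))
        # Sherman-Morrison rank-one update (C^{-1} is symmetric):
        # (C + phi phi^T)^{-1} = C^{-1} - u u^T / (1 + b)
        scale = 1.0 / (1.0 + b)
        for i in range(n):
            for j in range(n):
                self.c_inv[i][j] -= scale * u[i] * u[j]
        return b


eb = EllipticalEpisodicBonus(dim=2, ridge=1.0)

# One-hot features recover a count-style bonus: 1 / (visits so far + ridge).
one_hot_bonuses = [eb.bonus([1.0, 0.0]) for _ in range(3)]

# A feature direction not yet seen this episode still gets the full bonus.
novel_bonus = eb.bonus([0.0, 1.0])
```

With one-hot features the bonus shrinks as 1, 1/2, 1/3 on repeated visits, which mirrors a count-based bonus. With continuous features, similar states share directions in feature space, so the bonus decays even when no state is repeated exactly.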
E3B sets a new SOTA on 16 challenging sparse-reward tasks from the MiniHack suite. In particular, it does so without requiring any feature engineering or task-specific prior knowledge. [6/N]
We also evaluate E3B for reward-free exploration on Habitat, which provides photorealistic simulations of real indoor environments. Here, E3B outperforms existing methods by a wide margin. [7/N]