Why doesn’t RL use more toy tasks to measure advances in specific aspects of a problem, like long-term planning, large action spaces, or imperfect information? Complex environments such as StarCraft are impressive but make it difficult to disentangle *why* an agent wins.
More recently there's https://arxiv.org/abs/1810.08272 by @DBahdanau and some MILA folks: a visually and conceptually simple language-conditional environment with difficult tasks.
It goes without saying there are probably many more papers tackling similar problems that are either consigned to obscurity or rejected by superficial reviewers because they don't "look" hard enough.
