Real humans adapt to the opaque protocols that self-play (SP) learns, and they play differently from the naive behavior-cloned model that our agent was trained against, so the improvement is smaller than in simulation. Nonetheless, the human-aware agent still does better, sometimes beating human performance! (4/4) pic.twitter.com/FmR9Mn2Xwx
We need an agent that has the right “expectation” about its partner. Obvious solution: train a human model with behavior cloning, and then train an agent to play well with that model. This does way better than SP in simulation (i.e. evaluated against a “test” human model). (3/4) pic.twitter.com/v1ykAkLpkE
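The two-step recipe in this tweet (fit a human model, then train against it) can be illustrated with a tiny tabular sketch. Everything here is an assumption made for illustration: the states, the action names, the demonstration data, and the `shared_reward` function are hypothetical, and the one-step best response stands in for the reinforcement-learning training a real agent would need. It is not the method or code from the linked paper.

```python
from collections import Counter, defaultdict

# Hypothetical logged human play: (state, action) pairs.
HUMAN_DEMOS = [
    ("onion_ready", "fetch_onion"),
    ("onion_ready", "fetch_onion"),
    ("onion_ready", "fetch_dish"),
    ("soup_ready", "fetch_dish"),
    ("soup_ready", "fetch_dish"),
]
ACTIONS = ["fetch_onion", "fetch_dish"]


def shared_reward(state, agent_action, human_action):
    """Hypothetical team reward: 1 when the players split the tasks, else 0."""
    return 1.0 if agent_action != human_action else 0.0


def fit_behavior_cloning(demos):
    """Tabular behavior cloning: estimate P(action | state) by empirical frequency."""
    counts = defaultdict(Counter)
    for state, action in demos:
        counts[state][action] += 1
    return {
        state: {a: n / sum(c.values()) for a, n in c.items()}
        for state, c in counts.items()
    }


def best_response(human_model, actions, reward):
    """For each state, pick the agent action with the highest expected shared reward
    when the partner acts according to the cloned human model."""
    policy = {}
    for state, action_probs in human_model.items():
        def expected(agent_action):
            return sum(p * reward(state, agent_action, h)
                       for h, p in action_probs.items())
        policy[state] = max(actions, key=expected)
    return policy


human_model = fit_behavior_cloning(HUMAN_DEMOS)
agent_policy = best_response(human_model, ACTIONS, shared_reward)
print(human_model)
print(agent_policy)
```

The agent's policy depends on what the human model predicts, which is the "expectation about its partner" the tweet describes. A self-play agent would instead assume its partner follows the agent's own policy.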
In competitive games, the minimax theorem allows self-play to be agnostic to its opponent: if they are suboptimal, SP will crush them even harder. That doesn’t work in collaborative games, where the partner’s suboptimal move and SP’s failure to anticipate it will hurt. (2/4) pic.twitter.com/6I6KwLOp0Z
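A toy pair of 2x2 matrix games may make the contrast concrete. The payoff numbers below are made up for illustration, and the zero-sum matrix was chosen to have a pure-strategy saddle point so that the maximin row achieves the game value without mixing. In the general case, the minimax theorem is a statement about mixed strategies.

```python
# Row player's payoff in a zero-sum game (the column player gets the negative).
ZERO_SUM = [[2, 3],
            [1, 4]]

# Shared payoff in a fully cooperative game: both players receive this value.
COOPERATIVE = [[3, 0],
               [0, 2]]


def maximin_row(payoff):
    """Return the row whose worst-case payoff is largest (pure-strategy maximin)."""
    return max(range(len(payoff)), key=lambda r: min(payoff[r]))


def self_play_convention(payoff):
    """Return the joint action (row, col) with the highest shared payoff,
    i.e. the convention two copies of the same agent could settle on."""
    cells = [(r, c) for r in range(len(payoff)) for c in range(len(payoff[0]))]
    return max(cells, key=lambda rc: payoff[rc[0]][rc[1]])


# Competitive case: the maximin row guarantees the game value against any column.
row = maximin_row(ZERO_SUM)
game_value = min(ZERO_SUM[row])
payoffs_vs_each_column = ZERO_SUM[row]

# Cooperative case: the self-play row is only good if the partner matches it.
sp_row, sp_col = self_play_convention(COOPERATIVE)
payoff_with_self = COOPERATIVE[sp_row][sp_col]
payoff_with_deviating_partner = COOPERATIVE[sp_row][1 - sp_col]

print("zero-sum:", game_value, payoffs_vs_each_column)
print("cooperative:", payoff_with_self, payoff_with_deviating_partner)
```

In the zero-sum game, an opponent who leaves the optimal column can only raise the row player's payoff. In the cooperative game, a partner who leaves the self-play convention drags the shared payoff down for both players.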
Excited to share our work: collaboration requires understanding! In Overcooked, self-play doesn't gel with humans: it expects them to play like itself. (1/4) Demo: https://humancompatibleai.github.io/overcooked-demo/ … Blog: https://bair.berkeley.edu/blog/2019/10/21/coordination/ … Paper: https://arxiv.org/abs/1910.05789 Code: https://github.com/HumanCompatibleAI/overcooked_ai … pic.twitter.com/lqbzeTwoqr
We developed Reward Learning by Simulating the Past (RLSP), and created a suite of gridworlds to showcase its properties. The top row shows what happens with a misspecified reward, while the bottom shows what happens when using RLSP to correct the reward. (4/4) pic.twitter.com/0OS1OPW1gu
We've released our work on inferring human preferences from the state of the world! By thinking about what "must have happened" in the past, we can infer what should or shouldn't be done. Blog post: https://bair.berkeley.edu/blog/2019/02/11/learning_preferences/ … Paper: https://openreview.net/forum?id=rkevMnRqYQ … (1/4) pic.twitter.com/MKlROUdQBS