Excited to share our work: collaboration requires understanding! In Overcooked, self-play doesn't gel with humans: it expects them to play like itself. (1/4)
Demo: https://humancompatibleai.github.io/overcooked-demo/
Blog: https://bair.berkeley.edu/blog/2019/10/21/coordination/
Paper: https://arxiv.org/abs/1910.05789
Code: https://github.com/HumanCompatibleAI/overcooked_ai
pic.twitter.com/lqbzeTwoqr
We need an agent that has the right “expectation” about its partner. Obvious solution: train a human model with behavior cloning, and then train an agent to play well with that model. This does way better than self-play (SP) in simulation (i.e. evaluated against a “test” human model). (3/4)
pic.twitter.com/v1ykAkLpkE
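As an illustration only (a toy coordination game with made-up names, not the paper's Overcooked code or its actual training loop), the two-step recipe in this tweet, behavior-clone a human model and then train a best response to it, can be sketched as:

```python
from collections import Counter

def behavior_clone(human_actions):
    """Simplest possible behavior-cloned model: the empirical action distribution."""
    counts = Counter(human_actions)
    total = len(human_actions)
    return {a: c / total for a, c in counts.items()}

def best_response(human_model, actions, reward):
    """Pick the action maximizing expected reward against the human model."""
    def expected(a):
        return sum(p * reward(a, h) for h, p in human_model.items())
    return max(actions, key=expected)

# Toy coordination reward: both players must pick the same side to score.
reward = lambda a, h: 1.0 if a == h else 0.0
logged = ["left", "left", "left", "right"]     # fake human demonstrations
model = behavior_clone(logged)                  # {'left': 0.75, 'right': 0.25}
partner = best_response(model, ["left", "right"], reward)  # 'left'
```

The point of the sketch is the structure, not the scale: the paper's version replaces the frequency count with a learned policy and the one-shot best response with RL training against the cloned model.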
Real humans adapt to the opaque protocols that SP learns, and play differently than the naive behavior-cloned model that our agent was trained against, so the effect is smaller. Nonetheless, the human-aware agent still does better, sometimes beating human performance! (4/4)
pic.twitter.com/FmR9Mn2Xwx
End of conversation

New conversation
Note that algorithms like max^n are more appropriate for n-player or non-zero-sum games. We looked at a similar opponent-modeling problem algorithmically about 10 years ago, designing better versions of max^n for these sorts of situations (e.g. https://www.cs.du.edu/~sturtevant/papers/softmaxn.pdf).
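For readers unfamiliar with max^n: it generalizes minimax to n players by giving each leaf an n-dimensional payoff vector, and at each node the player to move picks the child whose vector is best in that player's own component. A minimal sketch on a hypothetical two-level game tree (the tree and payoffs are made up for illustration):

```python
def maxn(state, player, n_players, children, payoffs):
    """Return the payoff vector reached under max^n play from `state`."""
    kids = children.get(state, [])
    if not kids:                      # leaf node: return its payoff vector
        return payoffs[state]
    nxt = (player + 1) % n_players
    return max(
        (maxn(k, nxt, n_players, children, payoffs) for k in kids),
        key=lambda vec: vec[player],  # each player maximizes its own component
    )

# Toy two-player tree: player 0 moves at the root, player 1 at a/b.
children = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
payoffs = {"a1": (3, 1), "a2": (1, 5), "b1": (2, 2), "b2": (0, 4)}
result = maxn("root", 0, 2, children, payoffs)  # (1, 5)
```

Player 1 steers "a" to (1, 5) and "b" to (0, 4); player 0 then prefers "a". The softmax^n variant linked above replaces the hard max with a softer opponent model.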