Almost ♾ unlabeled data is the “secret sauce” for today's ML, but how do we use uncurated datasets in robot learning?
Conditional Behavior Transformer (C-BeT) makes sense of "play"-style robot demos w/ no labels and no RL to extract conditional policies!
play-to-policy.github.io 🧵
C-BeT can take a continuous demo with multi-modal actions without special directions or labels, and learn a multi-modal 📷 or 🎞️-conditioned policy that solves long-horizon tasks using image obs!
E.g. all these tasks on our play kitchen are learned from one 4.5-hour-long demo.
Our method is similar to "prompting" practices for GPT — we train transformers that predict multi-modal distributions over actions given sequences of current+future frames.
During eval we condition on current + desired env frames, and it just works — 0 finetuning required 🔥
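A minimal sketch of what this kind of conditioning looks like (all names here are illustrative, not the released C-BeT code): the policy scores discrete action bins from the concatenated features of the current frame and a desired goal frame, then samples from the resulting categorical distribution, which is one simple way to represent multi-modal actions with a BeT-style head.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(frame):
    """Stand-in for a frozen visual encoder; returns a 32-d feature."""
    return frame.reshape(-1)[:32].astype(np.float64)

class GoalConditionedPolicy:
    """Toy goal-conditioned policy: a linear head over
    [current features, goal features] produces logits for K
    discrete action bins; sampling keeps multi-modality."""

    def __init__(self, feat_dim=32, n_bins=8):
        self.n_bins = n_bins
        self.W = rng.normal(scale=0.1, size=(2 * feat_dim, n_bins))

    def act(self, cur_frame, goal_frame):
        x = np.concatenate([encode(cur_frame), encode(goal_frame)])
        logits = x @ self.W
        probs = np.exp(logits - logits.max())  # stable softmax
        probs /= probs.sum()
        return int(rng.choice(self.n_bins, p=probs))

policy = GoalConditionedPolicy()
cur = rng.normal(size=(8, 8))   # fake "current" image
goal = rng.normal(size=(8, 8))  # fake "desired" image
action_bin = policy.act(cur, goal)
```

Swapping the goal frame changes the logits, so the same frozen policy can be "prompted" toward different behaviors with no finetuning.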
Pre-trained BYOL features for our visual observations mean the behavior model takes only 4 hours to train ⚡ And we don't have to worry much about random visual obstructions, as you can see here: 🪴🐵🧊
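The speedup comes from computing visual features once with a frozen encoder and training the behavior model on those small vectors instead of raw pixels. A hedged sketch of that caching step (the encoder below is a deterministic stand-in, not actual BYOL):

```python
import numpy as np

rng = np.random.default_rng(1)

def frozen_encoder(frame):
    """Stand-in for a pre-trained, frozen visual encoder (e.g. BYOL);
    here just a fixed linear projection to a 64-d feature."""
    proj = np.ones((frame.size, 64)) / frame.size
    return frame.reshape(-1) @ proj

# Pre-compute features for the whole demo stream up front, so the
# behavior model never touches pixels during training.
demo_frames = rng.normal(size=(100, 8, 8))  # hypothetical 100-frame demo
features = np.stack([frozen_encoder(f) for f in demo_frames])
```

Because the encoder is frozen, this pass runs once per dataset; only the (much smaller) behavior model is optimized afterwards.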
TL;DR: C-BeT trains fully offline, with NO reward functions or labels, and easily consumes suboptimal/mixed demos 🔥 It's like goal-conditioned BC, but for the real world.
More videos & code: play-to-policy.github.io
Paper: arxiv.org/abs/2210.10047
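One common way to turn unlabeled play into goal-conditioned BC data is hindsight relabeling: treat a frame from each observation's near future as its "goal", and supervise with the logged action. A sketch under that assumption (function and parameter names are mine, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(2)

def relabel_play(obs, actions, horizon=10, n_samples=1000):
    """Hindsight goal relabeling for play data: pair obs[t] with a
    frame up to `horizon` steps ahead as the goal, and use the logged
    action[t] as the target. No rewards or task labels needed."""
    T = len(actions)
    pairs = []
    for _ in range(n_samples):
        t = int(rng.integers(0, T))
        k = int(rng.integers(1, horizon + 1))
        g = min(t + k, T)  # goal index, clipped to trajectory end
        pairs.append((obs[t], obs[g], actions[t]))
    return pairs

obs = rng.normal(size=(201, 16))     # fake observation features
actions = rng.normal(size=(200, 4))  # fake logged actions
dataset = relabel_play(obs, actions)
```

Every transition in a long continuous demo yields many (state, goal, action) triples this way, which is why a single 4.5-hour stream can cover so many tasks.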
Very nice work! One challenge in learning from play data that we saw in LfP is the biased + highly multimodal action distributions. Great to see that C-BeT is robust enough to predict action distributions and use that understanding to generalize!
Surprising to see it learn so easily, should we expect GPT-3 like leaps in RL now?
Great work! Love seeing more people work on leveraging play data. We present our take on it in our upcoming CoRL paper: tacorl.cs.uni-freiburg.de