Devendra Chaplot
@dchaplot
Research Scientist at Facebook AI Research. I work on Embodied AI and computer vision. Past: Ph.D., Intern &, BTech CS.
Pittsburgh, PA · devendrachaplot.github.io · Joined April 2010

Devendra Chaplot’s Tweets

Analysis: Why does modular learning work even better in reality than in simulation? Because failure modes in sim and reality have almost no overlap! Takeaway: Navigation methods must be evaluated in the real world. (13/N)
[Image]
The SemExp policy of the modular learning approach takes a semantic map as input, while the end-to-end policy directly operates on the RGB-D frames. The semantic map space is invariant between sim and reality, while the image space exhibits a large domain gap. Example👇 (12/N)
[Video]
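To make the domain-invariance point concrete: below is a minimal sketch of how a top-down semantic map can be built from a single depth frame plus per-pixel semantic labels, assuming a pinhole camera with known horizontal FOV. All names, parameter values, and the grid layout are illustrative, not the actual SemExp code (which also handles agent pose, height filtering, and map aggregation over time).

```python
import numpy as np

def build_semantic_map(depth, semseg, hfov_deg=79.0, cell_size=0.05,
                       map_size=240, n_classes=16):
    """Project one depth frame + per-pixel class labels into a top-down
    semantic grid with one channel per object class. Illustrative sketch:
    ego-rotation, height filtering, and temporal fusion are omitted."""
    h, w = depth.shape
    f = (w / 2.0) / np.tan(np.deg2rad(hfov_deg) / 2.0)  # focal length (pixels)
    us = np.arange(w)[None, :]                 # pixel column indices
    z = depth                                  # forward distance (metres)
    x = (us - w / 2.0) * z / f                 # lateral offset (metres)
    # Bird's-eye cell indices, agent at the bottom-centre of the map.
    col = (x / cell_size + map_size / 2).astype(int)
    row = (z / cell_size).astype(int)
    valid = (z > 0) & (row >= 0) & (row < map_size) & (col >= 0) & (col < map_size)
    sem_map = np.zeros((n_classes, map_size, map_size), dtype=np.float32)
    sem_map[semseg[valid], row[valid], col[valid]] = 1.0
    return sem_map
```

Because the downstream policy only ever sees this grid, appearance differences between simulated and real images never reach it; only the segmentation model has to cope with the visual domain gap.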
Analysis: Why does Modular Learning transfer well while End-to-end does not? To answer this question, we reconstructed one real-world home in simulation and conducted experiments with identical episodes in sim and reality. (11/N)
[Video]
Key Result 3: End-to-end Learning Fails to Transfer. The end-to-end policy collides often, revisits the same places, and even fails to stop in front of goal objects when they are in sight. Examples👇 (10/N)
[Video]
Key Result 2: Modular Learning explores more efficiently than Classical methods. This is because Modular Learning is goal-oriented: it learns semantic exploration priors to find the goal object efficiently. Classical frontier exploration is goal-agnostic. Example👇 (9/N)
[Video]
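A minimal sketch of that difference, assuming a 2D grid map with free/unknown/obstacle cells: classical frontier exploration picks the nearest frontier regardless of the target object, while a goal-oriented policy ranks frontiers by a learned score (here a hypothetical `goal_scores` map, standing in for the semantic exploration prior).

```python
import numpy as np

FREE, UNKNOWN, OBSTACLE = 0, 1, 2  # assumed cell encoding

def frontier_cells(occ_map):
    """Frontiers: free cells with at least one unknown 4-neighbour."""
    free = occ_map == FREE
    unknown = occ_map == UNKNOWN
    near_unknown = np.zeros_like(unknown)
    near_unknown[1:, :] |= unknown[:-1, :]
    near_unknown[:-1, :] |= unknown[1:, :]
    near_unknown[:, 1:] |= unknown[:, :-1]
    near_unknown[:, :-1] |= unknown[:, 1:]
    return np.argwhere(free & near_unknown)   # (N, 2) array of (row, col)

def classical_goal(occ_map, agent_rc):
    """Goal-agnostic: always head to the nearest frontier."""
    frontiers = frontier_cells(occ_map)
    d = np.linalg.norm(frontiers - np.asarray(agent_rc), axis=1)
    return frontiers[np.argmin(d)]

def goal_oriented_goal(occ_map, goal_scores):
    """Goal-oriented: rank frontiers by a learned prior, e.g. how likely the
    target object is to be found beyond each frontier."""
    frontiers = frontier_cells(occ_map)
    s = goal_scores[frontiers[:, 0], frontiers[:, 1]]
    return frontiers[np.argmax(s)]
```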
Example👇: All policies start at the same location and need to find a couch. Left: Modular Learning first reaches the couch successfully. Center: End-to-end learning fails after colliding too many times. Right: Classical policy succeeds after a detour through the kitchen. (7/N)
[Video]
Results: In simulation, all approaches perform comparably, at around 80% success rate. But in the real world, End-to-end learning drops to 23% success rate while Modular learning achieves 90% success! (6/N)
[Image]
We conduct the first large-scale real-world empirical evaluation of semantic visual navigation:
- 3 methods
- 6 unseen homes
- 6 goal object categories
All methods are trained in simulation and directly transferred to the real world without any fine-tuning. (5/N)
[Video]
Humans can do this effortlessly: when looking for a glass of water at a friend's house we're visiting for the first time, we can easily find the kitchen without going to bedrooms or storage closets. But teaching this kind of common sense to robots is challenging. (3/N)
[Image]
We are announcing the Habitat-Matterport 3D Semantics Dataset! 216 real-world 3D spaces with dense semantic annotations, available to download now: aihabitat.org/datasets/hm3d-
Quote Tweet
(1/3) Today we’re releasing the Habitat-Matterport 3D Semantics dataset, the largest public dataset of real-world 3D spaces with dense semantic annotations. HM3D-Sem is free and available to use with FAIR's Habitat simulator: bit.ly/3EymX8x
[Video]
Our key insight is that “where to look?” can be treated purely as a perception problem and learned without interactions. Our model predicts two complementary potential functions conditioned on a semantic map and uses them to decide where to look for an unseen object. 2/4
[Image]
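A minimal sketch of that decision step, not the paper's implementation: `potential_net`, the two predicted maps, and the weighted-sum combination are placeholders standing in for the learned model the tweet describes.

```python
import numpy as np

def pick_long_term_goal(sem_map, potential_net, alpha=0.5):
    """Predict two potential maps from a top-down semantic map and pick the
    cell that maximises their combination as the place to look next.
    `potential_net` is a hypothetical callable returning two H x W arrays,
    e.g. an unexplored-area potential and an object-likelihood potential."""
    area_potential, object_potential = potential_net(sem_map)
    combined = alpha * area_potential + (1.0 - alpha) * object_potential
    return np.unravel_index(np.argmax(combined), combined.shape)  # (row, col)
```

Since the prediction is purely a perception problem, such a model can be trained in a supervised way from partially revealed semantic maps of annotated scenes, with no interaction data.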
The code for FILM, my ICLR 2022 paper with collaborators, is now publicly released as the starting code for the ALFRED challenge held at the Embodied AI (EAI) Workshop at #CVPR2022! Excited to see people push the SotA even further: github.com/soyeonm/FILM
Quote Tweet
Happy to announce that the 3rd Embodied AI Workshop is happening on June 19th at #CVPR2022. Lots of exciting speakers and 13 (!!!) challenges. Check out our website (embodied-ai.org), dive into a challenge, and save the date!
Our recent work LEO shows how to differentiate through an optimizer via sampling instead of unrolling. We apply this to learn observation models end-to-end, for instance, tactile sensor models in object state estimation during pushing.
Quote Tweet
An Energy-based Perspective on Learning Observation Models – Machine Learning Blog | ML@CMU | Carnegie Mellon University blog.ml.cmu.edu/2021/12/17/lea
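A rough sketch of the sampling idea, assuming a learned energy `energy_net(x, obs)` and an inner optimizer treated as a black box; the helper names and the contrastive-style outer loss are illustrative, not the paper's code.

```python
import torch

def sampling_based_loss(energy_net, obs, x_gt, inner_optimize,
                        n_samples=16, sigma=0.05):
    """Train an observation/energy model without unrolling the inner optimizer.
    energy_net(x, obs) -> per-example scalar energy (differentiable in theta)
    inner_optimize(energy_net, obs) -> x_hat, run outside the autograd graph"""
    with torch.no_grad():
        # Run the inner optimizer (e.g. gradient descent / Gauss-Newton on the
        # energy) as a black box; none of its iterations are unrolled.
        x_hat = inner_optimize(energy_net, obs)
        # Draw samples around the optimizer's solution.
        samples = x_hat.unsqueeze(0) + sigma * torch.randn(
            n_samples, *x_hat.shape, device=x_hat.device)
    # Contrastive-style surrogate: push the energy of the ground-truth state
    # below the energy of the sampled states. Gradients flow only through
    # the parameters of energy_net, never through the optimizer iterations.
    e_gt = energy_net(x_gt, obs)
    e_samples = torch.stack([energy_net(s, obs) for s in samples])
    return (e_gt - e_samples.mean(dim=0)).mean()
```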
If you are attending #NeurIPS2021, visit our poster on Wednesday, Dec 8 at 4.30pm PST! neurips.cc/virtual/2021/p
[Image]
Quote Tweet
[Video]
Presenting SEAL: Self-supervised Embodied Active Learning! #NeurIPS2021 SEAL is a self-supervised framework to close the action-perception loop. It improves perception & action models by just moving in the physical world w/o any human supervision. devendrachaplot.github.io/projects/seal 1/N
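A high-level sketch of the kind of action-perception loop described above; every argument and method call here is a hypothetical placeholder, not the released SEAL code.

```python
def self_supervised_loop(agent, perception_model, env,
                         fuse_into_3d_map, render_pseudo_labels, n_rounds=3):
    """Close the action-perception loop without human labels (SEAL-style sketch)."""
    for _ in range(n_rounds):
        # 1. Act: explore the environment with the current policy.
        frames, poses = agent.explore(env, perception_model)
        # 2. Aggregate: fuse per-frame predictions into a 3D semantic map,
        #    keeping only confident, multi-view-consistent labels.
        sem_map_3d = fuse_into_3d_map(frames, poses, perception_model)
        # 3. Self-supervise: project the fused labels back into the frames as
        #    pseudo-ground-truth and fine-tune the perception model on them.
        pseudo_labels = render_pseudo_labels(sem_map_3d, poses)
        perception_model.finetune(frames, pseudo_labels)
        # 4. The improved perception in turn improves the exploration policy.
        agent.update(perception_model)
    return perception_model
```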