Excited to share "Visual Language Maps"! VLMaps fuses visual-language model features into a dense 3D map of the environment, enabling robot navigation from natural language instructions
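A rough sketch of the fusion step, under assumptions not spelled out in the thread: per-pixel features from an LSeg-style model are back-projected with depth and camera pose, then accumulated into a feature grid (function and parameter names here are hypothetical, not the paper's API):

```python
import numpy as np

def fuse_features_into_map(feat_map, depth, intrinsics, pose,
                           grid, counts, cell_size=0.05,
                           origin=(0.0, 0.0, 0.0)):
    """Back-project per-pixel VLM features into a 3D feature grid.

    feat_map:   (H, W, D) per-pixel features (e.g. from LSeg)
    depth:      (H, W) depth image in meters
    intrinsics: (fx, fy, cx, cy) pinhole camera parameters
    pose:       (4, 4) camera-to-world transform
    grid:       (X, Y, Z, D) running feature sums per cell
    counts:     (X, Y, Z) number of observations per cell
    """
    H, W, D = feat_map.shape
    fx, fy, cx, cy = intrinsics
    us, vs = np.meshgrid(np.arange(W), np.arange(H))
    # Pinhole back-projection to camera frame
    z = depth
    x = (us - cx) * z / fx
    y = (vs - cy) * z / fy
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=-1).reshape(-1, 4)
    pts_world = (pose @ pts_cam.T).T[:, :3]
    # Discretize world points into grid cells
    idx = np.floor((pts_world - np.asarray(origin)) / cell_size).astype(int)
    feats = feat_map.reshape(-1, D)
    valid = np.all((idx >= 0) & (idx < np.asarray(grid.shape[:3])), axis=1)
    for (i, j, k), f in zip(idx[valid], feats[valid]):
        grid[i, j, k] += f      # running sum; divide by counts to average
        counts[i, j, k] += 1
    return grid, counts
```

At query time, averaged cell features can then be compared against CLIP-style text embeddings to index the map with open-vocabulary labels.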
Website: vlmaps.github.io
Led by an amazing team of collaborators.
VLMaps spatially grounds VLM features (e.g., from LSeg). Notably, when combined with code-writing LLMs, this enables navigating to spatial goals from natural language such as "go in between the sofa and TV" or "move 3 meters to the right of the chair"
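One way to picture the LLM-plus-map combination: the code-writing LLM emits calls to spatial primitives, which resolve object names via the map and compute goal coordinates. The primitives and the mocked positions below are illustrative assumptions, not the paper's actual interface:

```python
import numpy as np

# Object positions would come from indexing the VLMap with
# open-vocabulary labels; here they are mocked 2D map coordinates.
object_positions = {"sofa": np.array([2.0, 4.0]),
                    "tv": np.array([6.0, 4.0]),
                    "chair": np.array([1.0, 1.0])}

def go_between(a, b):
    """Goal at the midpoint between two named objects."""
    return (object_positions[a] + object_positions[b]) / 2.0

def move_right_of(obj, meters, right=np.array([1.0, 0.0])):
    """Goal offset from an object along the map's +x ('right') axis."""
    return object_positions[obj] + meters * right

# "go in between the sofa and TV"
print(go_between("sofa", "tv"))      # [4. 4.]
# "move 3 meters to the right of the chair"
print(move_right_of("chair", 3.0))   # [4. 1.]
```

The LLM only has to translate the instruction into calls like these; the map handles the grounding, and a path planner handles getting there.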
VLMaps enables "open vocabulary obstacle maps" for path planning with different robots! E.g. a drone can fly over tables, but a mobile ground robot cannot. Both can share a VLMap of the same environment, each indexing different object categories as obstacles.
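A minimal sketch of that idea, assuming the map can score each cell against arbitrary text labels (the scores here are mocked): each robot thresholds only the categories that block it.

```python
import numpy as np

def obstacle_map(category_scores, obstacle_categories, threshold=0.5):
    """Build a robot-specific binary obstacle map by OR-ing thresholded
    per-category similarity scores from an open-vocabulary map.

    category_scores: dict mapping label -> (H, W) score array
    """
    H, W = next(iter(category_scores.values())).shape
    occ = np.zeros((H, W), dtype=bool)
    for label in obstacle_categories:
        occ |= category_scores[label] > threshold
    return occ

# Mocked per-cell similarity scores for a tiny 2x2 map
scores = {"table": np.array([[0.9, 0.1], [0.1, 0.1]]),
          "wall":  np.array([[0.1, 0.9], [0.1, 0.1]])}

ground_robot = obstacle_map(scores, ["table", "wall"])  # tables block it
drone = obstacle_map(scores, ["wall"])                  # flies over tables
```

Same map, two different obstacle layers, so each robot plans over the cells that actually block it.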
We compared against CoW arxiv.org/abs/2203.10421 and LM-Nav sites.google.com/corp/view/lmnav, and were excited to see VLMaps improve at (i) navigating to spatial goals, and (ii) handling long-horizon tasks with multiple (possibly ambiguous) subgoals
Lots of recent work in this area, in just the last month: NLMap nlmap-saycan.github.io & CLIP-Fields mahis.life/clip-fields. VLMaps is only our take on the problem, but I love that we get to explore spatial goals and open-vocab obstacle maps as central parts of it


