Check out our new work on BEV perception for driving.
We simplify existing methods and try to show "what really matters" for top performance.
TL;DR: maximize batch size, use high-res images, and integrate radar; with those, you can beat the SOTA by 10 points.
simple-bev.github.io
There has been a lot of innovation in 2D-to-BEV "lifting" methods (MLPs, depth-based splatting, transformers, etc.), but we find that a parameter-free method is still very competitive: for each 3D coordinate, we simply bilinearly sample the image features at the 2D location it projects to.
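The parameter-free lift described above can be sketched in a few lines of numpy. This is an illustrative sketch, not the paper's code: the function name, argument layout, and the simple zero-fill for invalid points are my assumptions; the core idea (project each 3D coordinate into the image and bilinearly sample the features there) follows the thread.

```python
import numpy as np

def lift_to_bev(feat, K, cam_T_world, xyz_world):
    """Parameter-free 2D-to-BEV lifting via bilinear sampling (sketch).

    feat:        (C, H, W) image feature map
    K:           (3, 3) camera intrinsics, in feature-map pixel units
    cam_T_world: (4, 4) world-to-camera extrinsic
    xyz_world:   (N, 3) 3D coordinates of the BEV volume
    returns:     (N, C) sampled features, and an (N,) validity mask
    """
    C, H, W = feat.shape
    N = xyz_world.shape[0]

    # Project the 3D points into the camera frame, then onto the image plane.
    xyz_h = np.concatenate([xyz_world, np.ones((N, 1))], axis=1)  # (N, 4)
    xyz_cam = (cam_T_world @ xyz_h.T).T[:, :3]                    # (N, 3)
    z = xyz_cam[:, 2]
    uvw = (K @ xyz_cam.T).T
    u = uvw[:, 0] / np.clip(z, 1e-6, None)
    v = uvw[:, 1] / np.clip(z, 1e-6, None)

    # A point is valid if it is in front of the camera and lands on the map.
    valid = (z > 0) & (u >= 0) & (u <= W - 1) & (v >= 0) & (v <= H - 1)

    # Bilinear sample: blend the four neighboring feature pixels.
    u = np.clip(u, 0, W - 1)
    v = np.clip(v, 0, H - 1)
    u0 = np.floor(u).astype(int); u1 = np.minimum(u0 + 1, W - 1)
    v0 = np.floor(v).astype(int); v1 = np.minimum(v0 + 1, H - 1)
    wu, wv = u - u0, v - v0
    out = (feat[:, v0, u0] * (1 - wu) * (1 - wv)
         + feat[:, v0, u1] * wu * (1 - wv)
         + feat[:, v1, u0] * (1 - wu) * wv
         + feat[:, v1, u1] * wu * wv).T                           # (N, C)
    out[~valid] = 0.0  # assumption: zero-fill points no camera sees
    return out, valid
```

With multiple cameras, one would run this per camera and average the valid samples per voxel; in a real pipeline the sampling would be done with something like `torch.nn.functional.grid_sample` for GPU efficiency.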
Even with cameras alone, our method outperforms all existing work. Not because we invented anything new, but because we paid attention to details: batch size, resolution, backbone, and augmentations.
We found prior work reporting that radar (esp. in nuScenes) is too sparse to be useful. We were skeptical: radar gives a metric map of the scene, which is very difficult to recover from non-overlapping cameras. We show that camera+radar fusion is indeed better than cameras alone.