The precise ground-truth scenes and bodies enable us to locate contact regions on the body in detail. Top: SMPL-X projected into the videos. Middle: SMPL-X in the 3D scene. Bottom: Green vertices on the mesh show contact. (4/8)
In total there are 577K images at 4K resolution. In addition to the fixed cameras, RICH contains moving cameras. The quality and complexity of RICH go beyond existing datasets like PROX and BEHAVE, enabling research on human-scene contact. (5/8)
Using RICH, we train BSTRO to infer contact on the 3D body from a single image. The key observation is that contact surfaces are always occluded. To reason about them, we leverage a transformer that can attend to relevant parts of the image. (6/8)