MessyKitchens: Contact-rich Object-level 3D Scene Reconstructions

Junaid Ahmed Ansari1*, Ran Ding1*, Fabio Pizzati1, Ivan Laptev1

1 Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, UAE
* Equal contribution
Paper · Code (coming soon)

Abstract


Monocular 3D scene reconstruction has recently seen significant progress. Powered by modern neural architectures and large-scale data, recent methods achieve high performance in depth estimation from a single image. Reconstructing and decomposing common scenes into individual 3D objects, however, remains a hard challenge due to the large variety of objects, frequent occlusions, and complex object relations. Notably, beyond shape and pose estimation of individual objects, applications in robotics and animation require physically plausible scene reconstruction in which objects obey the physical principles of non-penetration and realistic contact. In this work, we advance object-level scene reconstruction along two directions. First, we introduce MessyKitchens, a new dataset of real-world cluttered scenes that provides high-fidelity object-level ground truth: 3D object shapes, poses, and accurate object contacts. Second, we build on the recent SAM 3D approach for single-object reconstruction and extend it with a Multi-Object Decoder (MOD) for joint object-level scene reconstruction. To validate our contributions, we show that MessyKitchens significantly improves over previous datasets in registration accuracy and inter-object penetration. We also evaluate our multi-object reconstruction approach on three datasets and demonstrate consistent and significant improvements of MOD over the state of the art.

Dataset


MessyKitchens contains cluttered kitchen scenes with rich contacts, stacking, and nesting patterns.

Figure: MessyKitchens real-world annotation example (image with paired video).

MOD vs SAM 3D


Figure: Input image for MOD vs SAM 3D demo 1, paired with the MOD refinement video.
