We address the challenge of recovering the underlying scene geometry and colors from a sparse set of RGBD view observations. In this work, we present a new solution termed RGBD$^2$ that sequentially generates novel RGBD views along a camera trajectory, with the scene geometry simply being the fusion result of these views. More specifically, we maintain an intermediate surface mesh used for rendering new RGBD views, which are subsequently completed by an inpainting network; each rendered RGBD view is then back-projected as a partial surface and supplemented into the intermediate mesh. The use of the intermediate mesh and camera projection helps resolve the difficult problem of multi-view inconsistency. We implement the RGBD inpainting network as a versatile RGBD diffusion model, originally designed for 2D generative modeling, and modify its reverse diffusion process to enable our use case. We evaluate our approach on the task of 3D scene synthesis from sparse RGBD inputs; extensive experiments on the ScanNet dataset demonstrate the superiority of our approach over existing ones. Project page: https://jblei.site/proj/rgbd-diffusion.
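To make the incremental view-inpainting loop concrete, the following is a minimal Python sketch of the pipeline described above. All helper functions (init_mesh_from_views, render_rgbd, inpaint_rgbd_diffusion, back_project, fuse_into_mesh) are hypothetical placeholders introduced for illustration, not the released implementation.

```python
# A minimal sketch of the incremental view-inpainting loop, under the assumption
# that the hypothetical helpers below exist; they are NOT part of the released code.

def synthesize_scene(sparse_rgbd_views, camera_trajectory):
    # Initialize the intermediate surface mesh by back-projecting the sparse inputs.
    mesh = init_mesh_from_views(sparse_rgbd_views)

    for camera_pose in camera_trajectory:
        # Render the current (incomplete) mesh into the new viewpoint;
        # unobserved regions appear as holes in both color and depth.
        partial_rgbd, hole_mask = render_rgbd(mesh, camera_pose)

        # Complete the missing regions with the RGBD diffusion model, whose
        # reverse diffusion process is conditioned on the known pixels.
        full_rgbd = inpaint_rgbd_diffusion(partial_rgbd, hole_mask)

        # Back-project the completed view to a partial surface and fuse it into
        # the intermediate mesh, keeping subsequent views multi-view consistent.
        partial_surface = back_project(full_rgbd, camera_pose)
        mesh = fuse_into_mesh(mesh, partial_surface)

    # The final scene geometry and colors are simply the fused mesh.
    return mesh
```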