We, as human beings, can understand and picture a familiar scene from arbitrary viewpoints given a single image, whereas this remains a grand challenge for computers. We present a novel solution to mimic this human perception capability, based on a new paradigm of amodal 3D scene understanding with neural rendering for a closed scene. Specifically, we first learn prior knowledge of the objects in a closed scene via an offline stage, which facilitates an online stage for understanding the room with unseen furniture arrangements. During the online stage, given a panoramic image of the scene in a different layout, we utilize a holistic neural-rendering-based optimization framework to efficiently estimate the correct 3D scene layout and deliver realistic free-viewpoint rendering. To handle the domain gap between the offline and online stages, our method exploits compositional neural rendering techniques for data augmentation during offline training. Experiments on both synthetic and real datasets demonstrate that our two-stage design achieves robust 3D scene understanding and outperforms competing methods by a large margin. We also show that our realistic free-viewpoint rendering enables various applications, including scene touring and editing. Code and data are available on the project webpage: https://zju3dv.github.io/nr_in_a_room/.