In this paper, we study the problem of decomposing and manipulating 3D scene geometry from 2D views. Building on recent implicit neural representations, in particular neural radiance fields (NeRF), we introduce an object field component that learns a unique code for every individual object in 3D space from 2D supervision alone. The key to this component is a series of carefully designed loss functions that allow every 3D point, especially points in unoccupied space, to be optimized effectively even without 3D labels. In addition, we introduce an inverse query algorithm to freely manipulate the shape of any specified 3D object in the learned scene representation. Notably, our manipulation algorithm explicitly handles key issues such as object collisions and visual occlusions. Our method, called DM-NeRF, is among the first to simultaneously reconstruct, decompose, manipulate, and render complex 3D scenes in a single pipeline. Extensive experiments on three datasets show that our method accurately decomposes all 3D objects from 2D views and allows any object of interest to be freely manipulated in 3D space through operations such as translation, rotation, size adjustment, and deformation.
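The idea behind querying a manipulated scene can be illustrated with a minimal sketch. It assumes a rigid edit (rotation plus translation) and replaces the learned object field with a hypothetical toy function; the names `inverse_rigid_transform`, `toy_object_field`, and `query_edited_scene` are illustrative and not taken from DM-NeRF. The sketch shows only the general principle of mapping a query point in the edited scene back into the original scene before evaluating the field. It does not reproduce the paper's handling of collisions or occlusions.

```python
import numpy as np


def inverse_rigid_transform(points, rotation, translation):
    """Map points from the edited scene back to the original scene.

    Forward edit: p_new = rotation @ p_old + translation
    Inverse:      p_old = rotation.T @ (p_new - translation)
    """
    # With points stored as rows, (p - t) @ R equals R.T @ (p - t) per row.
    return (points - translation) @ rotation


def toy_object_field(points):
    """Hypothetical stand-in for a learned object field (an assumption).

    Returns object code 1 inside a unit sphere at the origin, else 0.
    """
    return (np.linalg.norm(points, axis=-1) < 1.0).astype(int)


def query_edited_scene(points, rotation, translation):
    """Object codes at `points` after the object has been rigidly moved."""
    original_points = inverse_rigid_transform(points, rotation, translation)
    return toy_object_field(original_points)


if __name__ == "__main__":
    # Move the toy sphere 3 units along x, then query two locations.
    shift = np.array([3.0, 0.0, 0.0])
    queries = np.array([[3.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
    print(query_edited_scene(queries, np.eye(3), shift))
```

In this toy setting the field itself is never modified: the edit is expressed entirely through the transform applied to the query points, so the same stored representation can serve many different manipulations.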