We present a simple yet powerful implicit neural function that can represent and render arbitrarily complex 3D scenes in a single network, learned only from 2D observations. The function models 3D scenes as a general radiance field: it takes a set of 2D images with their camera poses and intrinsics as input, constructs an internal representation for every 3D point of the scene, and renders the corresponding appearance and geometry of any 3D point viewed from an arbitrary angle. The key to our approach is to explicitly integrate the principle of multi-view geometry when building the internal representations from observed 2D views, so that the learned implicit representations empirically remain multi-view consistent. In addition, we introduce an effective neural module that learns general features for each pixel of the 2D images, allowing the constructed internal 3D representations to be general as well. Extensive experiments demonstrate the superiority of our approach.
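The pipeline the abstract describes, projecting a query 3D point into each posed input view, gathering the per-pixel features that observe it, and pooling them into a per-point representation, can be sketched as follows. This is a minimal illustration only, assuming pinhole intrinsics K and world-to-camera extrinsics (R, t); the helper names (`project_point`, `sample_feature`, `aggregate_features`) and the mean pooling are hypothetical stand-ins rather than the paper's actual architecture, and a learned MLP would subsequently decode the pooled feature, together with a viewing direction, into appearance (RGB) and geometry (density).

```python
import numpy as np

def project_point(x_world, K, R, t):
    """Project a 3D world point into one view's pixel coordinates.

    Assumes a pinhole camera: K is 3x3 intrinsics, (R, t) are
    world-to-camera extrinsics. Returns (u, v) pixel coordinates
    and camera-space depth.
    """
    x_cam = R @ x_world + t                 # world frame -> camera frame
    u, v = (K @ x_cam)[:2] / x_cam[2]       # perspective division
    return u, v, x_cam[2]

def sample_feature(feat_map, u, v):
    """Nearest-neighbour lookup in an HxWxC per-pixel feature map
    (a real implementation would use bilinear sampling)."""
    h, w, _ = feat_map.shape
    i = int(np.clip(round(v), 0, h - 1))
    j = int(np.clip(round(u), 0, w - 1))
    return feat_map[i, j]

def aggregate_features(x_world, views):
    """Gather the per-pixel features observing a 3D point across all
    posed views and pool them into one internal representation.
    Mean pooling here is an illustrative placeholder for whatever
    learned aggregation the method actually uses."""
    gathered = []
    for feat_map, K, R, t in views:         # views: (features, K, R, t)
        u, v, depth = project_point(x_world, K, R, t)
        if depth > 0:                       # keep views the point is in front of
            gathered.append(sample_feature(feat_map, u, v))
    return np.mean(gathered, axis=0)
```

Because the same 3D point is tied to its projections in every observed view, representations built this way are constrained by multi-view geometry by construction, which is what the abstract credits for the empirical multi-view consistency of the learned field.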