We present a simple yet powerful neural network that implicitly represents and renders 3D objects and scenes purely from 2D observations. The network models 3D geometry as a general radiance field: it takes a set of 2D images with camera poses and intrinsics as input, constructs an internal representation for each point of 3D space, and renders the corresponding appearance and geometry of that point viewed from an arbitrary position. The key to our approach is to learn local features for each pixel in the 2D images and to project these features to 3D points, yielding general and rich point representations. We additionally integrate an attention mechanism to aggregate pixel features from multiple 2D views, such that visual occlusions are implicitly taken into account. Extensive experiments demonstrate that our method can generate high-quality and realistic novel views for novel objects, unseen categories, and challenging real-world scenes.
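To make the pipeline described above concrete, the following is a minimal PyTorch sketch of the per-point feature path: project a query 3D point into every input view, sample local CNN features at the projected pixels, fuse the per-view features with attention, and decode colour and density. All names (`MultiViewPointEncoder`, `project_points`), dimensions, and architectural details are illustrative assumptions for exposition, not the paper's implementation.

```python
# Illustrative sketch only: names, feature sizes, and the attention/MLP
# design are assumptions, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


def project_points(points, pose_w2c, intrinsics):
    """Project world-space 3D points into a camera's pixel coordinates.

    points:     (P, 3) world coordinates
    pose_w2c:   (4, 4) world-to-camera extrinsics
    intrinsics: (3, 3) camera intrinsics
    Returns (P, 2) pixel coordinates and (P,) depths.
    """
    ones = torch.ones(points.shape[0], 1, device=points.device)
    cam = (pose_w2c @ torch.cat([points, ones], dim=-1).T).T[:, :3]  # (P, 3)
    uvw = (intrinsics @ cam.T).T                                     # (P, 3)
    depth = uvw[:, 2].clamp(min=1e-6)
    return uvw[:, :2] / depth.unsqueeze(-1), depth


class MultiViewPointEncoder(nn.Module):
    """Samples per-pixel CNN features at a 3D point's projection in each
    input view and fuses them with attention across views."""

    def __init__(self, feat_dim=64):
        super().__init__()
        # Shallow CNN standing in for the 2D feature extractor.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1),
        )
        self.attn = nn.MultiheadAttention(feat_dim, num_heads=4,
                                          batch_first=True)
        # Head mapping the fused feature (+ point coords) to RGB and density.
        self.head = nn.Sequential(
            nn.Linear(feat_dim + 3, 128), nn.ReLU(), nn.Linear(128, 4)
        )

    def forward(self, images, poses_w2c, intrinsics, points):
        # images: (V, 3, H, W); poses_w2c: (V, 4, 4); intrinsics: (V, 3, 3)
        # points: (P, 3) query positions in world space.
        V, _, H, W = images.shape
        feats = self.cnn(images)                       # (V, C, H, W)
        per_view = []
        for v in range(V):
            uv, _ = project_points(points, poses_w2c[v], intrinsics[v])
            # Normalise pixel coordinates to [-1, 1] for grid_sample.
            grid = torch.stack(
                [uv[:, 0] / (W - 1) * 2 - 1,
                 uv[:, 1] / (H - 1) * 2 - 1], dim=-1
            ).view(1, -1, 1, 2)
            sampled = F.grid_sample(feats[v:v + 1], grid,
                                    align_corners=True)  # (1, C, P, 1)
            per_view.append(sampled[0, :, :, 0].T)       # (P, C)
        tokens = torch.stack(per_view, dim=1)            # (P, V, C)
        # Attention over views lets the model down-weight occluded views.
        fused, _ = self.attn(tokens, tokens, tokens)
        fused = fused.mean(dim=1)                        # (P, C)
        rgb_sigma = self.head(torch.cat([fused, points], dim=-1))
        rgb = torch.sigmoid(rgb_sigma[:, :3])            # per-point colour
        sigma = F.relu(rgb_sigma[:, 3])                  # per-point density
        return rgb, sigma
```

The attention step is what allows occlusion to be handled implicitly: when a sampled pixel feature is inconsistent with the other views (e.g. the point is hidden in that view), the learned attention weights can suppress its contribution to the fused point representation. The predicted per-point colours and densities would then be composited along camera rays by standard volume rendering, which is omitted here for brevity.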