Neural rendering has received tremendous attention since the advent of Neural Radiance Fields (NeRF), and has pushed the state-of-the-art on novel-view synthesis considerably. The recent focus has been on models that overfit to a single scene, and the few attempts to learn models that can synthesize novel views of unseen scenes mostly consist of combining deep convolutional features with a NeRF-like model. We propose a different paradigm, where no deep features and no NeRF-like volume rendering are needed. Our method is capable of predicting the color of a target ray in a novel scene directly, just from a collection of patches sampled from the scene. We first leverage epipolar geometry to extract patches along the epipolar lines of each reference view. Each patch is linearly projected into a 1D feature vector, and a sequence of transformers processes the collection. For positional encoding, we parameterize rays as in a light field representation, with the crucial difference that the coordinates are canonicalized with respect to the target ray, which makes our method independent of the reference frame and improves generalization. We show that our approach outperforms the state-of-the-art on novel view synthesis of unseen scenes even when trained with considerably less data than prior work.
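The pipeline described above (epipolar patch extraction, linear projection to tokens, a transformer over the token collection, and a canonicalized light-field ray encoding) can be sketched compactly. The snippet below is a minimal illustration written in PyTorch; the framework choice, layer sizes, pooling strategy, and all names (e.g. PatchRayRenderer, patch_proj, ray_proj) are assumptions for clarity, not the authors' implementation.

```python
# Minimal sketch, assuming: RGB patches already sampled along the epipolar lines of the
# reference views, and a light-field parameterization of each patch's ray that has been
# canonicalized with respect to the target ray. Layer sizes and names are illustrative.

import torch
import torch.nn as nn


class PatchRayRenderer(nn.Module):
    def __init__(self, patch_size=8, ray_dim=6, d_model=256, n_layers=4, n_heads=8):
        super().__init__()
        patch_dim = 3 * patch_size * patch_size           # RGB patch flattened to 1D
        self.patch_proj = nn.Linear(patch_dim, d_model)    # linear projection to a token
        self.ray_proj = nn.Linear(ray_dim, d_model)        # light-field ray "positional" encoding
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.transformer = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.to_rgb = nn.Sequential(nn.LayerNorm(d_model), nn.Linear(d_model, 3), nn.Sigmoid())

    def forward(self, patches, rays):
        # patches: (B, N, 3, p, p) -- N patches sampled along epipolar lines of the reference views
        # rays:    (B, N, ray_dim) -- per-patch ray coordinates, canonicalized w.r.t. the target ray
        tokens = self.patch_proj(patches.flatten(2)) + self.ray_proj(rays)
        tokens = self.transformer(tokens)
        # Pool over the patch collection and predict the target ray's color directly,
        # with no NeRF-like volume rendering step.
        return self.to_rgb(tokens.mean(dim=1))


if __name__ == "__main__":
    model = PatchRayRenderer()
    patches = torch.rand(2, 64, 3, 8, 8)   # 64 epipolar patches per target ray
    rays = torch.rand(2, 64, 6)            # e.g. two-plane light-field coordinates
    print(model(patches, rays).shape)      # torch.Size([2, 3])
```

Because the ray coordinates are expressed relative to the target ray rather than a fixed world frame, the same weights can in principle be applied to an unseen scene without any scene-specific optimization.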