We address the problem of novel view synthesis (NVS) from a few sparse source-view images. Conventional image-based rendering methods estimate scene geometry and synthesize novel views in two separate steps. Because view synthesis depends heavily on the quality of the estimated geometry, errors in geometry estimation propagate to the synthesized views and degrade NVS performance. In this paper, we propose an end-to-end NVS framework that eliminates this error propagation. Specifically, we construct a volume under the target view and design a source-view visibility estimation (SVE) module to determine the visibility of each target-view voxel in each source view. We then aggregate the visibility from all source views into a consensus volume, in which each voxel indicates a surface existence probability. Next, we present a soft ray-casting (SRC) mechanism to find the front-most surface in the target view (i.e., its depth): SRC traverses the consensus volume along viewing rays and estimates a depth probability distribution. Based on the estimated source-view visibility and target-view depth, we warp and aggregate source-view pixels to synthesize the novel view. Finally, our network is trained in an end-to-end, self-supervised fashion, which significantly alleviates error accumulation in view synthesis. Experimental results demonstrate that our method generates novel views of higher quality than state-of-the-art methods.
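To make the soft ray-casting step concrete, here is a minimal NumPy sketch of how a consensus volume could be turned into a per-ray depth probability distribution. The abstract does not give the exact formulation, so this assumes a front-to-back accumulation of the kind used in volume rendering: the probability that the first surface lies at a depth plane is that plane's surface probability times the probability that all nearer planes are empty. The function names `soft_ray_cast` and `expected_depth`, the near-to-far ordering of the last axis, and the final normalisation are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def soft_ray_cast(consensus):
    """Convert surface-existence probabilities into a depth distribution.

    consensus: array of shape (..., D) with values in [0, 1]; the last axis
    indexes depth planes ordered from near to far along each viewing ray
    (an assumed layout, not taken from the paper).
    Returns an array of the same shape holding, for each ray, the probability
    that the front-most surface lies at each depth plane.
    """
    consensus = np.clip(np.asarray(consensus, dtype=np.float64), 0.0, 1.0)
    # Probability that the ray is still unoccluded after passing plane i.
    free = np.cumprod(1.0 - consensus, axis=-1)
    # Shift by one so plane i only sees the planes strictly in front of it.
    ones = np.ones_like(free[..., :1])
    transmittance = np.concatenate([ones, free[..., :-1]], axis=-1)
    weights = transmittance * consensus
    # Normalise so each ray's distribution sums to 1 (assumption); rays that
    # hit no surface at all keep an all-zero distribution.
    total = weights.sum(axis=-1, keepdims=True)
    return weights / np.maximum(total, 1e-8)


def expected_depth(depth_prob, depth_values):
    """Soft depth estimate: the expectation of depth under depth_prob."""
    return (depth_prob * np.asarray(depth_values, dtype=np.float64)).sum(axis=-1)
```

For example, for a single ray with consensus values `[0.0, 1.0, 0.5]`, the second plane is certainly a surface and fully occludes the third, so the distribution concentrates on the second plane. Because every operation is differentiable, a depth estimate of this kind can sit inside an end-to-end trained pipeline, which is the property the abstract emphasises.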