Novel view synthesis from a single image has been a cornerstone problem for many Virtual Reality applications that provide immersive experiences. However, most existing techniques can only synthesize novel views within a limited range of camera motion or fail to generate consistent and high-quality novel views under significant camera movement. In this work, we propose a pose-guided diffusion model to generate a consistent long-term video of novel views from a single image. We design an attention layer that uses epipolar lines as constraints to facilitate the association between different viewpoints. Experimental results on synthetic and real-world datasets demonstrate the effectiveness of the proposed diffusion model against state-of-the-art transformer-based and GAN-based approaches.
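To make the epipolar-attention idea concrete, below is a minimal PyTorch sketch of one plausible form of such a layer. It is illustrative only: the function names, the pinhole-camera setup, and the pixel-distance threshold are our assumptions, not details taken from the paper. It builds the fundamental matrix from a relative camera pose, masks out source-view pixels that lie far from each target pixel's epipolar line, and runs masked cross-attention over the remaining pixels.

```python
import torch
import torch.nn.functional as F


def skew(t: torch.Tensor) -> torch.Tensor:
    """Cross-product matrix [t]_x for a translation vector t of shape (3,)."""
    zero = torch.zeros((), device=t.device, dtype=t.dtype)
    return torch.stack([
        torch.stack([zero, -t[2], t[1]]),
        torch.stack([t[2], zero, -t[0]]),
        torch.stack([-t[1], t[0], zero]),
    ])


def fundamental_matrix(K: torch.Tensor, R: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """F = K^{-T} [t]_x R K^{-1}, with (R, t) the relative pose from the target
    camera to the source camera; l = F @ p maps a homogeneous target pixel p
    to its epipolar line (a, b, c) in the source view."""
    K_inv = torch.inverse(K)
    return K_inv.T @ skew(t) @ R @ K_inv


def epipolar_mask(K, R, t, h, w, threshold=1.0):
    """Boolean (h*w, h*w) mask: target pixel i may attend to source pixel j
    only if j lies within `threshold` pixels of i's epipolar line."""
    Fm = fundamental_matrix(K, R, t)
    ys, xs = torch.meshgrid(torch.arange(h, dtype=K.dtype, device=K.device),
                            torch.arange(w, dtype=K.dtype, device=K.device),
                            indexing="ij")
    pix = torch.stack([xs.flatten(), ys.flatten(),
                       torch.ones(h * w, dtype=K.dtype, device=K.device)], dim=-1)  # (N, 3)
    lines = pix @ Fm.T                        # row i = epipolar line of target pixel i
    # Point-to-line distance |ax + by + c| / sqrt(a^2 + b^2) for every source pixel.
    dist = (lines @ pix.T).abs() / lines[:, :2].norm(dim=-1, keepdim=True).clamp(min=1e-8)
    return dist < threshold                   # (N_target, N_source)


def epipolar_attention(q, k, v, mask):
    """Scaled dot-product cross-attention with scores outside the epipolar
    mask suppressed before the softmax. q, k, v: (N, d) target/source features."""
    scores = (q @ k.T) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~mask, -1e9)  # -1e9 rather than -inf avoids NaNs
    return F.softmax(scores, dim=-1) @ v      # should a row end up fully masked
```

Under these assumptions, each denoising step's cross-view attention is restricted to geometrically plausible correspondences, which is the mechanism the abstract credits for keeping novel views consistent across large camera motion.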