Generating static novel views from a single, already captured image is a hard problem in computer vision and graphics, particularly when the input image contains dynamic content such as people or moving objects. In this paper, we address this problem with CycleMPI, a new framework that learns a multiplane image representation from single images through a cyclic training strategy for self-supervision. Because our framework does not require stereo data for training, it can be trained on massive amounts of visual data from the Internet, which results in better generalization even in very challenging cases. Although our method uses no stereo data for supervision, it achieves results on stereo datasets comparable to the state of the art in a zero-shot scenario. We evaluate our method on the RealEstate10K and Mannequin Challenge datasets for view synthesis and present qualitative results on the Places II dataset.
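For readers unfamiliar with the multiplane image (MPI) representation mentioned in the abstract, the following is a minimal sketch of how a stack of fronto-parallel RGBA planes is commonly flattened into one image with back-to-front "over" compositing. It is an illustration of the general MPI idea only, not the CycleMPI implementation: the function name `composite_mpi`, the array layout, and the plane ordering are assumptions made here, and the per-plane warp to a novel viewpoint as well as the cyclic training strategy are left out.

```python
import numpy as np


def composite_mpi(colors, alphas):
    """Flatten an MPI into a single image with back-to-front over-compositing.

    Assumed layout (hypothetical, chosen for this sketch):
      colors: (D, H, W, 3) RGB values per plane, in [0, 1]
      alphas: (D, H, W, 1) opacity per plane, in [0, 1]
      index 0 is the farthest plane, index D-1 the nearest.
    """
    colors = np.asarray(colors, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    out = np.zeros_like(colors[0])
    # Walk from the farthest plane to the nearest; each plane covers
    # what is behind it in proportion to its alpha.
    for rgb, a in zip(colors, alphas):
        out = rgb * a + out * (1.0 - a)
    return out


# Toy example: an opaque red far plane behind a half-transparent blue near plane.
far_rgb = np.tile([1.0, 0.0, 0.0], (2, 2, 1))
near_rgb = np.tile([0.0, 0.0, 1.0], (2, 2, 1))
toy_colors = np.stack([far_rgb, near_rgb])             # (2, 2, 2, 3)
toy_alphas = np.stack([np.ones((2, 2, 1)),
                       np.full((2, 2, 1), 0.5)])       # (2, 2, 2, 1)
toy_image = composite_mpi(toy_colors, toy_alphas)      # red and blue mix equally
```

In a full view-synthesis pipeline each plane would typically be warped into the target camera before this compositing step; that step depends on camera parameters the abstract does not specify, so it is omitted here.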