We present a method for learning to generate unbounded flythrough videos of natural scenes starting from a single view, where this capability is learned from a collection of single photographs, without requiring camera poses or even multiple views of each scene. To achieve this, we propose a novel self-supervised view generation training paradigm, where we sample and render virtual camera trajectories, including cyclic ones, allowing our model to learn stable view generation from a collection of single views. At test time, despite never seeing a video during training, our approach can take a single image and generate long camera trajectories comprising hundreds of new views with realistic and diverse content. We compare our approach with recent state-of-the-art supervised view generation methods that require posed multi-view videos and demonstrate superior performance and synthesis quality.