GANs are able to perform generation and manipulation tasks when trained on a single video. However, these single-video GANs require an unreasonable amount of time to train on a single video, rendering them almost impractical. In this paper we question the necessity of a GAN for generation from a single video, and introduce a non-parametric baseline for a variety of generation and manipulation tasks. We revive classical space-time patch nearest-neighbor approaches and adapt them into a scalable, unconditional generative model, without any learning. This simple baseline surprisingly outperforms single-video GANs in visual quality and realism (confirmed by quantitative and qualitative evaluations), and is disproportionately faster (runtime reduced from several days to seconds). Beyond diverse video generation, we demonstrate other applications of the same framework, including video analogies and spatio-temporal retargeting. Our proposed approach is easily scaled to Full-HD videos. These observations show that classical approaches, if adapted correctly, significantly outperform heavy deep-learning machinery for these tasks. This sets a new baseline for single-video generation and manipulation tasks, and, no less important, makes diverse generation from a single video practically possible for the first time.
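To make the core idea concrete, the following is a minimal sketch (not the paper's actual implementation) of space-time patch nearest-neighbor matching: every 3D patch of a query video is replaced with its closest patch from the source video, and overlapping replacements are averaged. All function names, the grayscale `(T, H, W)` layout, the exhaustive L2 search, and the uniform-vote averaging are illustrative assumptions; a practical system would use approximate nearest-neighbor search and a coarse-to-fine pyramid.

```python
import numpy as np

def extract_patches(video, p):
    """Collect all p x p x p space-time patches of a (T, H, W) grayscale
    video as flattened rows of a 2D array (illustrative, not optimized)."""
    T, H, W = video.shape
    patches = []
    for t in range(T - p + 1):
        for y in range(H - p + 1):
            for x in range(W - p + 1):
                patches.append(video[t:t+p, y:y+p, x:x+p].ravel())
    return np.stack(patches)

def nn_replace(query, source, p):
    """Replace every space-time patch of `query` with its nearest neighbor
    (exact L2 search, a simplifying assumption) among the patches of
    `source`, then average the overlapping votes per pixel."""
    src = extract_patches(source, p)
    T, H, W = query.shape
    out = np.zeros(query.shape, dtype=float)
    weight = np.zeros(query.shape, dtype=float)
    for t in range(T - p + 1):
        for y in range(H - p + 1):
            for x in range(W - p + 1):
                q = query[t:t+p, y:y+p, x:x+p].ravel()
                idx = np.argmin(((src - q) ** 2).sum(axis=1))
                out[t:t+p, y:y+p, x:x+p] += src[idx].reshape(p, p, p)
                weight[t:t+p, y:y+p, x:x+p] += 1.0
    return out / weight

# Usage sketch: a noisy query pulled toward the patch statistics of the source.
rng = np.random.default_rng(0)
source = rng.random((4, 8, 8))
query = source + 0.05 * rng.standard_normal(source.shape)
result = nn_replace(query, source, p=3)
```

A sanity check of the averaging step: if the query *is* the source, every patch matches itself at zero distance, so the overlapping votes at each pixel are identical and the reconstruction is exact.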