GANs are able to perform generation and manipulation tasks when trained on a single video. However, these single-video GANs require an unreasonable amount of time to train on a single video, rendering them almost impractical. In this paper we question the necessity of a GAN for generation from a single video, and introduce a non-parametric baseline for a variety of generation and manipulation tasks. We revive classical space-time patch nearest-neighbor approaches and adapt them into a scalable unconditional generative model, without any learning. This simple baseline surprisingly outperforms single-video GANs in visual quality and realism (confirmed by quantitative and qualitative evaluations), and is disproportionately faster (runtime reduced from several days to seconds). Our approach is easily scaled to Full-HD videos. We also use the same framework to demonstrate video analogies and spatio-temporal retargeting. These observations show that classical approaches significantly outperform heavy deep learning machinery for these tasks. This sets a new baseline for single-video generation and manipulation tasks and, no less important, makes diverse generation from a single video practically possible for the first time.
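As a concrete illustration of what a space-time patch nearest-neighbor step involves, the sketch below extracts 3D (time, height, width) patches from a reference video, replaces each query patch with its nearest reference patch under L2 distance, and averages the overlapping votes back into a video. This is a minimal, illustrative sketch only: the brute-force search, the patch sizes, and the overlap-averaging are assumptions chosen for clarity, not the scalable coarse-to-fine procedure used in the paper.

```python
# Minimal sketch of a single space-time patch nearest-neighbor step.
# Assumptions (not the paper's implementation): small videos that fit in
# memory, brute-force L2 search, simple overlap averaging ("voting").
# Video arrays have shape (T, H, W, C) with float values.
import numpy as np


def extract_patches(video, pt=3, ps=5):
    """Collect all overlapping space-time patches of size (pt, ps, ps)."""
    T, H, W, C = video.shape
    patches = []
    for t in range(T - pt + 1):
        for y in range(H - ps + 1):
            for x in range(W - ps + 1):
                patches.append(video[t:t + pt, y:y + ps, x:x + ps].ravel())
    return np.stack(patches)


def patch_nn_replace(query_video, ref_video, pt=3, ps=5):
    """Replace each query patch with its nearest reference patch,
    then average the overlapping contributions back into a video."""
    ref = extract_patches(ref_video, pt, ps)
    T, H, W, C = query_video.shape
    out = np.zeros_like(query_video, dtype=np.float64)
    weight = np.zeros((T, H, W, 1))
    for t in range(T - pt + 1):
        for y in range(H - ps + 1):
            for x in range(W - ps + 1):
                q = query_video[t:t + pt, y:y + ps, x:x + ps].ravel()
                # Brute-force L2 nearest neighbor over all reference patches.
                idx = np.argmin(((ref - q) ** 2).sum(axis=1))
                out[t:t + pt, y:y + ps, x:x + ps] += ref[idx].reshape(pt, ps, ps, C)
                weight[t:t + pt, y:y + ps, x:x + ps] += 1.0
    return out / weight
```

In an unconditional-generation setting, such a step would typically start from a coarse, noise-perturbed version of the input video and be applied repeatedly across scales; the single step shown here only conveys the patch lookup and voting mechanics.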