Image and video synthesis are closely related areas aiming at generating content from noise. While rapid progress has been demonstrated in improving image-based models to handle large resolutions, high-quality renderings, and wide variations in image content, achieving comparable video generation results remains challenging. We present a framework that leverages contemporary image generators to render high-resolution videos. We frame the video synthesis problem as discovering a trajectory in the latent space of a pre-trained and fixed image generator. Not only does such a framework render high-resolution videos, but it is also an order of magnitude more computationally efficient. We introduce a motion generator that discovers the desired trajectory, in which content and motion are disentangled. With such a representation, our framework allows for a broad range of applications, including content and motion manipulation. Furthermore, we introduce a new task, which we call cross-domain video synthesis, in which the image and motion generators are trained on disjoint datasets belonging to different domains. This allows for generating moving objects for which the desired video data is not available. Extensive experiments on various datasets demonstrate the advantages of our methods over existing video generation techniques. Code will be released at https://github.com/snap-research/MoCoGAN-HD.
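To illustrate the core idea — a video as a latent trajectory decoded frame-by-frame by a frozen image generator, with a fixed content code and per-step motion residuals — the following is a minimal toy sketch. All function names, dimensions, and the residual-step motion model are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM = 8  # toy latent dimensionality (assumption)

def frozen_image_generator(z):
    """Stand-in for a pre-trained, FIXED image generator G: latent -> frame.
    Its weights are constant and never updated during video synthesis."""
    W = np.ones((LATENT_DIM, LATENT_DIM))  # fixed "weights"
    return np.tanh(W @ z)                  # a toy "frame"

def motion_generator(z0, num_frames):
    """Hypothetical motion model: the content code z0 anchors the trajectory,
    and motion is modeled as small residual steps in latent space, so content
    and motion are represented separately."""
    trajectory = [z0]
    for _ in range(num_frames - 1):
        eps = rng.standard_normal(LATENT_DIM)          # motion noise
        trajectory.append(trajectory[-1] + 0.1 * eps)  # small latent step
    return trajectory

z0 = rng.standard_normal(LATENT_DIM)  # content code, sampled once per video
frames = [frozen_image_generator(z) for z in motion_generator(z0, 16)]
```

Because only the lightweight motion model is trained while the image generator stays fixed, the same content code can be paired with different trajectories (motion manipulation), and the two components can even come from disjoint domains, as in the cross-domain setting described above.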