Video generation requires synthesizing consistent and persistent frames with dynamic content over time. This work investigates modeling the temporal relations for composing videos of arbitrary length, from a few frames to even infinite length, using generative adversarial networks (GANs). First, towards composing adjacent frames, we show that the alias-free operation for single-image generation, together with adequately pre-learned knowledge, yields smooth frame transitions without compromising per-frame quality. Second, by incorporating the temporal shift module (TSM), originally designed for video understanding, into the discriminator, we advance the generator in synthesizing more consistent dynamics. Third, we develop a novel B-Spline based motion representation that ensures temporal smoothness and enables infinite-length video generation, going beyond the number of frames used in training. We also propose a low-rank temporal modulation to alleviate repetitive content in long video generation. We evaluate our approach on various datasets and show substantial improvements over video generation baselines. Code and models will be publicly available at https://genforce.github.io/StyleSV.
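The abstract only names the techniques, so the two sketches below are illustrative, not the authors' implementation. The first shows the temporal shift module (TSM) operation as commonly defined for video understanding: a fraction of feature channels is shifted forward and backward along the time axis before the next discriminator layer mixes them. The tensor layout and the `shift_div` parameter are assumptions.

```python
import torch

def temporal_shift(x: torch.Tensor, shift_div: int = 8) -> torch.Tensor:
    """Shift a fraction of channels along the time axis (TSM-style).

    x: discriminator features of shape (batch, time, channels, height, width);
    1/shift_div of the channels are shifted in each temporal direction.
    The layout and default value are illustrative assumptions.
    """
    b, t, c, h, w = x.shape
    fold = c // shift_div
    out = torch.zeros_like(x)
    out[:, 1:, :fold] = x[:, :-1, :fold]                  # first fold of channels: shifted forward in time
    out[:, :-1, fold:2 * fold] = x[:, 1:, fold:2 * fold]  # second fold: shifted backward in time
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]             # remaining channels left unchanged
    return out
```

Similarly, the B-Spline based motion representation can be pictured as evaluating a smooth spline through a sparse sequence of motion control codes to obtain one motion code per frame. The sketch below, using SciPy's `BSpline`, is only an assumed illustration of that idea; the control-code shapes, the clamped knot vector, and the normalized time axis are our choices, not details taken from the paper.

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_motion_codes(control_codes: np.ndarray,
                         num_frames: int,
                         degree: int = 3) -> np.ndarray:
    """Interpolate sparse motion control codes into per-frame motion codes.

    control_codes: (num_controls, dim) array of motion latents placed on a
    normalized timeline; requires num_controls >= degree + 1.
    Returns a (num_frames, dim) array of temporally smooth motion codes;
    longer videos are obtained by appending more control codes.
    """
    num_controls, _ = control_codes.shape
    # Clamped knot vector so the curve starts and ends at the boundary codes.
    knots = np.concatenate([
        np.zeros(degree),
        np.linspace(0.0, 1.0, num_controls - degree + 1),
        np.ones(degree),
    ])
    spline = BSpline(knots, control_codes, degree)
    return spline(np.linspace(0.0, 1.0, num_frames))
```

For instance, five control codes with a cubic spline already give 64 frames of smoothly varying motion codes; the same machinery extends to clips longer than those seen in training by adding further control codes along the timeline.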