Video generation has achieved rapid progress, benefiting from the high-quality renderings provided by powerful image generators. We regard the video synthesis task as generating a sequence of images that share the same content but vary in motion. However, most previous video synthesis frameworks built on pre-trained image generators treat content and motion generation separately, leading to unrealistic generated videos. Therefore, we design a novel framework that builds a motion space, aiming to achieve content consistency and fast convergence for video generation. We present MotionVideoGAN, a novel video generator that synthesizes videos based on the motion space learned by pre-trained image pair generators. First, we propose an image pair generator named MotionStyleGAN to generate image pairs that share the same content but exhibit various motions. Then we acquire motion codes that edit one image in a generated pair while keeping the other unchanged. These motion codes allow us to edit images within the motion space, since the edited image shares the same content as the unchanged one in the pair. Finally, we introduce a latent code generator that uses the motion codes to produce latent code sequences for video generation. Our approach achieves state-of-the-art performance on UCF101, the most complex dataset ever used for unconditional video generation evaluation.
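To make the three-stage pipeline concrete, the sketch below illustrates one way a latent code sequence could be unrolled from per-frame motion codes and rendered into frames by a fixed image generator. All module names, architectures, and dimensions here (ImagePairGeneratorStub, LatentCodeGenerator, the GRU-based unrolling) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a MotionVideoGAN-style pipeline, assuming:
#  - a pre-trained image (pair) generator mapping a latent code to a frame,
#  - motion codes that move the latent code within a learned motion space,
#  - a latent code generator that unrolls motion codes over time.
# All class names, shapes, and architectures are hypothetical stand-ins.
import torch
import torch.nn as nn

class ImagePairGeneratorStub(nn.Module):
    """Stand-in for the pre-trained MotionStyleGAN synthesis network."""
    def __init__(self, latent_dim=512, img_size=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 3 * img_size * img_size), nn.Tanh())
        self.img_size = img_size

    def forward(self, w):                          # w: (B, latent_dim)
        x = self.net(w)
        return x.view(-1, 3, self.img_size, self.img_size)

class LatentCodeGenerator(nn.Module):
    """Unrolls motion codes into a latent code sequence (hypothetical RNN)."""
    def __init__(self, latent_dim=512, motion_dim=128):
        super().__init__()
        self.rnn = nn.GRUCell(motion_dim, latent_dim)

    def forward(self, w0, motion_codes):
        # w0: (B, latent_dim) shared content code
        # motion_codes: (B, T, motion_dim) per-frame motion codes
        w, latents = w0, []
        for t in range(motion_codes.size(1)):
            w = self.rnn(motion_codes[:, t], w)    # step within the motion space
            latents.append(w)
        return torch.stack(latents, dim=1)         # (B, T, latent_dim)

# Usage: sample the content code once, then vary only motion over time,
# so every frame shares content while the motion codes drive the dynamics.
B, T, latent_dim, motion_dim = 2, 16, 512, 128
generator = ImagePairGeneratorStub(latent_dim)
latent_gen = LatentCodeGenerator(latent_dim, motion_dim)
w0 = torch.randn(B, latent_dim)                    # shared content
motion_codes = torch.randn(B, T, motion_dim)       # per-frame motion
latents = latent_gen(w0, motion_codes)
video = torch.stack([generator(latents[:, t]) for t in range(T)], dim=1)
print(video.shape)                                 # (B, T, 3, 64, 64)
```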