Given the three-dimensional complexity of a video signal, training a robust and diverse GAN-based video generative model is onerous due to the large stochasticity involved in the data space. Learning disentangled representations of the data helps to improve robustness and provides control over the sampling process. For video generation, recent progress in this area treats motion and appearance as orthogonal information and designs architectures that efficiently disentangle them. These approaches rely on handcrafted architectures that impose structural priors on the generator to decompose the appearance and motion codes in the latent space. Inspired by recent advances in autoencoder-based image generation, we present AVLAE (Adversarial Video Latent AutoEncoder), a two-stream latent autoencoder in which the video distribution is learned by adversarial training. In particular, we propose to autoencode the motion and appearance latent vectors of the video generator in the adversarial setting. We demonstrate that our approach learns to disentangle motion and appearance codes even without explicit structural composition in the generator. Several experiments with qualitative and quantitative results demonstrate the effectiveness of our method.
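To make the objective concrete, below is a minimal PyTorch sketch of the two-stream latent autoencoding idea the abstract describes: a generator maps one appearance code per clip and one motion code per frame to a video, an encoder recovers both latent streams from the generated clip, and a discriminator supplies the adversarial signal. Every module name, layer, dimension, and loss weight here (VideoGenerator, LatentEncoder, VideoDiscriminator, z_a, z_m) is an illustrative assumption, not the paper's actual implementation.

```python
# Minimal sketch of two-stream latent autoencoding with adversarial training.
# All architectures and names are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class VideoGenerator(nn.Module):
    """Maps one appearance code per clip and one motion code per frame to a
    short video of shape (B, T, 3, 32, 32)."""

    def __init__(self, z_a_dim=128, z_m_dim=64, frames=16):
        super().__init__()
        self.frames = frames
        self.net = nn.Sequential(
            nn.Linear(z_a_dim + z_m_dim, 512), nn.ReLU(),
            nn.Linear(512, 3 * 32 * 32), nn.Tanh(),
        )

    def forward(self, z_a, z_m):
        # z_a: (B, z_a_dim), shared across frames; z_m: (B, T, z_m_dim)
        z_a = z_a.unsqueeze(1).expand(-1, self.frames, -1)
        x = self.net(torch.cat([z_a, z_m], dim=-1))
        return x.view(z_m.size(0), self.frames, 3, 32, 32)


class LatentEncoder(nn.Module):
    """Two-stream encoder: a time-pooled appearance code for the whole clip
    and a per-frame motion code."""

    def __init__(self, z_a_dim=128, z_m_dim=64):
        super().__init__()
        self.frame_feat = nn.Sequential(
            nn.Flatten(start_dim=2), nn.Linear(3 * 32 * 32, 256), nn.ReLU()
        )
        self.to_z_a = nn.Linear(256, z_a_dim)  # pooled over time -> appearance
        self.to_z_m = nn.Linear(256, z_m_dim)  # per frame -> motion

    def forward(self, video):
        h = self.frame_feat(video)             # (B, T, 256)
        return self.to_z_a(h.mean(dim=1)), self.to_z_m(h)


class VideoDiscriminator(nn.Module):
    """Real/fake critic on whole clips; shapes the video distribution."""

    def __init__(self, frames=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(), nn.Linear(frames * 3 * 32 * 32, 256),
            nn.ReLU(), nn.Linear(256, 1),
        )

    def forward(self, video):
        return self.net(video)


def generator_step(G, E, D, batch_size=8, frames=16):
    """One generator/encoder update: sample both latent streams, generate a
    clip, and autoencode the latents back from it."""
    z_a = torch.randn(batch_size, 128)
    z_m = torch.randn(batch_size, frames, 64)
    fake = G(z_a, z_m)
    z_a_hat, z_m_hat = E(fake)
    # Reconstruction happens in latent space, separately per stream: this is
    # what encourages appearance/motion disentanglement without imposing
    # structural priors inside the generator.
    latent_loss = F.mse_loss(z_a_hat, z_a) + F.mse_loss(z_m_hat, z_m)
    adv_loss = F.softplus(-D(fake)).mean()     # non-saturating GAN loss
    return latent_loss + adv_loss


loss = generator_step(VideoGenerator(), LatentEncoder(), VideoDiscriminator())
loss.backward()
```

The design point this sketch highlights is that reconstruction is applied to the latent codes rather than to pixels: the appearance and motion streams are autoencoded independently, while the adversarial loss alone is responsible for matching the video distribution.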