Video generation is an interesting problem in computer vision, with applications in data augmentation, special effects in film, AR/VR, and more. With the advances of deep learning, many deep generative models have been proposed for this task. These models provide a way to exploit the vast amount of unlabeled images and videos available online, since they can learn deep feature representations in an unsupervised manner. They can also generate diverse kinds of images, which is of great value for visual applications. However, generating a video is considerably more challenging, since we must model not only the appearance of the objects in the video but also their motion over time. In this work, we decompose each frame of a video into content and pose. We first extract pose information from the video using a pre-trained human pose detector, and then use a generative model to synthesize the video from the content code and the pose code.
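To make the decomposition concrete, here is a minimal PyTorch sketch of the pipeline described above. It is illustrative only: the module names, layer sizes, and the 34-dimensional pose code (17 COCO-style keypoints × 2 coordinates) are assumptions rather than the actual architecture, and the random tensors stand in for real video frames and for keypoints that would come from a pre-trained pose detector.

```python
import torch
import torch.nn as nn

class ContentEncoder(nn.Module):
    """Encodes a reference frame into a time-invariant content code."""
    def __init__(self, content_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, content_dim),
        )

    def forward(self, frame):            # frame: (B, 3, 64, 64)
        return self.net(frame)           # (B, content_dim)

class PoseConditionedGenerator(nn.Module):
    """Decodes a (content code, pose code) pair back into an image frame."""
    def __init__(self, content_dim=128, pose_dim=34):
        super().__init__()
        self.fc = nn.Linear(content_dim + pose_dim, 128 * 8 * 8)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, content, pose):
        z = torch.cat([content, pose], dim=1)   # fuse appearance and pose
        x = self.fc(z).view(-1, 128, 8, 8)
        return self.net(x)                       # (B, 3, 64, 64)

# One generation step. In a real pipeline, pose_seq would hold per-frame
# keypoints from a pre-trained detector; here random tensors are placeholders.
content_enc = ContentEncoder()
generator = PoseConditionedGenerator()
ref_frame = torch.randn(1, 3, 64, 64)            # reference frame of the video
pose_seq = torch.randn(16, 34)                   # 16 frames of detected keypoints
content = content_enc(ref_frame).expand(16, -1)  # same content code for all frames
video = generator(content, pose_seq)             # (16, 3, 64, 64) synthesized clip
```

The key design choice this sketch reflects is that a single content code is shared across all frames of the clip, so the object's appearance stays fixed while the per-frame pose codes carry the temporal motion.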