Video prediction is an important yet challenging problem, burdened with the dual tasks of generating future frames and learning environment dynamics. Recently, autoregressive latent video models have proved to be a powerful video prediction tool, by decomposing video prediction into two sub-problems: pre-training an image generator model, then learning an autoregressive prediction model in the latent space of the image generator. However, successfully generating high-fidelity and high-resolution videos has remained elusive. In this work, we investigate how to train an autoregressive latent video prediction model capable of predicting high-fidelity future frames with minimal modification to existing models, and producing high-resolution (256x256) videos. Specifically, we scale up prior models by employing a high-fidelity image generator (VQ-GAN) with a causal transformer model, and introduce the additional techniques of top-k sampling and data augmentation to further improve video prediction quality. Despite its simplicity, the proposed method achieves performance competitive with state-of-the-art approaches on standard video prediction benchmarks while using fewer parameters, and enables high-resolution video prediction on complex and large-scale datasets. Videos are available at https://sites.google.com/view/harp-videos/home.
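The abstract mentions top-k sampling as one of the techniques used to improve prediction quality. As a minimal sketch (not the paper's exact implementation), top-k sampling restricts each autoregressive sampling step to the k highest-scoring latent tokens before renormalizing and drawing:

```python
import numpy as np

def top_k_sample(logits, k, rng=None):
    """Sample a token id from the k highest-scoring logits.

    Illustrative sketch of top-k sampling for autoregressive token
    prediction; function name and interface are assumptions for
    illustration, not the paper's API.
    """
    rng = rng or np.random.default_rng(0)
    logits = np.asarray(logits, dtype=np.float64)
    # Indices of the k largest logits (unordered within the top-k set).
    top_idx = np.argpartition(logits, -k)[-k:]
    # Softmax restricted to those k logits (subtract max for stability).
    top_logits = logits[top_idx]
    probs = np.exp(top_logits - top_logits.max())
    probs /= probs.sum()
    # Draw one of the top-k token ids with the renormalized probabilities.
    return int(rng.choice(top_idx, p=probs))
```

With k equal to the vocabulary size this reduces to ordinary sampling from the full softmax; small k truncates the low-probability tail, which tends to suppress visually implausible tokens at the cost of some diversity.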