Greedy Hierarchical Variational Autoencoders for Large-Scale Video Prediction

A video prediction model that generalizes to diverse scenes would enable intelligent agents such as robots to perform a variety of tasks via planning with the model. However, while existing video prediction models have produced promising results on small datasets, they suffer from severe underfitting when trained on large and diverse datasets. To address this underfitting challenge, we first observe that the ability to train larger video prediction models is often bottlenecked by the memory constraints of GPUs or TPUs. In parallel, deep hierarchical latent variable models can produce higher quality predictions by capturing the multi-level stochasticity of future observations, but end-to-end optimization of such models is notably difficult. Our key insight is that greedy and modular optimization of hierarchical autoencoders can simultaneously address both the memory constraints and the optimization challenges of large-scale video prediction. We introduce Greedy Hierarchical Variational Autoencoders (GHVAEs), a method that learns high-fidelity video predictions by greedily training each level of a hierarchical autoencoder. In comparison to state-of-the-art models, GHVAEs provide 17-55% gains in prediction performance on four video datasets, a 35-40% higher success rate on real robot tasks, and can improve performance monotonically by simply adding more modules.
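
The greedy, module-by-module training idea can be illustrated with a minimal sketch. It assumes tiny deterministic linear autoencoders in place of the variational modules the abstract describes, so it is not the authors' implementation. The class `LinearModule`, the toy data, the learning rate, and the step count are all hypothetical choices made for illustration. What it shows is that each level is fitted only after the level below it has been trained and frozen, so gradients are needed for one module at a time.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LinearModule:
    """Hypothetical stand-in for one level: a linear encoder/decoder pair."""

    def __init__(self, n_in, n_latent):
        # Small deterministic weights keep the sketch reproducible.
        self.enc = [[0.1 * (k + j + 1) for j in range(n_in)] for k in range(n_latent)]
        self.dec = [[0.1 * (i + 2 * k + 1) for k in range(n_latent)] for i in range(n_in)]

    def encode(self, x):
        return matvec(self.enc, x)

    def decode(self, z):
        return matvec(self.dec, z)

    def loss(self, data):
        """Mean squared reconstruction error over the dataset."""
        total = 0.0
        for x in data:
            x_hat = self.decode(self.encode(x))
            total += sum((a - b) ** 2 for a, b in zip(x_hat, x))
        return total / len(data)

    def fit(self, data, lr=0.05, steps=1000):
        """Full-batch gradient descent on this module's weights only."""
        n = len(data)
        for _ in range(steps):
            g_enc = [[0.0] * len(row) for row in self.enc]
            g_dec = [[0.0] * len(row) for row in self.dec]
            for x in data:
                z = self.encode(x)
                err = [a - b for a, b in zip(self.decode(z), x)]
                # Backpropagate the error through the decoder to the latent.
                g_z = [sum(self.dec[i][k] * err[i] for i in range(len(err)))
                       for k in range(len(z))]
                for i, e in enumerate(err):
                    for k, z_k in enumerate(z):
                        g_dec[i][k] += 2.0 * e * z_k / n
                for k, g in enumerate(g_z):
                    for j, x_j in enumerate(x):
                        g_enc[k][j] += 2.0 * g * x_j / n
            for w, g in ((self.enc, g_enc), (self.dec, g_dec)):
                for r in range(len(w)):
                    for c in range(len(w[r])):
                        w[r][c] -= lr * g[r][c]


# Toy 3-D data lying on a 2-D plane (an assumption for illustration).
data = [[a / 2, b / 2, (a + b) / 4] for a in range(-2, 3) for b in range(-2, 3)]

# Level 1: train on the raw inputs, then treat it as frozen.
level1 = LinearModule(3, 2)
loss1_before = level1.loss(data)
level1.fit(data)
loss1_after = level1.loss(data)
frozen_enc = [row[:] for row in level1.enc]

# Level 2: train only on level 1's latents; level 1 receives no updates.
latents = [level1.encode(x) for x in data]
level2 = LinearModule(2, 1)
loss2_before = level2.loss(latents)
level2.fit(latents)
loss2_after = level2.loss(latents)

print(loss1_before, loss1_after, loss2_before, loss2_after)
```

Because `level2.fit` never touches `level1`, the memory needed for gradients in this sketch scales with the size of the module being trained rather than with the whole stack, which is the kind of saving the abstract attributes to greedy optimization.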

Related content

Autoencoder

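An autoencoder is a type of artificial neural network used to learn efficient data encodings in an unsupervised manner. Its aim is to learn a representation (encoding) of a set of data, typically for dimensionality reduction, by training the network to ignore signal "noise". Alongside the reduction side, a reconstruction side is learned, in which the autoencoder tries to generate, from the reduced encoding, a representation as close as possible to its original input; this is where the name comes from. Several variants of the basic model exist, each aiming to force the learned representation of the input to have useful properties. Autoencoders are applied effectively to many problems, from facial recognition to capturing the semantic meaning of words.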
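
The reconstruction goal described above is commonly written as a squared-error objective. The notation here is an assumption introduced for illustration: an encoder $f_\phi$, a decoder $g_\theta$, and training inputs $x_1, \dots, x_N$.

$$
\min_{\phi,\,\theta} \; \frac{1}{N} \sum_{n=1}^{N} \left\lVert x_n - g_\theta\big(f_\phi(x_n)\big) \right\rVert^2
$$

Dimensionality reduction corresponds to choosing the output of $f_\phi$ to have fewer dimensions than $x_n$, so the network cannot simply copy its input and has to keep only the structure it needs for reconstruction.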
