Video prediction models often combine three components: an encoder from pixel space to a small latent space, a prediction model operating in that latent space, and a generative model that maps back to pixel space. However, the large and unpredictable pixel space makes such models difficult to train and demands many training examples. We argue that finding a predictive latent variable and using it to evaluate the consistency of a candidate future image enables data-efficient prediction, because it removes the need to train a generative model. To demonstrate this, we created sequence-completion intelligence tests in which the task is to identify a predictably changing feature in a sequence of images and use that prediction to select the subsequent image. We show that a one-dimensional Markov Contrastive Predictive Coding (M-CPC_1D) model solves these tests efficiently, using only five examples. Finally, we demonstrate the usefulness of M-CPC_1D on two tasks without prior training: anomaly detection and stochastic movement video prediction.
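The selection step described above (predict the latent value, then choose the candidate image most consistent with it) can be illustrated with a minimal sketch. It is not the M-CPC_1D model itself: the encoder is omitted, the scalar latents are assumed to be given, and the constant-step transition and the function name `select_next_image` are hypothetical choices made only for illustration.

```python
def select_next_image(context_latents, candidate_latents):
    """Return (index of the most consistent candidate, predicted latent).

    Assumptions (not from the paper): each image has already been encoded
    to a single scalar latent, and the latent changes by a roughly constant
    step from one frame to the next.
    """
    # Estimate the per-frame change from consecutive context latents.
    steps = [b - a for a, b in zip(context_latents, context_latents[1:])]
    step = sum(steps) / len(steps)

    # Markov-style prediction: the next latent depends only on the last one.
    predicted = context_latents[-1] + step

    # Score candidates by distance to the prediction; no image is generated.
    best = min(
        range(len(candidate_latents)),
        key=lambda i: abs(candidate_latents[i] - predicted),
    )
    return best, predicted


# Hypothetical latents for a four-image sequence and three candidate images.
index, predicted = select_next_image([1, 2, 3, 4], [9, 5, 1])
```

Because candidates are only scored against the predicted latent, nothing has to be decoded back to pixel space, which is the source of the data efficiency the abstract argues for.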