Visual-frame prediction is a pixel-dense prediction task that infers future frames from past frames. Lack of appearance detail, low prediction accuracy, and high computational overhead remain major problems for current models and methods. In this paper, we propose a novel neural network model inspired by the well-known predictive coding theory to address these problems. Predictive coding provides an interesting and reliable computational framework, which we combine with other theories, such as the observation that different levels of the cerebral cortex oscillate at different frequencies, to design an efficient and reliable predictive network model for visual-frame prediction. Specifically, the model is composed of a series of recurrent and convolutional units that form the top-down and bottom-up streams, respectively. The update frequency of the neural units in each layer decreases as the network level increases, so that higher-level neurons can capture information over longer time spans. According to the experimental results, the model is more compact than existing works while achieving comparable predictive performance, implying lower computational cost at competitive prediction accuracy. Code is available at https://github.com/Ling-CF/PPNet.
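The level-dependent update frequency described above can be illustrated with a minimal sketch. The abstract does not state the exact ratio between levels, so the rule used here (level l updates once every base**l time steps, with base = 2) is an assumption made only for illustration, and the function names are hypothetical rather than taken from the PPNet code.

```python
def update_schedule(num_levels, num_steps, base=2):
    """For each time step, list the levels whose units update.

    Assumption (not from the paper): level l updates every base**l steps,
    so level 0 updates at every step and higher levels update less often.
    """
    schedule = []
    for t in range(num_steps):
        active = [l for l in range(num_levels) if t % (base ** l) == 0]
        schedule.append(active)
    return schedule


def updates_per_level(num_levels, num_steps, base=2):
    """Count how many times each level updates over num_steps steps."""
    counts = [0] * num_levels
    for active in update_schedule(num_levels, num_steps, base):
        for l in active:
            counts[l] += 1
    return counts


if __name__ == "__main__":
    # Show which levels are active at each of the first 8 time steps.
    for t, active in enumerate(update_schedule(3, 8)):
        print(f"t={t}: levels {active}")
    print("updates per level:", updates_per_level(3, 8))
```

Under this assumed schedule, a higher level holds its state across several input frames, which is one way a unit can integrate information over a longer time span while also being evaluated less often.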