Inspired by the predictive coding theory from cognitive science, we propose a novel neural network model for visual-frame prediction. Our main contribution is to combine the theoretical framework of predictive coding with deep learning architectures to design an efficient predictive network. The model is composed of a series of recurrent and convolutional units that form the top-down and bottom-up streams, respectively. It learns to predict future frames in a visual sequence, with a ConvLSTM on each layer of the network making local predictions from the top layer down. The main innovation of our model is that the update frequency of the neural units decreases as the network level increases, so that the model resembles a pyramid along the time dimension; we therefore call it the Pyramid Predictive Network (PPNet). In particular, this pyramid-like design is consistent with the neuronal activity reported in neuroscience findings related to the predictive coding framework. According to the experimental results, the model is more compact than existing works while achieving comparable predictive performance, implying lower computational cost for comparable prediction accuracy. Code will be available at https://github.com/Ling-CF/PPNet.
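To make the pyramid-in-time idea concrete, the following is a minimal sketch of an update schedule in which higher levels update less often. The abstract states only that the update frequency decreases with the network level; the specific rule used here (level `level` updates once every `base ** level` time steps) and the names `update_schedule` and `updates_per_level` are illustrative assumptions, not the authors' implementation.

```python
def update_schedule(num_levels, num_steps, base=2):
    """Return, for each time step, the list of levels that update.

    ASSUMPTION (not taken from the paper): level `level` updates once
    every base**level steps, so level 0 updates at every step and each
    higher level updates `base` times less often than the one below it.
    """
    schedule = []
    for t in range(num_steps):
        active = [level for level in range(num_levels)
                  if t % (base ** level) == 0]
        schedule.append(active)
    return schedule


def updates_per_level(schedule, num_levels):
    """Count how many times each level is updated over the whole schedule."""
    counts = [0] * num_levels
    for active in schedule:
        for level in active:
            counts[level] += 1
    return counts


if __name__ == "__main__":
    sched = update_schedule(num_levels=3, num_steps=8)
    for t, active in enumerate(sched):
        print(f"t={t}: levels updated = {active}")
    print("updates per level:", updates_per_level(sched, 3))
```

Under this assumed rule the per-level update counts shrink geometrically with depth, which is one way the higher layers could contribute less computation than in a network that updates every layer at every frame.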