The pyramidal predictive network (PPNV1) introduced a temporal pyramid architecture and achieved promising results on future video-frame prediction. We expose and analyze its signal dissemination and characteristic artifacts, and propose corresponding improvements to the model architecture and training strategies. Although PPNV1 is theoretically modeled on the workings of the human brain, its careless signal processing leads to aliasing in the network. We therefore redesign the architecture, both to correct the unreasonable information dissemination and to suppress aliasing. Different inputs are no longer simply concatenated, and the downsampling and upsampling components have been redesigned so that the network can more easily construct images from the Fourier features of low-frequency inputs. Finally, we refine the training strategies to alleviate the inconsistency between the inputs seen during training and those seen during testing. Overall, the improved model is more interpretable and stronger, and it produces higher-quality predictions. Code is available at https://github.com/Ling-CF/PPNV2.
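To make the aliasing concern concrete, the following is a minimal one-dimensional sketch of why careless downsampling can be harmful. It is not the PPNV2 implementation (for that, see the repository linked above). The function names, the 3-tap binomial filter, and the test frequency are all assumptions chosen for illustration. The sketch compares plain strided downsampling with downsampling preceded by a small low-pass filter, on a sinusoid whose frequency is too high to be represented after halving the sampling rate.

```python
import math


def downsample_naive(signal, factor=2):
    """Keep every `factor`-th sample with no filtering (prone to aliasing)."""
    return signal[::factor]


def downsample_blurred(signal, factor=2):
    """Low-pass with a [1, 2, 1] / 4 binomial kernel, then subsample.

    The kernel is an illustrative assumption, not the filter used in PPNV2.
    Edges are handled by replicating the first and last samples.
    """
    padded = [signal[0]] + list(signal) + [signal[-1]]
    smoothed = [
        (padded[i] + 2.0 * padded[i + 1] + padded[i + 2]) / 4.0
        for i in range(len(signal))
    ]
    return smoothed[::factor]


def rms(values):
    """Root-mean-square amplitude of a sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))


# A sinusoid at 0.4 cycles/sample: below the original Nyquist limit (0.5),
# but above the limit after downsampling by 2 (0.25 in original units).
signal = [math.sin(2.0 * math.pi * 0.4 * n) for n in range(256)]

naive = downsample_naive(signal)
blurred = downsample_blurred(signal)

# The naive output keeps the full energy of a component it cannot represent,
# so that energy reappears at a wrong (aliased) frequency. The filtered
# output has attenuated the component before subsampling.
print("RMS after naive downsampling:   ", rms(naive))
print("RMS after filtered downsampling:", rms(blurred))
```

In the naive case the unrepresentable component survives at full strength and shows up as a spurious lower frequency, while the filtered version suppresses most of it before subsampling. The same reasoning applies per spatial axis to feature maps in a convolutional network.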