Spatiotemporal predictive learning (ST-PL) is an active research area with numerous applications, such as object movement and meteorological prediction. It aims to predict subsequent frames from observed sequences. However, the inherent uncertainty among consecutive frames makes long-term prediction increasingly difficult. To tackle the growing ambiguity during forecasting, we design CMS-LSTM, which focuses on context correlations and multi-scale spatiotemporal flow while preserving fine-grained local details. It contains two elaborately designed blocks: the Context Embedding (CE) block and the Spatiotemporal Expression (SE) block. CE is designed for abundant context interactions, while SE focuses on multi-scale spatiotemporal expression in hidden states. The newly introduced blocks also help other spatiotemporal models (e.g., PredRNN, SA-ConvLSTM) produce representative implicit features for ST-PL and improve prediction quality. Qualitative and quantitative experiments demonstrate the effectiveness and flexibility of our proposed method. With fewer parameters, CMS-LSTM outperforms state-of-the-art methods on a number of metrics across two representative benchmarks and scenarios.