Large-scale, multidimensional spatiotemporal data sets are becoming ubiquitous in real-world applications such as monitoring urban traffic and air quality. Making predictions on these time series is a critical challenge, not only because the data are large-scale and high-dimensional but also because of the considerable amount of missing data. In this paper, we propose a Bayesian temporal factorization (BTF) framework for modeling multidimensional time series -- in particular spatiotemporal data -- in the presence of missing values. By integrating low-rank matrix/tensor factorization and a vector autoregressive (VAR) process into a single probabilistic graphical model, the framework can characterize both global and local consistencies in large-scale time series data. The graphical model allows us to perform probabilistic predictions and produce uncertainty estimates without first imputing the missing values. We develop efficient Gibbs sampling algorithms for model inference and for model updating in real-time prediction, and we test the proposed BTF framework on several real-world spatiotemporal data sets for both missing data imputation and multi-step rolling prediction tasks. The numerical experiments demonstrate the superiority of the proposed BTF approaches over existing state-of-the-art methods.
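To make the combination of low-rank factorization and a VAR process concrete, the following is a minimal sketch and not the BTF method itself. It swaps the Gibbs sampler for a plain point estimate (alternating ridge regression on observed entries only, followed by a least-squares VAR(1) fit on the temporal factors), so it produces no uncertainty estimates. The data are synthetic, and every name here (`simulate`, `masked_ridge`, `fit`, the rank, the 30% missing rate, the ridge weight) is an assumption chosen for illustration.

```python
import numpy as np


def simulate(n_series=20, n_steps=200, rank=3, seed=0):
    """Synthetic data: Y = W X^T + noise, with X following a VAR(1) process."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n_series, rank))      # one factor vector per series
    A = 0.9 * np.eye(rank)                     # stable VAR(1) coefficient matrix
    X = np.zeros((n_steps, rank))              # one factor vector per time step
    X[0] = rng.normal(size=rank)
    for t in range(1, n_steps):
        X[t] = A @ X[t - 1] + 0.5 * rng.normal(size=rank)
    Y = W @ X.T + 0.1 * rng.normal(size=(n_series, n_steps))
    observed = rng.random(Y.shape) > 0.3       # True where a value is observed
    return Y, observed


def masked_ridge(F, Y, observed, lam):
    """For each column j of Y, ridge-regress its observed entries on rows of F."""
    rank = F.shape[1]
    out = np.zeros((Y.shape[1], rank))
    for j in range(Y.shape[1]):
        idx = observed[:, j]
        Fo = F[idx]                            # factor rows with observed targets
        out[j] = np.linalg.solve(Fo.T @ Fo + lam * np.eye(rank), Fo.T @ Y[idx, j])
    return out


def fit(Y, observed, rank=3, lam=0.1, n_iter=30, seed=1):
    """Alternating masked ridge for W and X, then a least-squares VAR(1) on X."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(Y.shape[0], rank))
    for _ in range(n_iter):
        X = masked_ridge(W, Y, observed, lam)        # temporal factors
        W = masked_ridge(X, Y.T, observed.T, lam)    # per-series factors
    # Solve X[1:] ~= X[:-1] @ B, so that x_t ~= B.T @ x_{t-1}.
    B, *_ = np.linalg.lstsq(X[:-1], X[1:], rcond=None)
    return W, X, B.T


Y, observed = simulate()
W, X, A = fit(Y, observed)

Y_hat = W @ X.T                                # reconstruction, including gaps
rmse_missing = np.sqrt(np.mean((Y_hat - Y)[~observed] ** 2))
y_next = W @ (A @ X[-1])                       # one-step-ahead forecast

print("RMSE on missing entries:", rmse_missing)
print("Forecast shape:", y_next.shape)
```

Only observed entries enter the two regressions, so the forecast is produced from the factors without filling the gaps in `Y` beforehand. Replacing the point estimates with posterior samples is what would turn this sketch into a probabilistic model.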