The missing value problem in spatiotemporal traffic data has long been a challenging topic, in particular for large-scale and high-dimensional data with complex missing mechanisms and diverse degrees of missingness. Recent studies based on the tensor nuclear norm have demonstrated the superiority of tensor learning in imputation tasks by effectively characterizing the complex correlations and dependencies in spatiotemporal data. However, despite the promising results, these approaches do not scale well to large tensors. In this paper, we focus on the missing data imputation problem for large-scale spatiotemporal traffic data. To achieve both high accuracy and efficiency, we develop a scalable autoregressive tensor learning model, Low-Tubal-Rank Autoregressive Tensor Completion (LATC-Tubal), based on the existing framework of Low-Rank Autoregressive Tensor Completion (LATC), which is well suited to spatiotemporal traffic data characterized by the multidimensional structure location $\times$ time of day $\times$ day. In particular, the proposed LATC-Tubal model involves a scalable tensor nuclear norm minimization scheme that integrates a linear unitary transformation. As a result, the tensor nuclear norm minimization can be solved by singular value thresholding on the transformed matrix of each day, while the day-to-day correlation is effectively preserved by the unitary transform matrix. For the experiments, we consider two large-scale 5-minute traffic speed data sets collected by the California PeMS system with 11160 sensors. We compare LATC-Tubal with state-of-the-art baseline models and find that LATC-Tubal achieves competitive accuracy at a significantly lower computational cost. In addition, LATC-Tubal will also benefit other tasks in modeling large-scale spatiotemporal traffic data, such as network-level traffic forecasting.
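To make the thresholding step concrete, the following is a minimal NumPy sketch of singular value thresholding applied to each transformed day slice of a location $\times$ time of day $\times$ day tensor. It is an illustration under stated assumptions, not the LATC-Tubal implementation: the function names (`svt`, `unitary_from_day_mode`, `tubal_svt`) are hypothetical, the autoregressive term and the handling of missing entries are omitted, and the unitary matrix is assumed here to come from the singular vectors of the day-mode unfolding, which the abstract does not specify.

```python
import numpy as np


def svt(mat, tau):
    """Singular value thresholding: shrink each singular value of `mat` by `tau`."""
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    s = np.maximum(s - tau, 0.0)
    return (u * s) @ vt


def unitary_from_day_mode(tensor):
    """Assumed transform: left singular vectors of the day-mode unfolding.

    Returns an orthogonal (n_day x n_day) matrix when
    n_day <= n_loc * n_time.
    """
    n_loc, n_time, n_day = tensor.shape
    unfold = tensor.reshape(n_loc * n_time, n_day).T  # days x (loc * time)
    u, _, _ = np.linalg.svd(unfold, full_matrices=False)
    return u


def tubal_svt(tensor, phi, tau):
    """Transform along the day mode, threshold each slice, transform back."""
    transformed = tensor @ phi  # mixes the last (day) axis with phi
    shrunk = np.empty_like(transformed)
    for k in range(transformed.shape[2]):
        # Each transformed slice is a location x time-of-day matrix.
        shrunk[:, :, k] = svt(transformed[:, :, k], tau)
    return shrunk @ phi.T  # inverse transform, since phi is orthogonal


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speeds = rng.normal(size=(6, 5, 4))  # toy location x time x day tensor
    phi = unitary_from_day_mode(speeds)
    low_rank = tubal_svt(speeds, phi, tau=0.5)
    print(low_rank.shape)
```

Each slice costs one SVD of a location $\times$ time of day matrix rather than an SVD of a full tensor unfolding, and the slices are independent, so the loop could be run in parallel.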