We propose a three-stage framework for forecasting high-dimensional time-series data. Our method first estimates parameters for each univariate time series. Next, we use these parameters to cluster the time series. These clusters can be viewed as multivariate time series, for which we then compute parameters. The forecasted values of a single time series can depend on the history of other time series in the same cluster, accounting for intra-cluster similarity while minimizing potential noise in predictions by ignoring inter-cluster effects. Our framework -- which we refer to as "cluster-and-conquer" -- is highly general, allowing for any time-series forecasting and clustering method to be used in each step. It is computationally efficient and embarrassingly parallel. We motivate our framework with a theoretical analysis in an idealized mixed linear regression setting, where we provide guarantees on the quality of the estimates. We accompany these guarantees with experimental results that demonstrate the advantages of our framework: when instantiated with simple linear autoregressive models, we are able to achieve state-of-the-art results on several benchmark datasets, sometimes outperforming deep-learning-based approaches.
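The three stages described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes least-squares linear autoregressive models of order `p` in stages one and three, and a plain k-means on the fitted coefficient vectors in stage two. The abstract allows any forecasting and clustering method, so these are just one possible instantiation. All function names (`lagged_design`, `fit_linear_ar`, `kmeans`, `cluster_and_conquer`, `forecast_next`) are hypothetical.

```python
import numpy as np

def lagged_design(X, p):
    """Stack p lags of X (shape T x d) into a (T-p) x (p*d) design matrix."""
    T = X.shape[0]
    # Block k holds lag k+1: rows aligned with targets X[p:], X[p+1:], ...
    return np.hstack([X[p - k - 1 : T - k - 1] for k in range(p)])

def fit_linear_ar(X, p):
    """Least-squares fit of X[t] ~ [X[t-1], ..., X[t-p]] @ coef (assumed model)."""
    Z = lagged_design(X, p)
    coef, *_ = np.linalg.lstsq(Z, X[p:], rcond=None)
    return coef  # shape (p*d, d)

def kmeans(points, k, iters=50):
    """Plain Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        d2 = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d2)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def cluster_and_conquer(Y, p, k):
    """Y is a T x n array holding n univariate series as columns."""
    n = Y.shape[1]
    # Stage 1: estimate parameters for each univariate series independently.
    params = np.stack([fit_linear_ar(Y[:, [i]], p).ravel() for i in range(n)])
    # Stage 2: cluster the series using their estimated parameters.
    labels = kmeans(params, k)
    # Stage 3: treat each cluster as one multivariate series and refit.
    models = {j: fit_linear_ar(Y[:, labels == j], p)
              for j in range(k) if np.any(labels == j)}
    return labels, models

def forecast_next(Y, p, labels, models):
    """One-step-ahead forecast; each series uses only its own cluster's history."""
    y_hat = np.empty(Y.shape[1])
    for j, coef in models.items():
        idx = labels == j
        z = np.hstack([Y[-1 - k, idx] for k in range(p)])  # lag 1 first
        y_hat[idx] = z @ coef
    return y_hat
```

Stages one and three loop over series and clusters independently, which is where the parallelism mentioned above would come from. The loops are left serial here for brevity.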