High-dimensional time series datasets are becoming increasingly common in many areas of the biological and social sciences. Important applications include gene regulatory network reconstruction from time-course gene expression data, brain connectivity analysis from neuroimaging data, structural analysis of large panels of macroeconomic indicators, and the study of linkages among financial firms to support more robust financial regulation. These applications have renewed interest in principled statistical methods and theory for estimating large time series models from a relatively small number of temporally dependent samples. Over the last two decades, sparse modeling approaches have gained popularity in statistics and machine learning because of their interpretability and predictive accuracy. Although there is a rich literature on sparsity-inducing methods for independent samples, research on the statistical properties of these methods for estimating time series models is still in progress. We survey recent advances in this area, focusing on empirically successful lasso-based estimation methods for two canonical multivariate time series models: stochastic regression and vector autoregression. We discuss key technical challenges that arise in high-dimensional time series analysis and outline several promising research directions.
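The abstract refers to lasso-based estimation of vector autoregressions without specifying an algorithm. What follows is a minimal sketch, not the surveyed authors' method. It assumes a mean-zero VAR(1) model x_t = A x_{t-1} + e_t and uses the common equation-by-equation formulation: each row of A is estimated by a separate lasso regression of x_t[j] on x_{t-1}, solved by cyclic coordinate descent. The helper names (`soft_threshold`, `lasso_cd`, `lasso_var1`, `simulate_var1`), the penalty value `lam=0.2`, and the simulated transition matrix are hypothetical choices made for illustration.

```python
import random


def soft_threshold(z, lam):
    """Soft-thresholding operator: sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200, tol=1e-8):
    """Cyclic coordinate descent for (1/(2n))||y - Xb||^2 + lam*||b||_1.

    X is a list of n rows of length p, y a list of length n; no intercept
    (the sketch assumes mean-zero data).
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(X[i][k] ** 2 for i in range(n)) / n for k in range(p)]
    resid = list(y)  # residual y - Xb (b starts at zero)
    for _ in range(n_iter):
        max_delta = 0.0
        for k in range(p):
            if col_sq[k] == 0.0:
                continue
            # (1/n) X_k^T (residual with coordinate k's contribution added back)
            rho = sum(X[i][k] * resid[i] for i in range(n)) / n + col_sq[k] * b[k]
            new_bk = soft_threshold(rho, lam) / col_sq[k]
            delta = new_bk - b[k]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][k]
                b[k] = new_bk
                max_delta = max(max_delta, abs(delta))
        if max_delta < tol:  # stop once a full sweep barely changes b
            break
    return b


def lasso_var1(series, lam, n_iter=200):
    """Hypothetical helper: estimate a sparse VAR(1) matrix A row by row."""
    X = series[:-1]  # lagged values x_{t-1}
    A_hat = []
    for j in range(len(series[0])):
        y = [row[j] for row in series[1:]]  # responses x_t[j]
        A_hat.append(lasso_cd(X, y, lam, n_iter))
    return A_hat


def simulate_var1(A, T, noise_sd=1.0, burn_in=100, seed=0):
    """Simulate x_t = A x_{t-1} + e_t with Gaussian noise, discarding a burn-in."""
    rng = random.Random(seed)
    p = len(A)
    x = [0.0] * p
    out = []
    for t in range(burn_in + T):
        x = [sum(A[j][k] * x[k] for k in range(p)) + rng.gauss(0.0, noise_sd)
             for j in range(p)]
        if t >= burn_in:
            out.append(x)
    return out


# Usage: a 5-dimensional sparse VAR(1) with 0.5 on the diagonal and one
# cross-lag effect (x_{t-1}[0] -> x_t[1]); lower triangular, hence stable.
p = 5
A_true = [[0.5 if j == k else 0.0 for k in range(p)] for j in range(p)]
A_true[1][0] = 0.3
data = simulate_var1(A_true, T=500, seed=1)
A_hat = lasso_var1(data, lam=0.2)
for row in A_hat:
    print(" ".join(f"{v:6.3f}" for v in row))
```

The row-by-row decomposition works because the VAR(1) least-squares loss separates across equations. The lasso estimates of nonzero entries are shrunk toward zero by an amount that grows with `lam`. In practice `lam` would be chosen by cross-validation or an information criterion rather than fixed by hand.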