The recent success of large-scale models has increased the importance of statistical models with many parameters. Several studies have analyzed over-parameterized linear models with high-dimensional data that need not be sparse; however, the existing results rely on the assumption that samples are independent. In this study, we analyze a linear regression model with dependent time series data in the over-parameterized setting. We consider an interpolating estimator and develop a theory for its excess risk. We then derive risk bounds for the estimator in two cases: where the temporal correlation of each coordinate of the dependent data is homogeneous, and where it is heterogeneous. The derived bounds reveal that the temporal covariance of the data plays a key role: its strength affects the bias of the risk, and its nondegeneracy affects the variance of the risk. Moreover, for the heterogeneous correlation case, we show that the convergence rate of the risk with short-memory processes is identical to that with independent data, and that the risk can converge to zero even with long-memory processes. Our theory extends to infinite-dimensional data in a unified manner. We also present several examples of specific dependent processes to which our setting applies.
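To make the setting concrete, the following is a minimal sketch of interpolation in an over-parameterized linear regression with temporally dependent covariates. It rests on several assumptions that are not stated in the abstract: the interpolating estimator is taken to be the minimum-l2-norm interpolator (computed with the pseudoinverse), each coordinate is a stationary AR(1) series with a common coefficient `rho` (one instance of homogeneous, short-memory correlation), and the noise is i.i.d. Gaussian. The names `ar1_design`, `min_norm_interpolator`, and `run_demo` are hypothetical and chosen only for illustration.

```python
import numpy as np


def ar1_design(n, p, rho, rng):
    """Return an (n, p) matrix whose columns are independent stationary AR(1) series.

    Rows are time points, so rows are correlated across time with
    autocorrelation rho**|s - t|; each entry has unit marginal variance.
    """
    X = np.empty((n, p))
    X[0] = rng.standard_normal(p)  # stationary start with unit variance
    innovation_scale = np.sqrt(1.0 - rho**2)
    for t in range(1, n):
        X[t] = rho * X[t - 1] + innovation_scale * rng.standard_normal(p)
    return X


def min_norm_interpolator(X, y):
    """Minimum-l2-norm solution of X @ theta = y (assumed form of the estimator)."""
    return np.linalg.pinv(X) @ y


def run_demo(n=50, p=500, rho=0.5, noise_std=0.1, seed=0):
    rng = np.random.default_rng(seed)
    X = ar1_design(n, p, rho, rng)

    # Hypothetical true parameter: only the first five coordinates are non-zero.
    theta_star = np.zeros(p)
    theta_star[:5] = 1.0

    # Assumption: i.i.d. Gaussian noise.
    y = X @ theta_star + noise_std * rng.standard_normal(n)

    theta_hat = min_norm_interpolator(X, y)

    # Largest training residual; close to zero because p > n and the estimator interpolates.
    train_residual = np.max(np.abs(X @ theta_hat - y))

    # In this sketch a fresh covariate vector has identity covariance,
    # so the excess risk reduces to the squared l2 error of theta_hat.
    excess_risk = float(np.sum((theta_hat - theta_star) ** 2))
    return theta_hat, train_residual, excess_risk


if __name__ == "__main__":
    _, residual, risk = run_demo()
    print(f"max training residual: {residual:.2e}")
    print(f"excess risk (identity covariance): {risk:.3f}")
```

The sketch only demonstrates that the estimator fits the dependent training data exactly; it is not meant to reproduce the risk bounds. In particular, the identity coordinate covariance used here is chosen for simplicity, and it should not be expected to drive the excess risk to zero.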