This paper addresses the estimation of the minimal dimension P of the state-space realizations of a high-dimensional time series y, defined as a noisy version of a useful signal with low-rank rational spectral density, the noise being white and Gaussian. The setting is the high-dimensional asymptotic regime in which the number of available samples N and the dimension of the time series M tend to infinity at the same rate. In the classical low-dimensional regime, P is estimated as the number of significant singular values of the empirical autocovariance matrix between the past and the future of y, or as the number of significant estimated canonical correlation coefficients between the past and the future of y. Generalizing large random matrix methods previously developed to analyze classical spiked models, the paper studies the behaviour of these singular values and canonical correlation coefficients in the high-dimensional regime. It is proved that they are smaller than certain thresholds depending on the statistics of the noise, except for a finite number of outliers that are due to the useful signal. The number of singular values of the sample autocovariance matrix above the threshold is evaluated and shown to be almost independent of P in general; it therefore cannot be used to estimate P accurately. In contrast, the number s of canonical correlation coefficients larger than the corresponding threshold is shown to be less than or equal to P, and explicit conditions under which it equals P are provided. Under the corresponding assumptions, s is thus a consistent estimate of P in the high-dimensional regime. The core of the paper is the development of the necessary large random matrix tools.
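As an illustration of the estimator s described above, the following is a minimal NumPy sketch, not the paper's implementation. It assumes the past and future of y are built by stacking L consecutive samples, and that the sample canonical correlation coefficients are computed as the singular values of the product of orthonormal bases of the two row spaces. The names `past_future_matrices`, `canonical_correlations` and `count_above`, the lag parameter `L`, and the `threshold` argument are all hypothetical. The abstract does not state the threshold, so it must be supplied from the paper's results.

```python
import numpy as np


def past_future_matrices(y, L):
    """Stack L consecutive samples of y (shape (M, T)) into past/future matrices.

    Returns Y_p and Y_f, each of shape (M * L, N) with N = T - 2 * L + 1.
    The stacking convention is an assumption made for this sketch.
    """
    M, T = y.shape
    N = T - 2 * L + 1
    if N <= 0:
        raise ValueError("time series too short for the chosen L")
    Y_p = np.vstack([y[:, l:l + N] for l in range(L)])
    Y_f = np.vstack([y[:, L + l:L + l + N] for l in range(L)])
    return Y_p, Y_f


def canonical_correlations(Y_p, Y_f):
    """Sample canonical correlation coefficients between the rows of Y_p and Y_f.

    Uses reduced QR factorizations to obtain orthonormal bases of the two
    row spaces; the singular values of their cross product are the
    canonical correlations. Requires fewer rows than columns.
    """
    if Y_p.shape[0] >= Y_p.shape[1] or Y_f.shape[0] >= Y_f.shape[1]:
        raise ValueError("need M * L < N for the sample covariances to be invertible")
    Q_p, _ = np.linalg.qr(Y_p.conj().T)  # shape (N, M * L)
    Q_f, _ = np.linalg.qr(Y_f.conj().T)  # shape (N, M * L)
    rho = np.linalg.svd(Q_f.conj().T @ Q_p, compute_uv=False)
    return np.clip(rho, 0.0, 1.0)  # guard against rounding slightly above 1


def count_above(values, threshold):
    """Number of entries strictly larger than a user-supplied threshold."""
    return int(np.sum(np.asarray(values) > threshold))
```

A typical call would be `Y_p, Y_f = past_future_matrices(y, L)` followed by `s = count_above(canonical_correlations(Y_p, Y_f), threshold)`. The QR-based route avoids forming and inverting the sample covariance matrices explicitly, which is a numerical convenience chosen for this sketch rather than something taken from the paper.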