Optimal covariance matrix estimation for high-dimensional noise in high-frequency data

We consider high-dimensional measurement errors in high-frequency data. Our focus is on optimally recovering the covariance matrix of the random errors. In this problem, not all components of the random vector are observed at the same time, and the measurement errors are latent variables, which poses major challenges beyond high dimensionality. We propose a new covariance matrix estimator for this setting that uses appropriate localization and thresholding. By developing a new technical device that integrates the high-frequency data feature with the conventional notion of $\alpha$-mixing, our analysis accommodates the challenging serial dependence in the measurement errors. Our theoretical analysis establishes the minimax optimal convergence rates associated with two commonly used loss functions. We then identify cases in which the proposed localized estimator with thresholding achieves these minimax optimal convergence rates. Because the variances and covariances can be small in practice, we conduct a second-order theoretical analysis that further disentangles the dominating bias in the estimator. A bias-corrected estimator is then proposed to ensure good finite-sample performance in practice. We illustrate the promising empirical performance of the proposed estimator with extensive simulation studies and a real data analysis.
