Optimal covariance matrix estimation for high-dimensional noise in high-frequency data

We consider high-dimensional measurement errors in high-frequency data. Our objective is to recover, optimally, the high-dimensional cross-sectional covariance matrix of the random errors. In this problem, not all components of the random vector are observed at the same time, and the measurement errors are latent variables. Both features pose major challenges beyond the high data dimensionality. We propose a new covariance matrix estimator for this setting that uses appropriate localization and thresholding, and we conduct a series of comprehensive theoretical investigations of it. We develop a new technical device that integrates the high-frequency data feature with the conventional notion of $\alpha$-mixing, which lets our analysis accommodate the challenging serial dependence in the measurement errors. Our theoretical analysis establishes the minimax optimal convergence rates associated with two commonly used loss functions, and we identify concrete cases in which the proposed localized estimator with thresholding achieves these minimax optimal rates. Because variances and covariances can be small in practice, we conduct a second-order theoretical analysis that further disentangles the dominant bias in the estimator, and we then propose a bias-corrected estimator to ensure good finite-sample performance in practice. We also analyze the estimator extensively in the presence of jumps and show that its performance is reasonably robust. Extensive simulation studies and a real data analysis illustrate the promising empirical performance of the proposed estimator.
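To make the two ingredients named in the abstract (a noise covariance estimate plus thresholding) concrete, here is a minimal sketch under much simpler assumptions than the paper's setting. It assumes that all $p$ components are observed synchronously, that the noise is i.i.d. over time, and that there are no jumps. Under these assumptions, each increment is $\Delta Y_i = \Delta X_i + \varepsilon_i - \varepsilon_{i-1}$, so $E[\Delta Y_i \Delta Y_i^\top] \approx 2\Sigma_\varepsilon$ once the $O(1/n)$ efficient-price term is ignored. This is not the authors' localized estimator, which is designed for asynchronous observations and $\alpha$-mixing noise. The function names `noise_cov_from_increments` and `hard_threshold`, the threshold form $\tau = c\sqrt{\log p / n}\,\max_j \hat\sigma_{jj}$, the constant `c = 10`, and all simulation parameters are hypothetical choices made for illustration only.

```python
import numpy as np


def noise_cov_from_increments(Y):
    """Naive noise-covariance estimate (hypothetical helper).

    Assumes Y has shape (n + 1, p) with all p components observed at the
    same n + 1 times, contaminated by i.i.d. noise. Then
    E[dY dY^T] ~= 2 * Sigma_eps, because the efficient-price part is O(1/n).
    """
    dY = np.diff(Y, axis=0)  # (n, p) observed increments
    n = dY.shape[0]
    return dY.T @ dY / (2.0 * n)


def hard_threshold(S, tau):
    """Zero out entries with |s_ij| <= tau, always keeping the diagonal."""
    T = np.where(np.abs(S) > tau, S, 0.0)
    np.fill_diagonal(T, np.diag(S))
    return T


# --- Usage on simulated data (all parameters are illustrative assumptions) ---
rng = np.random.default_rng(0)
n, p = 20_000, 4
Sigma_eps = 1e-4 * np.array([[1.0, 0.5, 0.0, 0.0],
                             [0.5, 1.0, 0.0, 0.0],
                             [0.0, 0.0, 1.0, 0.0],
                             [0.0, 0.0, 0.0, 1.0]])

# Efficient log-price: Brownian motion with 2% volatility over the period.
dX = rng.normal(scale=0.02 / np.sqrt(n), size=(n, p))
X = np.vstack([np.zeros(p), np.cumsum(dX, axis=0)])   # (n + 1, p)
eps = rng.multivariate_normal(np.zeros(p), Sigma_eps, size=n + 1)
Y = X + eps                                           # observed prices

S = noise_cov_from_increments(Y)
c = 10.0  # hypothetical tuning constant
tau = c * np.sqrt(np.log(p) / n) * np.max(np.diag(S))
S_thr = hard_threshold(S, tau)
print(np.round(S_thr / 1e-4, 3))  # estimate in units of 1e-4
```

If the noise is serially dependent, $E[(\varepsilon_i-\varepsilon_{i-1})(\varepsilon_i-\varepsilon_{i-1})^\top] = 2\Sigma_\varepsilon - \Gamma_1 - \Gamma_1^\top$, where $\Gamma_1$ is the lag-one autocovariance. The naive estimator above is then biased. It also cannot be applied directly when components are observed at different times. Those are the situations the abstract's setting targets.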