Estimating a covariance matrix is central to high-dimensional data analysis. The proposed method is motivated by analyses of dependence patterns in multiple types of high-dimensional biomedical data, including but not limited to genomics, proteomics, microbiome, and neuroimaging data. The correlation matrices of these biomedical data all exhibit a well-organized block pattern, in which the positive and negative pairwise correlations with large absolute values are mainly concentrated within diagonal and off-diagonal blocks. We develop a covariance- and precision-matrix estimation framework that fully leverages this organized block pattern. We propose new best unbiased covariance- and precision-matrix estimators in closed form, and develop theory for the asymptotic properties of the estimators in both scenarios: when the number of blocks is smaller than the sample size and when it is larger. Simulation studies and data example analyses show that our method is robust and improves the accuracy of covariance- and precision-matrix estimation.
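To make the idea of exploiting a block pattern concrete, the following is a minimal sketch, not the estimator proposed here. It assumes that the block membership of each variable is known in advance and that entries within a block share a common value, and it simply averages the sample covariance over each block. The function name `block_average_covariance` and the `labels` argument are hypothetical and chosen only for illustration.

```python
import numpy as np

def block_average_covariance(X, labels):
    """Illustrative block-averaged covariance (assumes known block labels).

    X      : (n, p) data matrix, rows are samples.
    labels : length-p array assigning each variable to a block.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    S = np.cov(X, rowvar=False)  # (p, p) sample covariance
    est = np.empty_like(S)
    groups = [np.flatnonzero(labels == g) for g in np.unique(labels)]
    for a, ia in enumerate(groups):
        for b, ib in enumerate(groups):
            sub = S[np.ix_(ia, ib)]
            if a == b:
                # Diagonal block: one shared variance, one shared covariance.
                k = len(ia)
                diag_mean = np.trace(sub) / k
                off_mean = (sub.sum() - np.trace(sub)) / (k * (k - 1)) if k > 1 else 0.0
                block = np.full((k, k), off_mean)
                np.fill_diagonal(block, diag_mean)
            else:
                # Off-diagonal block: a single shared covariance value.
                block = np.full(sub.shape, sub.mean())
            est[np.ix_(ia, ib)] = block
    return est

# Usage on synthetic data with two assumed blocks of sizes 3 and 2.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 5))
labels_demo = np.array([0, 0, 0, 1, 1])
Sigma_hat = block_average_covariance(X_demo, labels_demo)
```

Averaging within blocks reduces the number of free parameters from the order of p squared to the order of the squared number of blocks, which is one intuition for why a well-organized block pattern can improve estimation accuracy.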