Estimating High-dimensional Covariance and Precision Matrices under General Missing Dependence

The sample covariance matrix $\boldsymbol{S}$ of completely observed data is the key statistic in a wide variety of multivariate statistical procedures, such as structured covariance/precision matrix estimation, principal component analysis, and testing the equality of mean vectors. However, when the data are only partially observed, the sample covariance matrix computed from the available data is biased and does not yield valid multivariate procedures. To correct the bias, previous research has used a simple adjustment method called inverse probability weighting (IPW), which yields the IPW estimator. This estimator plays the role of $\boldsymbol{S}$ in the missing data context, so it can be plugged into off-the-shelf multivariate procedures. However, theoretical properties (e.g., concentration) of the IPW estimator have been established only under very simple missing structures, namely when every variable of each sample is missing independently with equal probability. We investigate the deviation of the IPW estimator when the data are partially observed under general missing dependency. We prove that the IPW estimator attains the optimal convergence rate $O_p(\sqrt{\log p / n})$ in the element-wise maximum norm. We also derive similar deviation results when implicit assumptions (known mean and/or known missing probability) are relaxed. The optimal rate is especially crucial in estimating a precision matrix, because of the "meta-theorem" stating that the rate of the IPW estimator governs that of the resulting precision matrix estimator. In the simulation study, we discuss the non-positive semi-definiteness of the IPW estimator and compare the estimator with imputation methods, both of which are practically important.
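To make the IPW adjustment concrete, here is a minimal sketch under the simplest setting the abstract mentions: the mean is known to be zero and the observation probabilities are known. The function name `ipw_covariance`, the use of `None` to mark a missing entry, and the argument `pair_prob` (with `pair_prob[i][j]` the probability that variables `i` and `j` are both observed) are illustrative assumptions, not the paper's notation. Each observed cross-product is divided by its joint observation probability, which under general missing dependency need not factor into a product of marginal probabilities.

```python
def ipw_covariance(rows, pair_prob):
    """Sketch of an IPW covariance estimate (assumes zero mean, known probabilities).

    rows      : list of samples; each sample is a list of floats, None = missing.
    pair_prob : pair_prob[i][j] = P(variables i and j are both observed);
                pair_prob[i][i] is the marginal observation probability of i.
    """
    n = len(rows)
    p = len(rows[0])
    est = [[0.0] * p for _ in range(p)]
    for row in rows:
        for i in range(p):
            if row[i] is None:
                continue  # variable i missing: no contribution to row i
            for j in range(p):
                if row[j] is None:
                    continue  # variable j missing: no contribution to entry (i, j)
                # Reweight the observed product by its inverse observation probability.
                est[i][j] += row[i] * row[j] / pair_prob[i][j]
    # Average over all n samples, including those with missing entries.
    return [[value / n for value in est_row] for est_row in est]


# Fully observed data: all probabilities are 1, so this is the usual
# zero-mean sample covariance X^T X / n.
full = ipw_covariance([[1.0, 2.0], [3.0, -1.0]], [[1.0, 1.0], [1.0, 1.0]])

# Variable 1 is missing in the first sample and observed with probability 0.5.
partial = ipw_covariance([[2.0, None], [2.0, 2.0]], [[1.0, 0.5], [0.5, 0.5]])
```

Because each entry is reweighted separately, the resulting matrix is not guaranteed to be positive semi-definite, which is the issue the simulation study discusses.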
