We provide an estimator of the covariance matrix that achieves the optimal rate of convergence (up to constant factors) in the operator norm under two standard notions of data contamination: adversarial corruption and heavy tails. We allow the adversary to corrupt an $\eta$-fraction of the sample arbitrarily, while the distribution of the remaining data points is only assumed to have its $L_{p}$-marginal moment, for some $p \ge 4$, equivalent to the corresponding $L_2$-marginal moment. Despite requiring the existence of only a few moments, our estimator achieves the same tail estimates as if the underlying distribution were Gaussian. As a part of our analysis, we prove a dimension-free Bai-Yin type theorem in the regime $p > 4$.
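For concreteness, the moment-equivalence assumption is commonly formalized as below. This is a sketch in assumed notation that does not appear in the abstract itself: $X$ denotes a zero-mean random vector in $\mathbb{R}^d$ distributed as an uncorrupted data point, and $\kappa \ge 1$ is a constant independent of the direction $v$.

$$
\left(\mathbb{E}\,\lvert\langle X, v\rangle\rvert^{p}\right)^{1/p} \;\le\; \kappa \left(\mathbb{E}\,\langle X, v\rangle^{2}\right)^{1/2} \qquad \text{for all } v \in \mathbb{R}^{d}.
$$

The reverse inequality, with constant $1$, holds automatically for $p \ge 2$ by monotonicity of $L_p$ norms, which is why the two marginal moments are called equivalent.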