We study the problem of testing the covariance matrix of a high-dimensional Gaussian in a robust setting, where the input distribution has been corrupted in Huber's contamination model. Specifically, we are given i.i.d. samples from a distribution of the form $Z = (1-\epsilon) X + \epsilon B$, where $X$ is a zero-mean Gaussian $\mathcal{N}(0, \Sigma)$ with unknown covariance, $B$ is a fixed but unknown noise distribution, and $\epsilon>0$ is an arbitrarily small constant representing the proportion of contamination. We want to distinguish the case that $\Sigma$ is the identity matrix from the case that $\Sigma$ is $\gamma$-far from the identity in Frobenius norm. In the absence of contamination, prior work gave a simple tester for this hypothesis testing task that uses $O(d)$ samples. Moreover, this sample upper bound was shown to be best possible, within constant factors. Our main result is that the sample complexity of covariance testing increases dramatically in the contaminated setting. In particular, we prove a sample complexity lower bound of $\Omega(d^2)$ for $\epsilon$ an arbitrarily small constant and $\gamma = 1/2$. This lower bound is best possible, as $O(d^2)$ samples suffice even to robustly {\em learn} the covariance. The conceptual implication of our result is that, for the natural setting we consider, robust hypothesis testing is at least as hard as robust estimation.
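To make the contamination model concrete, the following is a minimal Python sketch of drawing samples from $(1-\epsilon)\,\mathcal{N}(0, \Sigma) + \epsilon B$ and measuring how far the empirical second-moment matrix is from the identity in Frobenius norm. It is an illustration only, not the tester from prior work or any construction from this paper: the function names `sample_huber` and `frobenius_gap`, the `noise_sampler` callback, and the point-mass choice of $B$ are all assumptions introduced here.

```python
import numpy as np


def sample_huber(n, sigma, eps, noise_sampler, rng):
    """Draw n samples from the mixture (1 - eps) * N(0, sigma) + eps * B.

    noise_sampler(n, d, rng) is a hypothetical callback standing in for the
    unknown noise distribution B; it must return an array of shape (n, d).
    """
    d = sigma.shape[0]
    clean = rng.multivariate_normal(np.zeros(d), sigma, size=n)
    noise = noise_sampler(n, d, rng)
    # Each sample independently comes from B with probability eps.
    is_noise = rng.random(n) < eps
    return np.where(is_noise[:, None], noise, clean)


def frobenius_gap(samples):
    """Frobenius distance between the empirical second-moment matrix and I."""
    n, d = samples.shape
    second_moment = samples.T @ samples / n
    return np.linalg.norm(second_moment - np.eye(d), ord="fro")


def point_mass_noise(n, d, rng):
    """Illustrative choice of B (an assumption): a point mass at (3, ..., 3)."""
    return np.full((n, d), 3.0)


rng = np.random.default_rng(0)
d = 5
clean_gap = frobenius_gap(sample_huber(20000, np.eye(d), 0.0, point_mass_noise, rng))
noisy_gap = frobenius_gap(sample_huber(20000, np.eye(d), 0.05, point_mass_noise, rng))
```

In this sketch, even with $\Sigma = I$ the contaminated samples yield a visibly larger gap than the clean ones, which gives some intuition for why a naive statistic of this kind cannot be trusted once an $\epsilon$-fraction of the data is arbitrary.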