We study the following fundamental hypothesis testing problem, which we term Gaussian mean testing. Given i.i.d. samples from a distribution $p$ on $\mathbb{R}^d$, the task is to distinguish, with high probability, between the following cases: (i) $p$ is the standard Gaussian distribution, $\mathcal{N}(0,I_d)$, and (ii) $p$ is a Gaussian $\mathcal{N}(\mu,\Sigma)$ for some unknown covariance $\Sigma$ and mean $\mu \in \mathbb{R}^d$ satisfying $\|\mu\|_2 \geq \epsilon$. Recent work gave an algorithm for this testing problem with the optimal sample complexity of $\Theta(\sqrt{d}/\epsilon^2)$. Both the previous algorithm and its analysis are quite complicated. Here we give an extremely simple algorithm for Gaussian mean testing with a one-page analysis. Our algorithm is sample-optimal and runs in sample-linear time.
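To make the two hypotheses concrete, the sketch below generates sample sets under each case and computes a sample budget of order $\sqrt{d}/\epsilon^2$. It is only an illustration of the problem setup, not the tester described in this work. It assumes the covariance is supplied as a factor $A$ with $\Sigma = A A^\top$, and the helper names (`sample_null`, `sample_alternative`, `is_far`, `sample_budget`) and the constant `C` are hypothetical.

```python
import math
import random


def sample_null(n, d, rng):
    """Case (i): n i.i.d. samples from the standard Gaussian N(0, I_d)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]


def sample_alternative(n, mu, A, rng):
    """Case (ii): n i.i.d. samples from N(mu, Sigma) with Sigma = A A^T.

    Assumption: the (unknown to the tester) covariance is given through a
    d x k factor A, so x = mu + A z with z ~ N(0, I_k) has covariance A A^T.
    """
    d, k = len(mu), len(A[0])
    samples = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        samples.append([mu[r] + sum(A[r][c] * z[c] for c in range(k))
                        for r in range(d)])
    return samples


def is_far(mu, eps):
    """Check the separation condition ||mu||_2 >= eps of case (ii)."""
    return math.sqrt(sum(m * m for m in mu)) >= eps


def sample_budget(d, eps, C=1.0):
    """Sample count C * sqrt(d) / eps^2; the constant C is hypothetical."""
    return math.ceil(C * math.sqrt(d) / eps ** 2)
```

For example, with $d = 100$ and $\epsilon = 0.5$, `sample_budget(100, 0.5)` returns 40. Note that the alternative places no constraint on $\Sigma$, so a tester must succeed even when the covariance is far from $I_d$.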