This paper studies the optimal rate of estimation in a finite Gaussian location mixture model in high dimensions without separation conditions. We assume that the number of components $k$ is bounded and that the centers lie in a ball of bounded radius, while allowing the dimension $d$ to be as large as the sample size $n$. Extending the one-dimensional result of Heinrich and Kahn \cite{HK2015}, we show that the minimax rate of estimating the mixing distribution in Wasserstein distance is $\Theta((d/n)^{1/4} + n^{-1/(4k-2)})$, achieved by an estimator computable in time $O(nd^2+n^{5/4})$. Furthermore, we show that the mixture density can be estimated at the optimal parametric rate $\Theta(\sqrt{d/n})$ in Hellinger distance and provide a computationally efficient algorithm to achieve this rate in the special case of $k=2$. Both the theoretical and methodological development rely on a careful application of the method of moments. Central to our results is the observation that the information geometry of finite Gaussian mixtures is characterized by the moment tensors of the mixing distribution, whose low-rank structure can be exploited to obtain a sharp local entropy bound.
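A quick calculation, ignoring constants, shows how the two terms in the rate above trade off. They balance when
$$(d/n)^{1/4} = n^{-1/(4k-2)} \iff d/n = n^{-2/(2k-1)} \iff d = n^{(2k-3)/(2k-1)},$$
so the dimension-dependent term $(d/n)^{1/4}$ dominates once $d \gtrsim n^{(2k-3)/(2k-1)}$, while below this threshold the rate is of the order of the low-dimensional term $n^{-1/(4k-2)}$. For example, with $k=2$ the threshold is $d \asymp n^{1/3}$, and the rate below it is of order $n^{-1/6}$.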