This work represents a natural coalescence of two important lines of research: learning mixtures of Gaussians and algorithmic robust statistics. In particular, we give the first provably robust algorithm for learning mixtures of any constant number of Gaussians. We require only mild assumptions on the mixing weights (bounded fractionality) and that the total variation distance between components be bounded away from zero. At the heart of our algorithm is a new method for proving dimension-independent polynomial identifiability. The method applies a carefully chosen sequence of differential operations to certain generating functions that encode not only the parameters we would like to learn but also the system of polynomial equations we would like to solve. We show how the symbolic identities we derive can be used directly to analyze a natural sum-of-squares relaxation.