In this paper, we study the problem of estimating latent variable models with arbitrarily corrupted samples in high dimensional space ({\em i.e.,} $d\gg n$) where the underlying parameter is assumed to be sparse. Specifically, we propose a method called Trimmed (Gradient) Expectation Maximization which adds a trimming gradients step and a hard thresholding step to the Expectation step (E-step) and the Maximization step (M-step), respectively. We show that under some mild assumptions and with an appropriate initialization, the algorithm is corruption-proofing and converges to the (near) optimal statistical rate geometrically when the fraction of the corrupted samples $\epsilon$ is bounded by $ \tilde{O}(\frac{1}{\sqrt{n}})$. Moreover, we apply our general framework to three canonical models: mixture of Gaussians, mixture of regressions and linear regression with missing covariates. Our theory is supported by thorough numerical results.
翻译:在本文中,我们研究了在高维空间( 也就是说, $d\gg n$) 任意腐蚀的样本中估计潜在变量模型的问题。 具体地说, 我们提出了一种叫Trimmed( 大) 期望最大化的方法, 这种方法分别对期望步骤( E- 步) 和最大化步骤( M- 步) 添加了三重梯度梯度步骤和硬阈值步骤。 我们表明, 在一些温和的假设和适当的初始化下, 算法是防止腐败的, 并在( 近于) 最佳统计率的几何上与( ) 最佳统计率趋同。 当腐蚀样品中, $\ epsilon 部分被 $\ tilde{ O} (\ frac{ 1unsqrt{ n_ } (\\\\\\\\\ \ \ ) 。 此外, 我们把我们的总框架应用于三种卡通模型: 高斯人混合物、 回归和线性回归与缺失的计算结果的支持。 我们的理论得到彻底的数字支持。