We study a class of weakly identifiable location-scale mixture models for which the maximum likelihood estimates based on $n$ i.i.d. samples are known to be less accurate than the classical $n^{-\frac{1}{2}}$ rate. We investigate whether the Expectation-Maximization (EM) algorithm also converges slowly for these models. We provide a rigorous characterization of EM for fitting a weakly identifiable Gaussian mixture in a univariate setting, where we prove that the EM algorithm converges in order $n^{\frac{3}{4}}$ steps and returns estimates that are at a Euclidean distance of order $n^{-\frac{1}{8}}$ and $n^{-\frac{1}{4}}$ from the true location and scale parameters, respectively. Establishing the slow rates in the univariate setting requires a novel localization argument with two stages, each involving an epoch-based argument applied to a different surrogate EM operator at the population level. We demonstrate several multivariate ($d \geq 2$) examples that exhibit the same slow rates as the univariate case. We also prove slow statistical rates in higher dimensions in a special case, when the fitted covariance is constrained to be a multiple of the identity.
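To make the univariate setting concrete, the following is a minimal simulation sketch, not the authors' code. It assumes one particular weakly identifiable instance that the abstract does not spell out: data drawn from a standard normal, fitted with the symmetric equal-weight mixture $\frac{1}{2} N(-\theta, \sigma^2) + \frac{1}{2} N(\theta, \sigma^2)$, so that the true location is $0$ and the true scale is $1$. The function name, starting values, and stopping tolerance are all hypothetical choices made for illustration.

```python
import numpy as np

def em_symmetric_mixture(x, theta0=0.5, sigma2_0=1.0, max_iter=10000, tol=1e-10):
    """EM for the (assumed) fit 0.5*N(-theta, sigma2) + 0.5*N(theta, sigma2).

    Returns the final location, the final variance, and the iteration count.
    """
    theta, sigma2 = theta0, sigma2_0
    second_moment = np.mean(x ** 2)
    for it in range(max_iter):
        # E-step folded into the update: with w_i the posterior weight of
        # the +theta component, 2*w_i - 1 = tanh(theta * x_i / sigma2).
        theta_new = np.mean(x * np.tanh(theta * x / sigma2))
        # M-step for the shared variance under the symmetric fit.
        sigma2_new = second_moment - theta_new ** 2
        converged = abs(theta_new - theta) < tol
        theta, sigma2 = theta_new, sigma2_new
        if converged:
            break
    return theta, sigma2, it + 1

# Assumed data-generating process: a single standard normal.
rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
theta_hat, sigma2_hat, n_iter = em_symmetric_mixture(x)
print(abs(theta_hat), abs(sigma2_hat - 1.0), n_iter)
```

Repeating the run over a grid of sample sizes and plotting $|\hat{\theta}|$, the scale error, and the iteration count against $n$ on log-log axes is one way to compare empirical slopes with the stated orders $n^{-\frac{1}{8}}$, $n^{-\frac{1}{4}}$, and $n^{\frac{3}{4}}$; a single run such as the one above is not evidence for any of them.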