On the Semi-supervised Expectation Maximization

The Expectation Maximization (EM) algorithm is widely used as an iterative modification of maximum likelihood estimation when the data are incomplete. We focus on the semi-supervised case, in which the model is learned from both labeled and unlabeled samples. Existing work on the semi-supervised case has focused mainly on performance rather than on convergence guarantees; we instead study the contribution of the labeled samples to the convergence rate. The analysis shows how the labeled samples improve the convergence rate for the exponential family mixture model. Here we assume that the population EM (EM with unlimited data) is initialized within the neighborhood of global convergence of the population EM that uses unlabeled samples only. For the Gaussian mixture model, the analysis of the labeled samples provides a comprehensive description of the convergence rate. In addition, we extend the findings for labeled samples and offer an alternative proof of the population EM's convergence rate with unlabeled samples for the symmetric mixture of two Gaussians.
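The abstract does not state the EM update itself. As a minimal sketch of the setting it describes, the following Python code runs sample-based EM on a one-dimensional symmetric mixture of two Gaussians, 0.5 N(θ, σ²) + 0.5 N(−θ, σ²), using both labeled and unlabeled samples. The one-dimensional setting, the known σ, the equal mixing weights, and the names `semi_supervised_em` and `sample` are assumptions made for illustration; this is the textbook EM update for this model and may differ from the formulation analyzed in the paper.

```python
import math
import random


def semi_supervised_em(labeled, unlabeled, theta0, sigma=1.0, iters=50):
    """EM for x ~ 0.5*N(theta, sigma^2) + 0.5*N(-theta, sigma^2) in 1-D.

    labeled: list of (x, z) pairs with z in {+1, -1}; unlabeled: list of x.
    sigma is assumed known (an illustrative simplification).
    """
    n = len(labeled) + len(unlabeled)
    # Labeled samples contribute a fixed term because their labels are known.
    labeled_sum = sum(z * x for x, z in labeled)
    theta = theta0
    for _ in range(iters):
        # E-step: E[z | x, theta] = tanh(theta * x / sigma^2) for unlabeled x.
        unlabeled_sum = sum(math.tanh(theta * x / sigma**2) * x for x in unlabeled)
        # M-step: closed-form maximizer of the expected complete log-likelihood.
        theta = (labeled_sum + unlabeled_sum) / n
    return theta


def sample(n, theta, sigma, rng):
    """Draw n (x, z) pairs from the symmetric two-component mixture."""
    out = []
    for _ in range(n):
        z = rng.choice((1, -1))
        out.append((rng.gauss(z * theta, sigma), z))
    return out


rng = random.Random(0)
true_theta = 2.0
labeled = sample(200, true_theta, 1.0, rng)
unlabeled = [x for x, _ in sample(2000, true_theta, 1.0, rng)]
theta_hat = semi_supervised_em(labeled, unlabeled, theta0=0.5)
print(theta_hat)
```

In this sketch the labeled samples enter the update as a term that does not depend on the current iterate, while the unlabeled samples enter through the posterior mean tanh(θx/σ²). With no unlabeled data the update returns the labeled-sample estimate after a single step.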
