(Gradient) Expectation Maximization (EM) is a widely used algorithm for maximum likelihood estimation in mixture models and incomplete-data problems. A major challenge facing this popular technique is how to effectively preserve the privacy of sensitive data. Previous research on this problem has already led to the discovery of some Differentially Private (DP) algorithms for (Gradient) EM. However, unlike in the non-private case, existing techniques are not yet able to provide finite-sample statistical guarantees. To address this issue, we propose in this paper the first DP version of the (Gradient) EM algorithm with statistical guarantees. Moreover, we apply our general framework to three canonical models: Gaussian Mixture Model (GMM), Mixture of Regressions Model (MRM), and Linear Regression with Missing Covariates (RMC). Specifically, for GMM in the DP model, our estimation error is near optimal in some cases. For the other two models, we provide the first finite-sample statistical guarantees. Our theory is supported by thorough numerical experiments.
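The abstract does not spell out the privatization mechanism. As a minimal sketch of how a gradient EM step is typically made differentially private (an illustration under standard assumptions, not necessarily the paper's exact construction), one can perturb the clipped sample-average gradient of the surrogate objective $Q(\cdot\,;\theta^{(t)})$ with Gaussian noise; here $\eta$ (step size), $R$ (clipping threshold), $c$ (a composition constant), and $T$ (iteration count) are illustrative symbols not taken from the abstract:
$$
\theta^{(t+1)} \;=\; \theta^{(t)} + \eta\left(\frac{1}{n}\sum_{i=1}^{n}\Pi_R\!\bigl(\nabla_\theta q_i(\theta^{(t)})\bigr) + \zeta_t\right),
\qquad \zeta_t \sim \mathcal{N}\!\left(0,\; \frac{c\,R^2\,T\log(1/\delta)}{n^2\varepsilon^2}\, I_d\right),
$$
where $q_i$ is the per-sample surrogate log-likelihood and $\Pi_R$ projects each gradient onto the $\ell_2$-ball of radius $R$. Clipping bounds the sensitivity of the averaged gradient by $2R/n$, so each step is a Gaussian-mechanism release, and composing over $T$ iterations yields $(\varepsilon,\delta)$-DP for the full trajectory.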