The Expectation Maximization (EM) algorithm is the default algorithm for inference in latent variable models. As in many other fields of machine learning, applying latent variable models to very large datasets makes the use of advanced parallel and distributed architectures mandatory. This paper introduces FedEM, the first extension of the EM algorithm to the federated learning setting. FedEM is a new communication-efficient method that handles partial participation of local devices and is robust to heterogeneity across the local data distributions. To alleviate the communication bottleneck, FedEM compresses appropriately defined complete-data sufficient statistics. We also develop and analyze an extension of FedEM that further incorporates a variance reduction scheme. In all cases, we derive finite-time complexity bounds for smooth non-convex problems. Numerical results are presented to support our theoretical findings, as well as an application to federated missing-value imputation for biodiversity monitoring.
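To make the core idea concrete, the sketch below illustrates one round of a federated EM scheme in the spirit described above: each client computes complete-data sufficient statistics locally (E-step), compresses them before transmission, and the server aggregates them to perform the M-step. This is a minimal toy example for a one-dimensional two-component Gaussian mixture with known unit variances and equal weights; the quantizer and all function names are illustrative assumptions, not the paper's actual FedEM algorithm or its compression operator.

```python
import math
import random

def e_step_stats(data, means, var=1.0):
    # Local E-step: complete-data sufficient statistics for a 2-component
    # GMM with equal mixture weights and shared variance `var`.
    n = [0.0, 0.0]  # expected component counts
    s = [0.0, 0.0]  # responsibility-weighted sums of the data
    for x in data:
        w = [math.exp(-(x - m) ** 2 / (2 * var)) for m in means]
        z = sum(w)
        for k in range(2):
            r = w[k] / z  # responsibility of component k for point x
            n[k] += r
            s[k] += r * x
    return n, s

def compress(stats, step=0.1):
    # Hypothetical stand-in for a compression operator: round each
    # statistic to a coarse grid before "sending" it to the server.
    return [round(v / step) * step for v in stats]

def fedem_round(client_data, means):
    # Server side: aggregate compressed sufficient statistics from all
    # clients, then run the M-step (here, a closed-form mean update).
    N = [0.0, 0.0]
    S = [0.0, 0.0]
    for data in client_data:
        n, s = e_step_stats(data, means)
        n, s = compress(n), compress(s)
        for k in range(2):
            N[k] += n[k]
            S[k] += s[k]
    return [S[k] / max(N[k], 1e-12) for k in range(2)]

# Synthetic federated data: 4 clients, each with a mixture of two Gaussians.
random.seed(0)
clients = [
    [random.gauss(-2, 1) for _ in range(50)] + [random.gauss(2, 1) for _ in range(50)]
    for _ in range(4)
]

means = [-1.0, 1.0]
for _ in range(20):
    means = fedem_round(clients, means)
print(means)
```

Despite the coarse quantization of the transmitted statistics, the aggregated M-step still recovers means close to the true component centers at -2 and 2, which is the intuition behind compressing sufficient statistics rather than raw data.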