Learning a privacy-preserving model from sensitive data distributed across multiple devices is an increasingly important problem. The problem is often formulated in the federated learning setting, with the aim of learning a single global model while keeping the data distributed. Bayesian learning is a popular modelling approach in this context, since it naturally supports reliable uncertainty estimates. However, Bayesian learning is generally intractable even with centralised non-private data, so approximation techniques such as variational inference are a necessity. Variational inference has recently been extended to the non-private federated learning setting via the partitioned variational inference algorithm. For privacy protection, the current gold standard is differential privacy, which guarantees privacy in a strong, mathematically precise sense. In this paper, we present differentially private partitioned variational inference, the first general framework for learning a variational approximation to a Bayesian posterior distribution in the federated learning setting while minimising the number of communication rounds and providing differential privacy guarantees for data subjects. We propose three alternative implementations within the general framework: one based on perturbing local optimisation runs performed by individual parties, and two based on perturbing updates to the global model (one using a version of federated averaging, the other adding virtual parties to the protocol), and compare their properties both theoretically and empirically.
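The perturbation idea underlying the update-based variants can be illustrated with the standard Gaussian mechanism: a party clips its local update to a bounded L2 norm and adds calibrated Gaussian noise before sharing it. This is a minimal sketch under that assumption; the function name, parameters, and calibration here are illustrative, not the paper's exact algorithm.

```python
import numpy as np

def dp_perturb_update(delta, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """Gaussian-mechanism sketch (hypothetical helper): clip a local update
    vector to L2 norm <= clip_norm, then add isotropic Gaussian noise whose
    scale is noise_multiplier * clip_norm, bounding each party's influence."""
    rng = np.random.default_rng() if rng is None else rng
    norm = np.linalg.norm(delta)
    # Scale down only if the update exceeds the clipping bound.
    clipped = delta * min(1.0, clip_norm / max(norm, 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=delta.shape)
    return clipped + noise
```

The privacy level achieved by such a mechanism depends on the noise multiplier, the clipping bound, and how many perturbed updates are released, which is why minimising communication rounds matters for the overall privacy budget.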