Deep neural networks have strong capabilities of memorizing the underlying training data, which can be a serious privacy concern. An effective solution to this problem is to train models with differential privacy (DP), which provides rigorous privacy guarantees by injecting random noise into the gradients. This paper focuses on the scenario where sensitive data are distributed among multiple participants, who jointly train a model through federated learning (FL), using both secure multiparty computation (MPC) to ensure the confidentiality of each gradient update, and differential privacy to avoid data leakage in the resulting model. A major challenge in this setting is that common mechanisms for enforcing DP in deep learning, which inject real-valued noise, are fundamentally incompatible with MPC, which exchanges finite-field integers among the participants. Consequently, most existing DP mechanisms require rather high noise levels, leading to poor model utility. Motivated by this, we propose the Skellam mixture mechanism (SMM), an approach to enforce DP on models built via FL. Compared to existing methods, SMM eliminates the assumption that the input gradients must be integer-valued, and, thus, reduces the amount of noise injected to preserve DP. Further, SMM allows tight privacy accounting due to the nice composition and sub-sampling properties of the Skellam distribution, which are key to accurate deep learning with DP. The theoretical analysis of SMM is highly non-trivial, especially considering (i) the complicated math of differentially private deep learning in general and (ii) the fact that the mixture of two Skellam distributions is rather complex and, to our knowledge, has not been studieded in the DP literature before. Extensive experiments on various practical settings demonstrate that SMM consistently and significantly outperforms existing solutions in terms of the utility of the resulting model.
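To make the noise-injection step concrete, below is a minimal Python sketch consistent with the abstract's description: a real-valued (clipped) gradient is scaled and stochastically rounded to integers, and Skellam noise is added before the integers are secret-shared for MPC aggregation. It relies only on the standard fact that a symmetric Skellam(μ) variate is the difference of two independent Poisson(μ) variates. All names (`skellam_noise`, `smm_privatize`) and parameter values (`scale`, `mu`) are illustrative assumptions, not the paper's actual implementation or parameterization.

```python
import numpy as np

def skellam_noise(mu, shape, rng):
    # Symmetric Skellam(mu) noise: difference of two independent Poisson(mu) draws.
    return rng.poisson(mu, shape) - rng.poisson(mu, shape)

def smm_privatize(grad, scale, mu, rng):
    # Scale the real-valued gradient and stochastically round it to integers;
    # conditioned on a real input, the rounded value plus Skellam noise follows
    # a mixture of two shifted Skellam distributions -- the object SMM analyzes.
    scaled = grad * scale
    floor = np.floor(scaled)
    rounded = floor + (rng.random(grad.shape) < (scaled - floor))  # unbiased rounding
    return rounded.astype(np.int64) + skellam_noise(mu, grad.shape, rng)

rng = np.random.default_rng(0)
g = np.array([0.13, -0.42, 0.98])  # a clipped per-example gradient (hypothetical values)
noisy = smm_privatize(g, scale=2**10, mu=4e4, rng=rng)
# Each participant would secret-share `noisy` modulo a field prime for MPC
# aggregation; the decoded integer sum is rescaled by 1/scale on the server side.
print(noisy)
```

A design point worth noting: the Skellam distribution is closed under convolution (sums of independent Skellam variates are again Skellam), so the total noise in the aggregated integer sum remains exactly characterized after MPC summation, which is what enables the tight privacy accounting mentioned above.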