Federated Learning (FL) is a paradigm for large-scale distributed learning which faces two key challenges: (i) efficient training from highly heterogeneous user data, and (ii) protecting the privacy of participating users. In this work, we propose a novel FL approach (DP-SCAFFOLD) to tackle these two challenges together by incorporating Differential Privacy (DP) constraints into the popular SCAFFOLD algorithm. We focus on the challenging setting where users communicate with an "honest-but-curious" server without any trusted intermediary, which requires ensuring privacy not only towards a third party with access to the final model but also towards the server, which observes all user communications. Using advanced results from DP theory, we establish the convergence of our algorithm for convex and non-convex objectives. Our analysis clearly highlights the privacy-utility trade-off under data heterogeneity, and demonstrates the superiority of DP-SCAFFOLD over the state-of-the-art algorithm DP-FedAvg when the number of local updates and the level of heterogeneity grow. Our numerical results confirm our analysis and show that DP-SCAFFOLD provides significant gains in practice.
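To make the combination concrete, the following is a minimal sketch of one DP-SCAFFOLD-style local update: SCAFFOLD's control-variate drift correction is applied to the local gradient, and DP is enforced by clipping the corrected update and adding Gaussian noise calibrated to the clipping bound. All function and parameter names here are illustrative assumptions, not the paper's exact notation or algorithm.

```python
import numpy as np

def dp_scaffold_local_step(w, grad, c_local, c_global,
                           lr, clip_norm, sigma, rng):
    """One illustrative local update combining SCAFFOLD and DP-SGD ideas.

    w         : current local model parameters
    grad      : stochastic gradient at w on the user's data
    c_local   : user's control variate (SCAFFOLD)
    c_global  : server's control variate (SCAFFOLD)
    clip_norm : L2 clipping bound (DP sensitivity control)
    sigma     : noise multiplier (Gaussian mechanism)
    """
    # SCAFFOLD: correct the local gradient for client drift
    corrected = grad - c_local + c_global
    # DP: clip the corrected update so its L2 norm is at most clip_norm
    norm = np.linalg.norm(corrected)
    clipped = corrected * min(1.0, clip_norm / max(norm, 1e-12))
    # DP: add Gaussian noise scaled to the clipping bound
    noised = clipped + rng.normal(0.0, sigma * clip_norm,
                                  size=clipped.shape)
    return w - lr * noised

# Example: a gradient of norm 5 is clipped down to norm 1 (sigma=0
# disables noise so the clipping effect is visible deterministically)
rng = np.random.default_rng(0)
w0 = np.zeros(3)
g = np.array([3.0, 4.0, 0.0])          # L2 norm = 5
w1 = dp_scaffold_local_step(w0, g, np.zeros(3), np.zeros(3),
                            lr=1.0, clip_norm=1.0, sigma=0.0, rng=rng)
```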