Privacy regulations, such as the GDPR, impose transparency and security as design pillars for data processing algorithms. In this context, federated learning is one of the most influential frameworks for privacy-preserving distributed machine learning, achieving astounding results in many natural language processing and computer vision tasks. Several federated learning frameworks employ differential privacy to prevent private data leakage to unauthorized parties and malicious attackers. Many studies, however, highlight the vulnerabilities of standard federated learning to poisoning and inference attacks, thus raising concerns about potential risks for sensitive data. To address this issue, we present SGDE, a generative data exchange protocol that improves user security and machine learning performance in a cross-silo federation. The core of SGDE is to share data generators with strong differential privacy guarantees trained on private data, instead of communicating explicit gradient information. These generators synthesize an arbitrarily large amount of data that retain the distinctive features of the private samples while differing substantially from them. We show how the inclusion of SGDE in a cross-silo federated network improves resilience to the most influential attacks against federated learning. We test our approach on image and tabular datasets, exploiting beta-variational autoencoders as data generators, and highlight fairness and performance improvements over local and federated learning on non-generated data.
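As a rough illustration of the beta-variational autoencoder objective that underlies the data generators, the following is a minimal NumPy sketch of the beta-VAE loss: a reconstruction term plus a beta-weighted KL divergence between the approximate posterior and a standard normal prior. Function and parameter names here are illustrative, not taken from the paper's implementation, and the differential privacy machinery (e.g., gradient clipping and noise addition during training) is omitted.

```python
import numpy as np

def gaussian_kl(mu, log_var):
    # Closed-form KL divergence KL(N(mu, diag(sigma^2)) || N(0, I)),
    # summed over the latent dimensions. log_var = log(sigma^2).
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

def beta_vae_loss(x, x_recon, mu, log_var, beta=4.0):
    # beta-VAE objective: reconstruction error plus a beta-weighted KL
    # term that pushes the posterior toward the standard normal prior.
    # beta > 1 trades reconstruction fidelity for a more disentangled,
    # prior-like latent space (the value 4.0 is an illustrative default).
    recon = np.sum((x - x_recon) ** 2)  # squared-error reconstruction
    return recon + beta * gaussian_kl(mu, log_var)
```

Once such a generator is trained under differential privacy, a silo can release the decoder alone; other silos then sample latent vectors from the prior and decode them into synthetic training data, so no gradients or raw samples ever leave the silo.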