Privacy regulations such as GDPR impose transparency and security as design pillars for data processing algorithms. In this context, federated learning is one of the most influential frameworks for privacy-preserving distributed machine learning, achieving strong results in many natural language processing and computer vision tasks. Several federated learning frameworks employ differential privacy to prevent private data from leaking to unauthorized parties and malicious attackers. Many studies, however, highlight the vulnerability of standard federated learning to poisoning and inference attacks, raising concerns about the risks to sensitive data. To address this issue, we present SGDE, a generative data exchange protocol that improves user security and machine learning performance in a cross-silo federation. The core idea of SGDE is that clients share data generators, trained on private data with strong differential privacy guarantees, instead of communicating explicit gradient information. These generators can synthesize an arbitrarily large amount of data that retains the distinctive features of the private samples while differing substantially from them. In this work, SGDE is tested in a cross-silo federated network on image and tabular datasets, using beta-variational autoencoders as data generators. The results show that including SGDE improves task accuracy and fairness, as well as resilience to the most influential attacks on federated learning.
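As a rough illustration of the kind of generator mentioned above, the following is a minimal sketch of the standard beta-variational autoencoder objective: a reconstruction term plus a KL term weighted by beta. The function names, the squared-error reconstruction term, and the default `beta=4.0` are assumptions made for illustration and are not taken from the SGDE paper; the differentially private training of the generator is not shown.

```python
import math
from typing import Sequence


def gaussian_kl(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar)
    )


def beta_vae_loss(
    x: Sequence[float],
    x_recon: Sequence[float],
    mu: Sequence[float],
    logvar: Sequence[float],
    beta: float = 4.0,  # hypothetical default, not a value from the paper
) -> float:
    """Squared-error reconstruction term plus beta-weighted KL term."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + beta * gaussian_kl(mu, logvar)
```

Setting `beta` above 1 puts more weight on matching the latent prior than on reconstruction fidelity, which is the usual way a beta-VAE trades sample fidelity for a more regular latent space.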