Data Leakage in Tabular Federated Learning

While federated learning (FL) promises to preserve privacy in the distributed training of deep learning models, recent work in the image and NLP domains has shown that training updates leak private data of participating clients. At the same time, most high-stakes applications of FL (e.g., legal and financial) use tabular data. Compared to the NLP and image domains, reconstruction of tabular data poses several unique challenges: (i) categorical features introduce a significantly more difficult mixed discrete-continuous optimization problem, (ii) the mix of categorical and continuous features causes high variance in the final reconstructions, and (iii) structured data makes it difficult for the adversary to judge reconstruction quality. In this work, we tackle these challenges and propose the first comprehensive reconstruction attack on tabular data, called TabLeak. TabLeak is based on three key ingredients: (i) a softmax structural prior, which implicitly converts the mixed discrete-continuous optimization problem into an easier fully continuous one, (ii) a pooled ensembling scheme that reduces the variance of our reconstructions by exploiting the structure of tabular data, and (iii) an entropy measure that can successfully assess reconstruction quality. Our experimental evaluation demonstrates the effectiveness of TabLeak, which achieves state-of-the-art results on four popular tabular datasets. For instance, on the Adult dataset, we improve attack accuracy by 10% compared to the baseline at the practically relevant batch size of 32, and further obtain non-trivial reconstructions for batch sizes as large as 128. Our findings are important because they show that FL on tabular data, which often carries high privacy risks, is highly vulnerable.
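To make the three ingredients concrete, the following is a minimal sketch for a single categorical feature, not the authors' implementation. It assumes that the attacker optimizes unconstrained logits and passes them through a softmax (the relaxation), that several independent reconstructions are pooled by a simple per-category mean (the abstract does not specify the pooling operator, and the alignment of rows across runs is omitted here), and that the Shannon entropy of the pooled distribution serves as the quality score. All names and the example logits are hypothetical.

```python
import math


def softmax(logits):
    """Map unconstrained logits to a probability vector (continuous relaxation)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; low entropy means a confident reconstruction."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def pool(reconstructions):
    """Average several relaxed reconstructions of the same feature.

    Mean pooling is an assumption made for this sketch.
    """
    n = len(reconstructions)
    return [sum(column) / n for column in zip(*reconstructions)]


# Hypothetical logits from three independent attack runs for one
# categorical feature with three possible categories.
runs_logits = [
    [2.0, 0.1, -1.0],
    [1.5, 0.3, -0.5],
    [2.2, -0.2, -0.8],
]

relaxed = [softmax(z) for z in runs_logits]   # softmax structural prior
pooled = pool(relaxed)                        # pooled ensemble
predicted_category = max(range(len(pooled)), key=pooled.__getitem__)
confidence_entropy = entropy(pooled)          # lower is more confident
max_entropy = math.log(len(pooled))           # entropy of a uniform guess
```

Because every run yields a point on the probability simplex, the pooled vector is again a valid distribution, so its entropy can be compared against `max_entropy` to flag features whose reconstruction is close to a uniform guess.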

