Fed-TGAN: Federated Learning Framework for Synthesizing Tabular Data

Generative Adversarial Networks (GANs) are typically trained to synthesize data, from images to, more recently, tabular data, under the assumption that the training data is directly accessible. Federated learning (FL) is an emerging paradigm that features decentralized learning on clients' local data with a privacy-preserving capability. While training GANs to synthesize images on FL systems has only recently been demonstrated, it is unknown whether GANs for tabular data can be learned from decentralized data sources. Moreover, it remains unclear which distributed architecture suits them best. Unlike image GANs, state-of-the-art tabular GANs require prior knowledge of the data distribution of each (discrete and continuous) column in order to agree on a common encoding, which puts privacy guarantees at risk. In this paper, we propose Fed-TGAN, the first federated learning framework for tabular GANs. To effectively learn a complex tabular GAN on non-identical participants, Fed-TGAN introduces two novel features: (i) a privacy-preserving multi-source feature encoding for model initialization; and (ii) table-similarity-aware weighting strategies that aggregate local models to counter data skew. We extensively evaluate Fed-TGAN against variants of decentralized learning architectures on four widely used datasets. Results show that Fed-TGAN accelerates training time per epoch by up to 200% compared to the alternative architectures, for both IID and Non-IID data. Overall, Fed-TGAN not only stabilizes the training loss, but also achieves better similarity between generated and original data.
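
The abstract does not spell out how the table-similarity-aware weights are computed, so the following is only a minimal sketch of the general idea, not the Fed-TGAN procedure itself. It assumes each client shares a category-count histogram for one discrete column, measures each client's Jensen-Shannon divergence from the pooled histogram, and scales a size-proportional weight by the resulting similarity. The function names `js_divergence`, `similarity_aware_weights`, and `aggregate`, and the choice of divergence, are hypothetical.

```python
import numpy as np


def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (natural log) between two count vectors."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        return float(np.sum(a * np.log(a / b)))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def similarity_aware_weights(client_hists, client_sizes):
    """Assumed weighting rule: (similarity to pooled data) x (share of rows).

    client_hists: one category-count vector per client, same column for all.
    client_sizes: number of local rows per client.
    """
    hists = np.asarray(client_hists, dtype=float)
    sizes = np.asarray(client_sizes, dtype=float)
    global_hist = hists.sum(axis=0)  # pooled histogram across clients
    divergences = np.array([js_divergence(h, global_hist) for h in hists])
    # JSD with natural log lies in [0, ln 2]; map it to a similarity in [0, 1].
    similarity = 1.0 - divergences / np.log(2.0)
    raw = similarity * sizes / sizes.sum()
    return raw / raw.sum()  # normalize so the weights sum to 1


def aggregate(client_params, weights):
    """Weighted average of per-client parameter dictionaries."""
    keys = client_params[0].keys()
    return {k: sum(w * p[k] for w, p in zip(weights, client_params)) for k in keys}


if __name__ == "__main__":
    # Three equally sized clients; the third holds a heavily skewed column.
    weights = similarity_aware_weights(
        [[50, 50], [50, 50], [100, 0]], [100, 100, 100]
    )
    params = [{"layer": np.full(2, float(i))} for i in range(3)]
    print(weights)
    print(aggregate(params, weights)["layer"])
```

Under these assumptions, a client whose column distribution departs from the pooled distribution contributes less to the aggregated model than an equally sized client that matches it, which is one way to counter data skew.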
