Federated learning (FL) is receiving increasing attention for processing sensitive, distributed datasets common to domains such as healthcare. Instead of directly training classification models on such datasets, recent works have considered training data generators capable of synthesising a new dataset that is not subject to the original privacy restrictions. The synthetic data can thus be made available to anyone, enabling further evaluation of machine learning architectures and research questions off-site. As an additional layer of privacy preservation, differential privacy can be introduced into the training process. We propose DPD-fVAE, a federated Variational Autoencoder with a Differentially-Private Decoder, to synthesise a new, labelled dataset for subsequent machine learning tasks. By synchronising only the decoder component via FL, we reduce the privacy cost per epoch and thus enable better data generators. In our evaluation on MNIST, Fashion-MNIST and CelebA, we demonstrate the benefits of DPD-fVAE and report performance competitive with related work in terms of Fr\'echet Inception Distance and the accuracy of classifiers trained on the synthesised data.