DataLens: Scalable Privacy Preserving Training via Gradient Compression and Aggregation

Recent success of deep neural networks (DNNs) hinges on the availability of large-scale datasets; however, training on such datasets often poses privacy risks for sensitive training information. In this paper, we aim to explore the power of generative models and gradient sparsity, and propose a scalable privacy-preserving generative model, DATALENS. Compared with the standard PATE privacy-preserving framework, in which teachers vote on one-dimensional predictions, voting on high-dimensional gradient vectors is challenging in terms of privacy preservation. Because dimension reduction techniques are required, we need to navigate a delicate tradeoff space between (1) the improvement of privacy preservation and (2) the slowdown of SGD convergence. To tackle this, we take advantage of communication-efficient learning and propose a novel noise compression and aggregation approach, TOPAGG, which combines top-k compression for dimension reduction with a corresponding noise injection mechanism. We theoretically prove that the DATALENS framework guarantees differential privacy for its generated data, and provide an analysis of its convergence. To demonstrate the practical usage of DATALENS, we conduct extensive experiments on diverse datasets, including MNIST, Fashion-MNIST, and high-dimensional CelebA, and show that DATALENS significantly outperforms other baseline DP generative models. In addition, we adapt the proposed TOPAGG approach, one of the key building blocks of DATALENS, to DP SGD training, and show that it achieves higher utility than the state-of-the-art DP SGD approach in most cases. Our code is publicly available at https://github.com/AI-secure/DataLens.
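The abstract describes TOPAGG only at a high level: top-k compression of each teacher's gradient followed by noise injection on the aggregate. The Python below is a minimal illustrative sketch of that idea under assumptions not stated in this abstract: each teacher's gradient is compressed to the signs of its k largest-magnitude coordinates, the per-teacher votes are summed, Gaussian noise with standard deviation `sigma` is added once to the sum, and each noisy coordinate is thresholded at a hypothetical `beta` to {-1, 0, +1}. The names `top_k_sign` and `top_agg` are hypothetical; this is not the authors' implementation and includes no privacy accounting.

```python
import random
from typing import List, Optional, Sequence


def top_k_sign(grad: Sequence[float], k: int) -> List[int]:
    """Compress one gradient (assumed scheme): keep the signs of its k
    largest-magnitude coordinates and zero out every other coordinate."""
    top = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    out = [0] * len(grad)
    for i in top:
        if grad[i] > 0:
            out[i] = 1
        elif grad[i] < 0:
            out[i] = -1
    return out


def top_agg(
    teacher_grads: Sequence[Sequence[float]],
    k: int,
    sigma: float,
    beta: float,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Hypothetical TOPAGG-style aggregation: sum compressed teacher votes,
    add Gaussian noise to the sum, and threshold each coordinate to -1/0/+1."""
    if rng is None:
        rng = random.Random()
    dim = len(teacher_grads[0])
    votes = [0.0] * dim
    for grad in teacher_grads:
        # Each teacher contributes at most k nonzero entries of magnitude 1.
        for i, v in enumerate(top_k_sign(grad, k)):
            votes[i] += v
    # Gaussian noise is added once, to the aggregated vote vector.
    noisy = [v + rng.gauss(0.0, sigma) for v in votes]
    # Coordinates whose noisy vote does not clear the threshold are set to 0.
    return [1 if v >= beta else -1 if v <= -beta else 0 for v in noisy]


if __name__ == "__main__":
    grads = [
        [0.9, -0.1, 0.5, -0.7],
        [0.8, 0.2, -0.05, -0.6],
        [0.7, -0.3, 0.1, -0.9],
    ]
    # With no noise, all three teachers agree on coordinates 0 and 3.
    print(top_agg(grads, k=2, sigma=0.0, beta=2.0))  # [1, 0, 0, -1]
    # With noise, the output is still a vector over {-1, 0, +1}.
    print(top_agg(grads, k=2, sigma=1.0, beta=2.0, rng=random.Random(0)))
```

In this sketch, each teacher's compressed vote has at most k nonzero entries of magnitude 1, so adding or removing one teacher changes the summed vote vector by at most sqrt(k) in L2 norm. Bounding this sensitivity is what allows Gaussian noise to be calibrated after compression. A smaller k lowers the sensitivity but discards more gradient information, which mirrors the privacy versus convergence tradeoff described above.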
