DataLens: Scalable Privacy Preserving Training via Gradient Compression and Aggregation

The recent success of deep neural networks (DNNs) hinges on the availability of large-scale datasets; however, training on such datasets often poses privacy risks for sensitive training information. In this paper, we aim to explore the power of generative models and gradient sparsity, and propose a scalable privacy-preserving generative model, DATALENS. Compared with the standard PATE privacy-preserving framework, in which teachers vote on one-dimensional predictions, voting on high-dimensional gradient vectors is challenging in terms of privacy preservation. Because dimension reduction techniques are required, we need to navigate a delicate tradeoff between (1) the improvement in privacy preservation and (2) the slowdown of SGD convergence. To tackle this, we take advantage of communication-efficient learning and propose a novel noise compression and aggregation approach, TOPAGG, which combines top-k compression for dimension reduction with a corresponding noise injection mechanism. We theoretically prove that the DATALENS framework guarantees differential privacy for its generated data, and provide an analysis of its convergence. To demonstrate the practical usage of DATALENS, we conduct extensive experiments on diverse datasets including MNIST, Fashion-MNIST, and high-dimensional CelebA, and we show that DATALENS significantly outperforms other baseline DP generative models. In addition, we adapt the proposed TOPAGG approach, which is one of the key building blocks of DATALENS, to DP SGD training, and show that it achieves higher utility than the state-of-the-art DP SGD approach in most cases.
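The combination of top-k compression with noise injection can be illustrated with a small sketch. The code below is not the paper's TOPAGG algorithm. It is a simplified illustration built on assumptions the abstract does not state: that each teacher's gradient is reduced to the signs of its k largest-magnitude coordinates, that the compressed vectors are summed as votes, and that Gaussian noise plus a per-coordinate threshold produce the aggregate. The function names and the parameters `k`, `sigma`, and `threshold` are hypothetical, and plain Python lists are used to keep the example dependency-free.

```python
import random


def topk_sign_compress(grad, k):
    """Keep the k largest-magnitude coordinates of grad, reduced to their signs."""
    top = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    out = [0] * len(grad)
    for i in top:
        out[i] = (grad[i] > 0) - (grad[i] < 0)  # sign in {-1, 0, 1}
    return out


def noisy_vote_aggregate(teacher_grads, k, sigma, threshold, rng):
    """Sum compressed teacher votes, add Gaussian noise, threshold each coordinate.

    Assumed illustration only: sigma and threshold are hypothetical knobs,
    not values or definitions taken from the paper.
    """
    dim = len(teacher_grads[0])
    votes = [0.0] * dim
    for grad in teacher_grads:
        for i, v in enumerate(topk_sign_compress(grad, k)):
            votes[i] += v
    noisy = [v + rng.gauss(0.0, sigma) for v in votes]
    return [1 if n >= threshold else -1 if n <= -threshold else 0 for n in noisy]


# Three hypothetical teacher gradients over four coordinates.
teachers = [
    [0.9, -0.1, 0.05, -0.8],
    [0.7, 0.2, -0.01, -0.6],
    [1.1, -0.3, 0.02, 0.4],
]
print(noisy_vote_aggregate(teachers, k=2, sigma=1.0, threshold=2, rng=random.Random(0)))
```

In this sketch, each compressed vector has at most k nonzero entries of magnitude 1, so its L2 norm is at most the square root of k. That bound holds regardless of the gradient dimension, which is one way to see why compression can help when noise has to be calibrated to a sensitivity bound.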
