Differentially private data release has received rising attention in the machine learning community. Recently, an algorithm called DPMix was proposed to release high-dimensional data after a random mixup of degree $m$ under differential privacy. However, only limited theoretical justification has been given for the "sweet spot $m$" phenomenon, and directly applying DPMix to image data suffers from a severe loss of utility. In this paper, we revisit random mixup in light of recent progress on differential privacy. In theory, equipped with Gaussian Differential Privacy with Poisson subsampling, a tight closed-form analysis is presented that enables a quantitative characterization of the optimal mixup degree $m^*$ under linear regression models. In practice, mixup of features, extracted by handcrafted methods or by pre-trained neural networks such as label-free self-supervised learning, is adopted to significantly boost performance under privacy protection. We name this method Differentially Private Feature Mixup (DPFMix). Experiments on MNIST and CIFAR10/100 are conducted to demonstrate its remarkable utility improvement and protection against attacks.
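As a minimal, illustrative sketch of the release mechanism summarized above (not the authors' implementation): features are extracted first, $m$ of them are averaged per released sample, and Gaussian noise is added. The function name `dpfmix_release`, its arguments, and the flat noise scale `sigma` are assumptions for illustration; in the paper, the noise scale is calibrated through Gaussian Differential Privacy with Poisson subsampling.

```python
import numpy as np

def dpfmix_release(features, labels, m, sigma, n_released, rng=None):
    """Illustrative DPFMix-style release (hypothetical API).

    features : (n, d) array of pre-extracted features
    labels   : (n, k) array of one-hot labels
    m        : mixup degree (number of samples averaged per release)
    sigma    : Gaussian noise scale; must be calibrated via GDP with
               Poisson subsampling to meet the target privacy budget
    """
    rng = rng or np.random.default_rng()
    n, d = features.shape
    released_x, released_y = [], []
    for _ in range(n_released):
        # Draw a random mixup group of degree m and average it.
        idx = rng.choice(n, size=m, replace=False)
        x_mix = features[idx].mean(axis=0)
        y_mix = labels[idx].mean(axis=0)
        # Add Gaussian noise to both mixed features and mixed labels.
        released_x.append(x_mix + rng.normal(0.0, sigma, size=d))
        released_y.append(y_mix + rng.normal(0.0, sigma, size=labels.shape[1]))
    return np.stack(released_x), np.stack(released_y)
```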