Differential privacy (DP) is an essential technique for privacy preservation. It has been observed that a large model trained with DP performs worse than a smaller one (e.g., ResNet50 performs worse than ResNet18). To better understand this phenomenon, we study high-dimensional DP learning from the viewpoint of generalization. Theoretically, we show that for the simple Gaussian model, even with small DP noise, if the dimension is large enough then the classification error can be as bad as random guessing. We then propose a feature selection method that reduces the model size, based on a new metric that trades off classification accuracy against privacy preservation. Experiments on real data support our theoretical results and demonstrate the advantage of the proposed method.
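The dimension effect can be illustrated with a minimal numerical sketch (an assumption-laden toy, not the paper's exact construction): in the binary Gaussian model x ~ N(y·mu, I_d) with a fixed unit-norm mean mu, we privatize the mean estimate by adding per-coordinate Gaussian noise (the Gaussian mechanism) and watch the induced linear classifier's accuracy fall toward chance as the dimension d grows, even though the noise scale stays fixed.

```python
# Toy sketch of DP noise in high dimensions (illustrative assumptions:
# the noise scale `sigma` is not calibrated to any (epsilon, delta)).
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.3  # assumed per-coordinate Gaussian-mechanism noise scale

def mean_accuracy(d, trials=10, n_test=2000):
    """Average accuracy of the linear rule sign(<mu_hat, x>) when the
    class mean is privatized as mu_hat = mu + sigma * N(0, I_d)."""
    accs = []
    for _ in range(trials):
        mu = np.zeros(d)
        mu[0] = 1.0                                   # unit-norm class mean
        mu_hat = mu + sigma * rng.standard_normal(d)  # privatized estimate
        y = rng.choice([-1.0, 1.0], n_test)           # labels
        x = y[:, None] * mu + rng.standard_normal((n_test, d))
        accs.append(np.mean(np.sign(x @ mu_hat) == y))
    return float(np.mean(accs))

for d in (10, 100, 1000, 10000):
    print(d, round(mean_accuracy(d), 3))
```

Because the signal norm ||mu|| is fixed while the noise norm grows like sigma·sqrt(d), the privatized direction mu_hat becomes nearly orthogonal to mu in high dimensions, so accuracy approaches 1/2.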