Storage-efficient privacy-guaranteed learning is crucial due to the enormous amounts of sensitive user data required for increasingly many learning tasks. We propose a framework that reduces storage cost while providing privacy guarantees, without essential loss in the utility of the data for learning. Our method comprises noise injection followed by lossy compression. We show that, when the lossy compression is appropriately matched to the distribution of the added noise, the compressed examples converge in distribution to the noise-free training data. In this sense, the utility of the data for learning is essentially maintained, while storage and privacy leakage are reduced by quantifiable amounts. We present experimental results on the CelebA dataset for gender classification and find that our suggested pipeline delivers in practice on the promise of the theory: the individuals in the images are unrecognizable (or less recognizable, depending on the noise level), overall storage of the data is substantially reduced, and there is no essential loss of classification accuracy. As an added bonus, our experiments suggest that our method yields a substantial boost to robustness in the face of adversarial test data.
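The noise-then-compress pipeline can be illustrated with a minimal sketch. The example below is a hypothetical instance, not the paper's actual implementation: it pairs additive uniform noise with a uniform scalar quantizer of matching step size, a classical case where the compressor is matched to the noise distribution (the subtractive-dither setting). Function and parameter names are illustrative.

```python
import numpy as np

def privatize_and_compress(x, delta=0.1, rng=None):
    """Hypothetical sketch of the noise-then-compress pipeline:
    add uniform noise matched to a uniform scalar quantizer.

    x     : array of raw feature values
    delta : quantizer step size; noise is Uniform(-delta/2, delta/2)
    """
    rng = np.random.default_rng(rng)
    # Step 1: noise injection for privacy.
    noise = rng.uniform(-delta / 2, delta / 2, size=np.shape(x))
    # Step 2: lossy compression matched to the noise -- quantize the
    # noisy sample to the lattice {k * delta}. Each output needs only
    # an integer index, which reduces storage.
    return delta * np.round((np.asarray(x) + noise) / delta)
```

With this matching, the total perturbation (noise plus quantization error) stays within one step size of the original value, so the quantized examples remain statistically close to lightly noised versions of the clean data.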