Model inversion (MI) attacks aim to infer and reconstruct private training data by abusing access to a model. MI attacks have raised concerns about the leakage of sensitive information (e.g., private face images used in training a face recognition system). Recently, several algorithms for MI have been proposed to improve attack performance. In this work, we revisit MI, study two fundamental issues pertaining to all state-of-the-art (SOTA) MI algorithms, and propose solutions to these issues that lead to a significant boost in attack performance for all SOTA MI algorithms. In particular, our contributions are two-fold: 1) We analyze the optimization objective of SOTA MI algorithms, argue that the objective is sub-optimal for achieving MI, and propose an improved optimization objective that boosts attack performance significantly. 2) We analyze "MI overfitting", show that it prevents reconstructed images from learning the semantics of the training data, and propose a novel "model augmentation" idea to overcome this issue. Our proposed solutions are simple and significantly improve the attack accuracy of all SOTA MI algorithms. E.g., on the standard CelebA benchmark, our solutions improve attack accuracy by 11.8% and, for the first time, achieve over 90% attack accuracy. Our findings demonstrate that there is a clear risk of leaking sensitive information from deep learning models. We urge serious consideration to be given to the privacy implications. Our code, demo, and models are available at https://ngoc-nguyen-0.github.io/re-thinking_model_inversion_attacks/
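To make the optimization objective discussed above concrete, the following is a minimal, hypothetical sketch of the generic MI formulation: gradient-ascend an input to maximize the target model's confidence in a chosen identity (the identity-loss objective the paper argues is sub-optimal). A toy softmax-regression classifier stands in for the real deep face recognizer; all names and hyperparameters here are illustrative, not the paper's actual implementation.

```python
import numpy as np

# Toy "target model": a softmax-regression classifier with fixed random
# weights, standing in for a trained deep face recognizer.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))  # 3 identities (classes), 5 input features

def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def mi_attack(target_class, steps=200, lr=0.5):
    """Generic MI objective: maximize log p(target_class | x) over the
    input x by gradient ascent. Real attacks optimize over a GAN's
    latent space with an image prior; this sketch optimizes x directly."""
    x = rng.normal(size=5)
    for _ in range(steps):
        p = softmax(W @ x)
        # d/dx log p[target] = W[target] - sum_k p[k] * W[k]
        grad = W[target_class] - p @ W
        x += lr * grad
    return x

x_rec = mi_attack(target_class=1)
p_rec = softmax(W @ x_rec)  # model is now highly confident in class 1
```

In this sketch, the reconstructed input drives the model's confidence in the target identity toward 1, illustrating why the raw identity loss can reach high confidence without the reconstruction actually capturing training-data semantics, which is the gap the paper's improved objective and "MI overfitting" analysis address.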