Given access to a machine learning model, can an adversary reconstruct the model's training data? This work studies this question through the lens of a powerful informed adversary who knows all the training data points except one. By instantiating concrete attacks, we show it is feasible to reconstruct the remaining data point in this stringent threat model. For convex models (e.g. logistic regression), reconstruction attacks are simple and can be derived in closed form. For more general models (e.g. neural networks), we propose an attack strategy based on training a reconstructor network that receives as input the weights of the model under attack and produces as output the target data point. We demonstrate the effectiveness of our attack on image classifiers trained on MNIST and CIFAR-10, and systematically investigate which factors of standard machine learning pipelines affect reconstruction success. Finally, we theoretically investigate what amount of differential privacy suffices to mitigate reconstruction attacks by informed adversaries. Our work provides an effective reconstruction attack that model developers can use to assess memorization of individual points in general settings beyond those considered in previous works (e.g. generative language models or access to training gradients); it shows that standard models have the capacity to store enough information to enable high-fidelity reconstruction of training data points; and it demonstrates that differential privacy can successfully mitigate such attacks in a parameter regime where utility degradation is minimal.
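To make the reconstructor-network idea concrete, the sketch below is a minimal, hypothetical PyTorch illustration rather than the paper's implementation: the class names, layer sizes, MNIST-shaped output, and the pixel-wise MSE training loss are all assumptions chosen for brevity. The core idea it illustrates is the one stated above: the reconstructor takes as input the flattened weights of a model under attack (each "shadow" model trained on the adversary's known fixed data plus one varying target point) and is trained to output that target point.

```python
import torch
import torch.nn as nn

def flatten_params(model: nn.Module) -> torch.Tensor:
    # Concatenate all parameters of an attacked model into one weight vector.
    return torch.cat([p.detach().reshape(-1) for p in model.parameters()])

class Reconstructor(nn.Module):
    # Hypothetical MLP mapping an attacked model's weight vector to a 28x28 image.
    def __init__(self, weight_dim: int, image_dim: int = 28 * 28):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(weight_dim, 1000), nn.ReLU(),
            nn.Linear(1000, 1000), nn.ReLU(),
            nn.Linear(1000, image_dim), nn.Sigmoid(),  # pixels in [0, 1]
        )

    def forward(self, weights: torch.Tensor) -> torch.Tensor:
        return self.net(weights).reshape(-1, 1, 28, 28)

def train_reconstructor(recon, shadow_weights, shadow_targets, epochs=10, lr=1e-4):
    # shadow_weights: (num_shadow, weight_dim) flattened weights of shadow models;
    # shadow_targets: (num_shadow, 1, 28, 28) the target point each was trained on.
    opt = torch.optim.Adam(recon.parameters(), lr=lr)
    loss_fn = nn.MSELoss()  # simple reconstruction loss; an illustrative choice
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(recon(shadow_weights), shadow_targets)
        loss.backward()
        opt.step()
    return recon
```

At attack time, the trained reconstructor would be applied to the flattened weights of the released model to produce a candidate for the unknown training point; the quality of this candidate is what the reconstruction experiments on MNIST and CIFAR-10 evaluate.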