Inference attacks against Machine Learning (ML) models allow adversaries to learn sensitive information about training data, model parameters, etc. While researchers have studied several kinds of attacks in depth, they have done so in isolation. As a result, we lack a comprehensive picture of the risks caused by these attacks, e.g., the different scenarios they can be applied to, the common factors that influence their performance, the relationships among them, or the effectiveness of possible defenses. In this paper, we fill this gap by presenting a first-of-its-kind holistic risk assessment of different inference attacks against machine learning models. We concentrate on four attacks -- namely, membership inference, model inversion, attribute inference, and model stealing -- and establish a threat model taxonomy. Our extensive experimental evaluation, run on five model architectures and four image datasets, shows that the complexity of the training dataset plays an important role with respect to the attacks' performance, while the effectiveness of model stealing and that of membership inference attacks are negatively correlated. We also show that defenses like DP-SGD and Knowledge Distillation can only mitigate some of the inference attacks. Our analysis relies on a modular, re-usable software framework, ML-Doctor, which enables ML model owners to assess the risks of deploying their models, and equally serves as a benchmark tool for researchers and practitioners.
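To illustrate the kind of risk the abstract refers to, the following is a minimal, self-contained sketch of a confidence-thresholding membership inference attack: an adversary queries the target model and flags a sample as a training member when the model's top posterior is unusually high. The synthetic dataset, the MLP target model, and the threshold value are illustrative assumptions for this sketch only; they are not the ML-Doctor pipeline or its API.

```python
# Illustrative sketch of confidence-based membership inference.
# Dataset, model, and threshold are hypothetical choices, not ML-Doctor's.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Train a target model on "member" data; keep a held-out "non-member" set.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_member, X_nonmember, y_member, y_nonmember = train_test_split(
    X, y, test_size=0.5, random_state=0)

target = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=0)
target.fit(X_member, y_member)

def membership_guess(model, X_query, threshold=0.9):
    """Guess 'member' when the model's top class probability exceeds a threshold."""
    confidences = model.predict_proba(X_query).max(axis=1)
    return confidences >= threshold

# The gap between the two flag rates is a rough measure of attack advantage.
member_rate = membership_guess(target, X_member).mean()
nonmember_rate = membership_guess(target, X_nonmember).mean()
print(f"member flag rate: {member_rate:.2f}, non-member flag rate: {nonmember_rate:.2f}")
```

A larger gap between the two flag rates indicates more training-data leakage through the model's confidence scores, which is the kind of signal a holistic assessment would quantify alongside the other attack types.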