During training, machine learning models may store, or "learn", more information about the training data than is actually needed for the prediction or classification task. This is exploited by property inference attacks, which aim to extract statistical properties of a given model's training data without having access to the training data itself. Such properties may include the quality of pictures to identify the camera model, the age distribution to reveal the target audience of a product, or the included host types to refine a malware attack on computer networks. The attack is especially accurate when the attacker has access to all model parameters, i.e., in a white-box scenario. By defending against such attacks, model owners can ensure that their training data, its associated properties, and thus their intellectual property stay private, even if they deliberately share their models, e.g., for collaborative training, or if the models are leaked. In this paper, we introduce property unlearning, an effective defense mechanism against white-box property inference attacks, independent of the training data type, model task, or number of properties. Property unlearning mitigates property inference attacks by systematically changing the trained weights and biases of a target model such that an adversary cannot extract the chosen properties. We empirically evaluate property unlearning on three different data sets, including tabular and image data, and on two types of artificial neural networks. Our results show that property unlearning is both efficient and reliable in protecting machine learning models against property inference attacks, with a good privacy-utility trade-off. Furthermore, our results indicate that the mechanism is also effective for unlearning multiple properties.
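To make the underlying idea concrete, the following is a minimal, hypothetical sketch (not the paper's actual implementation): it assumes a PyTorch setting, a toy `target` model, a pretrained white-box attacker meta-classifier `meta_clf` that reads the flattened parameters of the target model, and a utility-preservation penalty weighted by an assumed hyperparameter `lam`. The sketch perturbs the trained weights and biases so that the attacker's property prediction becomes maximally uncertain while the parameters stay close to their trained values.

```python
# Hypothetical sketch of property unlearning (assumed names, not the paper's code):
# perturb a trained target model's parameters so that a white-box property
# inference meta-classifier becomes maximally uncertain, while a penalty
# keeps the parameters close to their original values to preserve utility.
import torch
import torch.nn as nn

# Toy target model whose weights and biases are to be "unlearned".
target = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 2))

# Adversary's meta-classifier: maps the flattened parameter vector of the
# target model to a logit for the sensitive property (assumed pretrained).
n_params = sum(p.numel() for p in target.parameters())
meta_clf = nn.Sequential(nn.Linear(n_params, 64), nn.ReLU(), nn.Linear(64, 1))

original = [p.detach().clone() for p in target.parameters()]
opt = torch.optim.Adam(target.parameters(), lr=1e-3)
lam = 1.0  # weight of the utility-preservation penalty (assumed hyperparameter)

for step in range(200):
    opt.zero_grad()
    flat = torch.cat([p.view(-1) for p in target.parameters()])
    prop_logit = meta_clf(flat)
    # Drive the attacker's property prediction toward maximal uncertainty (p = 0.5).
    privacy_loss = (torch.sigmoid(prop_logit) - 0.5).pow(2).mean()
    # Keep the perturbed parameters close to the trained ones to retain utility.
    utility_loss = sum(((p - o) ** 2).sum() for p, o in zip(target.parameters(), original))
    loss = privacy_loss + lam * utility_loss
    loss.backward()
    opt.step()
```

In practice, the utility term could instead be a task loss on held-out data; the quadratic penalty above is only an illustrative stand-in for keeping the model's predictive behavior intact.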