As a long-term threat to the privacy of training data, membership inference attacks (MIAs) arise ubiquitously across machine learning models. Existing work provides strong evidence of a connection between the distinguishability of the training and testing loss distributions and a model's vulnerability to MIAs. Motivated by these findings, we propose RelaxLoss, a novel training framework based on a relaxed loss function with a more achievable learning target, which narrows the generalization gap and reduces privacy leakage. RelaxLoss is applicable to any classification model, with the added benefits of easy implementation and negligible computational overhead. Through extensive evaluations on five datasets of diverse modalities (images, medical data, transaction records), our approach consistently outperforms state-of-the-art defense mechanisms in terms of both resilience against MIAs and model utility. Our defense is the first that can withstand a wide range of attacks while preserving (or even improving) the target model's utility. Source code is available at https://github.com/DingfanChen/RelaxLoss.
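To make the "more achievable learning target" concrete, below is a minimal sketch of one way to train toward a nonzero loss level rather than toward zero. It assumes a PyTorch classifier and optimizer; the function name `relaxed_loss_step` and the threshold `alpha` are illustrative, and the actual RelaxLoss algorithm (see the linked repository) differs in its details.

```python
import torch
import torch.nn.functional as F

def relaxed_loss_step(model, optimizer, x, y, alpha=1.0):
    """One training step targeting a loss level of alpha instead of zero.

    Illustrative simplification: minimizing |loss - alpha| descends
    while the batch loss exceeds alpha and ascends once it drops below,
    keeping the training loss near alpha. This is not the full method
    from the paper, which adds further steps around the threshold.
    """
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y)  # standard classification loss
    surrogate = (loss - alpha).abs()     # relaxed target: loss ~ alpha
    surrogate.backward()
    optimizer.step()
    return loss.item()
```

Keeping the training loss near a small positive `alpha` rather than driving it to zero makes the training loss distribution resemble the testing one more closely, which is the property the defense exploits to blunt MIAs.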