Multi-concept adversarial attacks

As machine learning (ML) techniques are increasingly used in many applications, their vulnerability to adversarial attacks has become well known. Test-time attacks, usually launched by adding adversarial noise to test instances, have been shown to be effective against deployed ML models. In practice, one test input may be used by several different ML models, yet test-time attacks that target a single ML model often neglect their impact on the other models. In this work, we empirically demonstrate that naively attacking a classifier that learns one concept may degrade classifiers trained to learn other concepts. For example, in an online image classification scenario, when the Gender classifier is under attack, the (wearing) Glasses classifier is attacked at the same time, and its accuracy drops from 98.69% to 88.42%. This raises an interesting question: is it possible to attack one set of classifiers without impacting another set that uses the same test instance? The answer has interesting implications for protecting privacy against the misuse of ML models: attacking ML models that pose unnecessary risks of privacy invasion can be an important tool for protecting individuals from harmful privacy exploitation. In this paper, we address this research question by developing novel attack techniques that attack one set of ML models while preserving the accuracy of the other. For linear classifiers, we provide a theoretical framework for finding an optimal solution that generates such adversarial examples. Building on this framework, we develop a multi-concept attack strategy for deep learning. Our results demonstrate that our techniques can successfully attack the target classes while protecting the protected classes in many different settings, which is not possible with existing single-target test-time attack strategies.
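The abstract states that, for linear classifiers, such adversarial examples can be found optimally, but it does not give the formulation. The sketch below is only an illustration of the underlying idea, not the paper's framework. It assumes a single binary linear target classifier and requires every protected linear classifier's decision value to stay exactly unchanged (a stronger condition than keeping only its predicted label). Under those assumptions, the minimal-L2 perturbation points along u, the component of the target weight vector orthogonal to all protected weight vectors, and equals δ = s·u/‖u‖², where s is the change in the target score needed to cross its decision boundary. The function names and the `margin` parameter are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def linear_score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Decision value of a linear classifier; its sign is the predicted label."""
    return dot(w, x) + b


def orthonormal_basis(vectors: Sequence[Sequence[float]], tol: float = 1e-12) -> List[Vector]:
    """Modified Gram-Schmidt: orthonormal basis for the span of the protected weights."""
    basis: List[Vector] = []
    for v in vectors:
        r = list(v)
        for q in basis:
            c = dot(r, q)
            r = [ri - c * qi for ri, qi in zip(r, q)]
        norm = math.sqrt(dot(r, r))
        if norm > tol:  # skip vectors that are linearly dependent on earlier ones
            basis.append([ri / norm for ri in r])
    return basis


def multi_concept_perturbation(
    x: Sequence[float],
    w_target: Sequence[float],
    b_target: float,
    protected_weights: Sequence[Sequence[float]],
    margin: float = 1e-3,  # hypothetical: how far past the target boundary to move
    tol: float = 1e-12,
) -> Vector:
    """Hypothetical sketch: minimal-L2 delta that flips the target label while
    keeping w_p . delta == 0 for every protected weight vector w_p."""
    # u = projection of w_target onto the orthogonal complement of the protected span.
    u = list(w_target)
    for q in orthonormal_basis(protected_weights, tol):
        c = dot(u, q)
        u = [ui - c * qi for ui, qi in zip(u, q)]
    uu = dot(u, u)
    if uu <= tol:
        raise ValueError(
            "target weights lie in the span of the protected weights; the target "
            "score cannot change while all protected scores stay fixed"
        )
    f_t = linear_score(w_target, b_target, x)
    # Required change s in the target score so that it lands on the other side.
    s = (-margin if f_t > 0 else margin) - f_t
    # Since w_target . u == u . u, delta = s * u / ||u||^2 gives w_target . delta == s.
    return [s * ui / uu for ui in u]


if __name__ == "__main__":
    x = [1.0, 1.0]
    w_t, b_t = [1.0, 0.0], 0.0    # hypothetical target classifier
    w_p, b_p = [1.0, 1.0], -0.5   # hypothetical protected classifier
    delta = multi_concept_perturbation(x, w_t, b_t, [w_p])
    x_adv = [a + d for a, d in zip(x, delta)]
    print("target:", linear_score(w_t, b_t, x), "->", linear_score(w_t, b_t, x_adv))
    print("protected:", linear_score(w_p, b_p, x), "->", linear_score(w_p, b_p, x_adv))
```

Holding the protected scores exactly fixed keeps the problem closed-form but is conservative. If only the protected labels must be preserved, the constraints become inequalities and the smallest perturbation can be shorter, which turns the problem into a small quadratic program.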
