Adversarial images are created with the intention of causing an image classifier to produce a misclassification. In this paper, we propose that adversarial images should be evaluated on the basis of semantic mismatch, rather than the label mismatch used in current work. In other words, we propose that an image of a "mug" should be considered adversarial if it is classified as "turnip", but not if it is classified as "cup", which current approaches would also count as adversarial. Our novel idea of taking semantic misclassification into account in the evaluation of adversarial images offers two benefits. First, it is a more realistic conceptualization of what makes an image adversarial, which is important for fully understanding the implications of adversarial images for security and privacy. Second, it makes it possible to evaluate the transferability of adversarial images to a real-world classifier without requiring the classifier's label set to have been available when the images were created. The paper carries out an evaluation, made possible by our semantic-misclassification approach, of a transfer attack on a real-world image classifier. The attack reveals patterns in the semantics of adversarial misclassifications that could not be investigated using conventional label mismatch.
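To make the distinction concrete, the following is a minimal sketch of how a semantic-mismatch check could be implemented, assuming a WordNet path-similarity measure and a hypothetical distance threshold; these are illustrative choices for exposition, not the paper's actual criterion.

```python
# Minimal sketch of a semantic-mismatch check (illustrative only; not the
# paper's exact criterion). Assumes the NLTK WordNet corpus is installed
# (nltk.download('wordnet')) and uses a hypothetical similarity threshold.
from nltk.corpus import wordnet as wn

def is_semantic_mismatch(true_label: str, predicted_label: str,
                         threshold: float = 0.3) -> bool:
    """Return True if the prediction is semantically far from the true label,
    i.e. the image would count as adversarial under semantic mismatch."""
    if true_label == predicted_label:
        return False
    true_syns = wn.synsets(true_label, pos=wn.NOUN)
    pred_syns = wn.synsets(predicted_label, pos=wn.NOUN)
    if not true_syns or not pred_syns:
        return True  # no evidence of semantic closeness; treat as mismatch
    # Best path similarity over all synset pairs (1.0 = identical concept).
    best = max(
        (t.path_similarity(p) or 0.0)
        for t in true_syns for p in pred_syns
    )
    return best < threshold

# "mug" -> "cup" is semantically close, so it would not count as adversarial
# under semantic mismatch, whereas "mug" -> "turnip" would.
print(is_semantic_mismatch("mug", "cup"))     # expected: False
print(is_semantic_mismatch("mug", "turnip"))  # expected: True
```

Under label mismatch, both predictions above would be counted as adversarial; the semantic-mismatch view only counts the second, which is what allows evaluation against a real-world classifier whose label set differs from the one used to craft the images.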