In spite of their successful application in many fields, machine learning models today suffer from notorious problems such as vulnerability to adversarial examples. Rather than falling into the cat-and-mouse game between adversarial attack and defense, this paper offers an alternative perspective on adversarial examples and explores whether we can exploit them in benign applications. We first attribute adversarial examples to the disparity between humans and models in employing non-semantic features. While largely ignored in classical machine learning mechanisms, non-semantic features exhibit three interesting characteristics: they are (1) exclusive to models, (2) critical in affecting inference, and (3) utilizable as features. Inspired by this, we present the bold new idea of benign adversarial attack, which exploits adversarial examples for good in three directions: (1) adversarial Turing tests, (2) rejecting malicious model applications, and (3) adversarial data augmentation. Each direction is presented with an elaboration of its motivation, a justification analysis, and prototype applications to showcase its potential.