Despite the efficiency and scalability of machine learning systems, recent studies have demonstrated that many classification methods, especially deep neural networks (DNNs), are vulnerable to adversarial examples, i.e., examples that are carefully crafted to fool a well-trained classification model while remaining indistinguishable from natural data to humans. This makes it potentially unsafe to apply DNNs or related methods in security-critical areas. Since this issue was first identified by Biggio et al. (2013) and Szegedy et al. (2014), much work has been done in this field, including the development of attack methods to generate adversarial examples and the construction of defense techniques to guard against them. This paper aims to introduce this topic and its latest developments to the statistical community, focusing primarily on the generation of and defense against adversarial examples. The computer code (in Python and R) used in the numerical experiments is publicly available for readers to explore the surveyed methods. It is the authors' hope that this paper will encourage more statisticians to work on this important and exciting field of generating and defending against adversarial examples.
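As a concrete illustration of how an adversarial example can be crafted (this sketch is not taken from the paper or its released code), the fast gradient sign method perturbs an input in the direction that increases the classifier's loss, subject to a small budget epsilon. Below is a minimal PyTorch sketch, assuming a hypothetical differentiable classifier `model`, inputs `x` scaled to [0, 1], and integer labels `y`:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    """Craft adversarial examples with the fast gradient sign method.

    model: a differentiable classifier returning logits (assumed).
    x: input batch scaled to [0, 1]; y: true class labels.
    epsilon: maximum per-pixel perturbation.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Step in the direction that increases the loss, then clip
    # back to the valid input range so the perturbation stays small.
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```

For a small epsilon, the perturbed input typically looks identical to the original to a human observer, yet can change the model's prediction, which is the vulnerability surveyed in this paper.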