Neural networks have been widely applied in security applications such as spam and phishing detection, intrusion prevention, and malware detection. These black-box models, however, often exhibit uncertainty and poor explainability in practice. Moreover, neural networks themselves are vulnerable to adversarial attacks. For these reasons, there is a strong demand for trustworthy and rigorous methods to verify the robustness of neural network models. Adversarial robustness, which concerns the reliability of a neural network when handling maliciously manipulated inputs, is one of the most active topics in security and machine learning. In this work, we survey the existing literature on adversarial robustness verification for neural networks and collect 39 diverse research works across the machine learning, security, and software engineering domains. We systematically analyze their approaches, including how robustness is formulated, which verification techniques are used, and the strengths and limitations of each technique. We provide a taxonomy from a formal verification perspective to give a comprehensive understanding of this topic, classifying existing techniques by property specification, problem reduction, and reasoning strategies. We also demonstrate representative techniques from existing studies on a sample model. Finally, we discuss open questions for future research.
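To make the notion of robustness concrete, a common local-robustness specification (stated here as an illustrative example of how such properties are typically formulated, not a formulation taken from any single surveyed work) requires a classifier $f$ to preserve its prediction on all inputs within an $\ell_\infty$-ball of radius $\epsilon$ around a given input $x$:
$$\forall x' \in \mathbb{R}^n:\quad \|x' - x\|_\infty \le \epsilon \;\Longrightarrow\; \operatorname*{arg\,max}_i f_i(x') = \operatorname*{arg\,max}_i f_i(x).$$
Verification techniques aim to prove this implication for all (infinitely many) perturbed inputs $x'$ in the ball, rather than merely testing a finite sample of perturbations.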