Neural networks are known for their ability to detect general patterns in noisy data. This makes them a popular tool for the perception components of complex AI systems. Paradoxically, they are also known to be vulnerable to adversarial attacks. In response, various training methods such as adversarial training, data augmentation and Lipschitz robustness training have been proposed as means of improving their robustness. However, as this paper explores, these training methods each optimise for a different definition of robustness. We perform an in-depth comparison of these different definitions, including their relationships, assumptions, interpretability and verifiability after training. We also look at constraint-driven training, a general approach designed to encode arbitrary constraints, and show that not all of these definitions are directly encodable. Finally, we perform experiments to compare the applicability and efficacy of the training methods at ensuring the network obeys these different definitions. These results highlight that encoding even a piece of knowledge as simple as robustness into neural network training is fraught with difficult choices and pitfalls.