Attention Networks (ATNs) such as Transformers are used in many domains, ranging from Natural Language Processing to Autonomous Driving. In this paper, we study the robustness of ATNs, a key characteristic whose deficiency may raise safety concerns. Specifically, we focus on Sparsemax-based ATNs and reduce the problem of finding their maximum robustness to a Mixed Integer Quadratically Constrained Programming (MIQCP) problem. We also design two pre-processing heuristics that can be embedded in the MIQCP encoding and substantially accelerate its solving. We then conduct experiments on a Lane Departure Warning application to compare the robustness of Sparsemax-based ATNs against that of the more conventional Multi-Layer Perceptron (MLP) Neural Networks (NNs). To our surprise, ATNs are not necessarily more robust, a finding that calls for careful consideration when selecting NN architectures for safety-critical applications.
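For context, sparsemax (Martins & Astudillo, 2016) replaces softmax with the Euclidean projection onto the probability simplex, so attention weights can be exactly zero; this piecewise-linear structure is what makes a mixed-integer encoding natural. The following is a minimal NumPy sketch of the operator, not code from the paper; the function name `sparsemax` is ours for illustration.

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of z onto the probability simplex
    (sparsemax; Martins & Astudillo, 2016). Unlike softmax, the
    output can contain exact zeros, which makes the resulting
    attention weights piecewise linear in z."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]                # sort scores descending
    k = np.arange(1, len(z) + 1)
    cumsum = np.cumsum(z_sorted)
    support = 1.0 + k * z_sorted > cumsum      # indices kept in the support
    k_z = k[support][-1]                       # size of the support
    tau = (cumsum[k_z - 1] - 1.0) / k_z        # threshold shared by the support
    return np.maximum(z - tau, 0.0)

# Example: yields [0.65, 0.35, 0.0] -- a valid distribution with an exact zero.
print(sparsemax([1.2, 0.9, -0.3]))
```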