UKP-SQuARE v2: Explainability and Adversarial Attacks for Trustworthy QA

Question Answering (QA) systems are increasingly deployed in applications where they support real-world decisions. However, state-of-the-art models rely on deep neural networks, which are difficult for humans to interpret. Inherently interpretable models or post hoc explainability methods can help users understand how a model arrives at its prediction and, if successful, increase their trust in the system. Furthermore, researchers can leverage these insights to develop new methods that are more accurate and less biased. In this paper, we introduce SQuARE v2, the new version of SQuARE, which provides an explainability infrastructure for comparing models based on methods such as saliency maps and graph-based explanations. While saliency maps are useful for inspecting the importance of each input token for the model's prediction, graph-based explanations from external Knowledge Graphs enable users to verify the reasoning behind the model's prediction. In addition, we provide multiple adversarial attacks to compare the robustness of QA models. With these explainability methods and adversarial attacks, we aim to ease research on trustworthy QA models. SQuARE is available at https://square.ukp-lab.de.
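To make the idea of a saliency map concrete, the following is a minimal sketch of token-level importance computed by leave-one-out occlusion: each context token is masked in turn and the drop in the model's score is recorded. This is an illustration only and not necessarily a method SQuARE implements. The scoring function `toy_answer_score` is a hypothetical stand-in for a real QA model's confidence, and the names `occlusion_saliency` and `mask_token` are assumptions introduced for this example.

```python
from typing import Callable, List


def toy_answer_score(question: List[str], context: List[str]) -> float:
    """Hypothetical stand-in for a QA model's confidence.

    Returns the fraction of question tokens that also occur in the context.
    A real system would return the model's score for its predicted answer.
    """
    if not question:
        return 0.0
    context_set = {tok.lower() for tok in context}
    hits = sum(1 for tok in question if tok.lower() in context_set)
    return hits / len(question)


def occlusion_saliency(
    question: List[str],
    context: List[str],
    score_fn: Callable[[List[str], List[str]], float],
    mask_token: str = "[MASK]",
) -> List[float]:
    """Leave-one-out saliency: score drop when each context token is masked."""
    base = score_fn(question, context)
    saliency = []
    for i in range(len(context)):
        # Replace only the i-th context token with the mask token.
        occluded = context[:i] + [mask_token] + context[i + 1:]
        saliency.append(base - score_fn(question, occluded))
    return saliency


question = ["who", "wrote", "hamlet"]
context = ["Shakespeare", "wrote", "Hamlet", "in", "1600"]
scores = occlusion_saliency(question, context, toy_answer_score)

# Print one importance score per context token.
for tok, s in zip(context, scores):
    print(f"{tok:12s} {s:+.3f}")
```

With this toy scorer, only the tokens that overlap with the question receive a non-zero score. Gradient-based or attention-based attributions would yield the same kind of per-token map, computed from the model's internals rather than from repeated forward passes.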
