We analyze state-of-the-art deep learning models for three tasks: question answering on (1) images, (2) tables, and (3) passages of text. Using the notion of \emph{attribution} (word importance), we find that these deep networks often ignore important question terms. Leveraging such behavior, we perturb questions to craft a variety of adversarial examples. Our strongest attacks drop the accuracy of a visual question answering model from $61.1\%$ to $19\%$, and that of a tabular question answering model from $33.5\%$ to $3.3\%$. Additionally, we show how attributions can strengthen attacks proposed by Jia and Liang (2017) on paragraph comprehension models. Our results demonstrate that attributions can augment standard measures of accuracy and empower investigation of model performance. When a model is accurate but for the wrong reasons, attributions can surface erroneous logic in the model that indicates inadequacies in the test data.
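As a concrete illustration of the attribution notion referenced above, one widely used gradient-based definition is Integrated Gradients (Sundararajan et al., 2017); the abstract does not name a specific attribution method, so the formula below should be read as an assumed example rather than a description of the exact technique applied here:
\[
\mathrm{IG}_i(x) \;=\; (x_i - x'_i)\int_{0}^{1} \frac{\partial F\bigl(x' + \alpha\,(x - x')\bigr)}{\partial x_i}\, d\alpha ,
\]
where $F$ is the model's scoring function, $x$ the embedded question, $x'$ a baseline input (e.g., an all-zero embedding), and $\mathrm{IG}_i$ the importance assigned to the $i$-th input coordinate. Question terms that receive near-zero attribution under such a measure are natural targets for the perturbations that yield adversarial examples.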