Numerical Question Answering is the task of answering questions that require numerical capabilities. Previous work introduces general adversarial attacks on Numerical Question Answering but does not systematically explore the numerical capabilities specific to the task. In this paper, we propose to diagnose the numerical capabilities of a series of Numerical Question Answering systems and datasets. We highlight a series of numerical capabilities and design corresponding dataset perturbations. Empirical results indicate that existing systems are severely challenged by these perturbations. For example, Graph2Tree suffered a 53.83% absolute accuracy drop under the "Extra" perturbation on ASDiv-a, and BART suffered a 13.80% accuracy drop under the "Language" perturbation on the numerical subset of DROP. As a countermeasure, we also investigate the effectiveness of applying the perturbations as data augmentation to mitigate systems' lack of robust numerical capabilities. Our experimental analysis and empirical studies demonstrate that Numerical Question Answering with robust numerical capabilities remains, to a large extent, an open question. We discuss future directions for Numerical Question Answering and summarize guidelines for future dataset collection and system design.
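To make the notion of an absolute accuracy drop concrete, the following is a minimal sketch of how such a diagnosis might be scored. Everything in it is an assumption made for illustration: `toy_system`, `spell_out`, and the two sample questions are hypothetical and are not the systems, perturbations, or data used in the paper. The sketch also assumes that the perturbation leaves the gold answer unchanged.

```python
import re
from typing import Callable, List, Tuple

Example = Tuple[str, float]  # (question text, gold numeric answer)


def accuracy(system: Callable[[str], float], data: List[Example]) -> float:
    """Fraction of questions whose predicted answer matches the gold answer."""
    correct = sum(1 for q, gold in data if abs(system(q) - gold) < 1e-6)
    return correct / len(data)


def absolute_accuracy_drop(
    system: Callable[[str], float],
    data: List[Example],
    perturb: Callable[[str], str],
) -> float:
    """Accuracy on the original data minus accuracy on the perturbed data,
    expressed in percentage points. Gold answers are assumed unchanged."""
    perturbed = [(perturb(q), gold) for q, gold in data]
    return 100.0 * (accuracy(system, data) - accuracy(system, perturbed))


# Hypothetical toy system: sums every digit string found in the question.
def toy_system(question: str) -> float:
    return float(sum(int(tok) for tok in re.findall(r"\d+", question)))


# Hypothetical perturbation (illustration only): spell some numbers as words.
WORDS = {"2": "two", "3": "three"}


def spell_out(question: str) -> str:
    return re.sub(r"\d+", lambda m: WORDS.get(m.group(), m.group()), question)


data = [
    ("Ann has 2 apples and buys 3 more. How many apples does she have?", 5.0),
    ("Bob has 10 pens and finds 4 more. How many pens does he have?", 14.0),
]

print(absolute_accuracy_drop(toy_system, data, spell_out))  # prints 50.0
```

In this toy setting the system answers both original questions correctly but fails the first one once its numbers are spelled out, so accuracy falls from 100% to 50%, an absolute drop of 50 percentage points. A relative drop would instead divide that difference by the original accuracy.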