When answering natural language questions over knowledge bases (KBs), incompleteness in the KB can naturally lead to many questions being unanswerable. While answerability has been explored in other QA settings, it has not been studied for QA over knowledge bases (KBQA). We first identify various forms of KB incompleteness that can render a question unanswerable. We then propose GrailQAbility, a new benchmark dataset that systematically modifies GrailQA (a popular KBQA dataset) to represent all these incompleteness issues. Testing two state-of-the-art KBQA models (trained on the original GrailQA as well as on our GrailQAbility), we find that both models struggle to detect unanswerable questions, or sometimes detect them for the wrong reasons. Consequently, both models suffer a significant drop in performance, underscoring the need for further research in making KBQA systems robust to unanswerability.