Existing studies on question answering over knowledge bases (KBQA) mainly operate under the standard i.i.d. assumption, i.e., the training distribution over questions is the same as the test distribution. However, i.i.d. may be neither reasonably achievable nor desirable on large-scale KBs because 1) the true user distribution is hard to capture and 2) randomly sampling training examples from the enormous space would be highly data-inefficient. Instead, we suggest that KBQA models should have three levels of built-in generalization: i.i.d., compositional, and zero-shot. To facilitate the development of KBQA models with stronger generalization, we construct and release a new large-scale, high-quality dataset with 64,331 questions, GrailQA, and provide evaluation settings for all three levels of generalization. In addition, we propose a novel BERT-based KBQA model. The combination of our dataset and model enables us to thoroughly examine and demonstrate, for the first time, the key role of pre-trained contextual embeddings such as BERT in the generalization of KBQA.