Answering complex questions over knowledge bases (KB-QA) must cope with very large inputs: billions of facts involving millions of entities and thousands of predicates. For efficiency, QA systems first reduce the answer search space by identifying a set of facts that is likely to contain all answers and relevant cues. The most common technique is to apply named entity disambiguation (NED) systems to the question and retrieve the KB facts for the disambiguated entities. This work presents ECQA, an efficient method that prunes irrelevant parts of the search space using KB-aware signals. ECQA is based on top-k query processing over score-ordered lists of KB items, combining signals about lexical matching, relevance to the question, coherence among candidate items, and connectivity in the KB graph. Experiments on two recent QA benchmarks show that ECQA outperforms state-of-the-art baselines with respect to answer presence, size of the search space, and runtime.
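To make the idea of top-k query processing over score-ordered lists concrete, the following is a minimal sketch of a generic threshold-style algorithm. It is not ECQA's actual procedure, which the abstract does not specify. The function name `threshold_topk`, the weighted-sum aggregation, the weights, and the toy item identifiers are all assumptions made for illustration; scores and weights are assumed non-negative, and an item missing from a list is assumed to score 0 there.

```python
from typing import Dict, List, Tuple

# One score-ordered list: (item, score) pairs sorted by score, descending.
ScoredList = List[Tuple[str, float]]


def threshold_topk(
    lists: List[ScoredList], weights: List[float], k: int
) -> List[Tuple[str, float]]:
    """Hypothetical helper: generic threshold algorithm with a weighted sum."""
    # Random-access index per list: item -> score (missing items count as 0.0).
    lookup: List[Dict[str, float]] = [dict(lst) for lst in lists]
    seen: Dict[str, float] = {}  # item -> aggregated score
    max_depth = max((len(lst) for lst in lists), default=0)

    def best(n: int) -> List[Tuple[str, float]]:
        # Highest aggregated scores first; ties broken by item name.
        return sorted(seen.items(), key=lambda kv: (-kv[1], kv[0]))[:n]

    for depth in range(max_depth):
        frontier = []  # score at the current scan position of each list
        for lst in lists:
            if depth < len(lst):
                item, score = lst[depth]
                frontier.append(score)
                if item not in seen:
                    # Aggregate the item's scores from all lists.
                    seen[item] = sum(
                        w * lk.get(item, 0.0) for w, lk in zip(weights, lookup)
                    )
            else:
                frontier.append(0.0)  # exhausted list: unseen items score 0
        # Upper bound on the aggregated score of any item not yet seen.
        threshold = sum(w * s for w, s in zip(weights, frontier))
        top = best(k)
        if len(top) == k and top[-1][1] >= threshold:
            return top  # no unseen item can enter the top-k: stop early
    return best(k)


# Toy lists (assumed): two signals over three candidate KB items.
lexical = [("Q1", 0.9), ("Q2", 0.8), ("Q3", 0.1)]
relevance = [("Q2", 0.9), ("Q1", 0.5), ("Q3", 0.2)]
print(threshold_topk([lexical, relevance], [0.5, 0.5], k=1))
```

In this toy run the scan can stop after the second position of each list, so the lowest-scoring item is never aggregated. This early termination is what lets score-ordered processing avoid touching most of a large candidate space.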