In this paper, we study the following problem: given a knowledge graph (KG) and a set of input vertices (representing concepts or entities) and edge labels, we aim to find the smallest connected subgraphs containing all of the inputs. This problem plays a key role in KG-based search engines and natural language question answering systems, and it is a natural extension of the Steiner tree problem, which is known to be NP-hard. We present RECON, a system for finding approximate answers. RECON aims at achieving high accuracy with instantaneous response (i.e., sub-second/millisecond delay) over KGs with hundreds of millions edges without resorting to expensive computational resources. Furthermore, when no answer exists due to disconnection between concepts and entities, RECON refines the input to a semantically similar one based on the ontology, and attempt to find answers with respect to the refined input. We conduct a comprehensive experimental evaluation of RECON. In particular we compare it with five existing approaches for finding approximate Steiner trees. Our experiments on four large real and synthetic KGs show that RECON significantly outperforms its competitors and incurs a much smaller memory footprint.
翻译:在本文中,我们研究以下问题:根据知识图(KG)和一套输入顶点(代表概念或实体)和边缘标签,我们的目标是找到包含所有投入的最小连接子集。这个问题在基于KG的搜索引擎和自然语言问答系统中起着关键作用,这是斯坦纳林问题的一个自然延伸,众所周知,这是NP-硬的。我们介绍了一个寻找近似答案的系统RECON。RECON的目标是在不使用昂贵计算资源的情况下,在具有数以亿计边缘的KG上实现高精确度(即二/二分/二秒延迟)。此外,当由于概念和实体之间脱节而没有答案时,RECON根据肿瘤学改进了对一个类似输入的语系的输入,并试图找到关于改良输入的答案。我们对RECON进行了全面的实验性评估。我们特别将它与现有的五种方法进行了比较,以找到近似 Steiner树。我们在四大真实和合成KG的实验显示RECON大大超出其记忆性,并且使竞争者面临一个小得多的内径。