Despite being vast repositories of factual information, cross-domain knowledge graphs such as Wikidata and the Google Knowledge Graph only sparsely provide short synoptic descriptions for entities. Such descriptions, which briefly identify the most discernible features of an entity, give readers a near-instantaneous understanding of what kind of entity they are being presented with. They can also aid in tasks such as named entity disambiguation, ontological type determination, and answering entity queries. Given the rapidly increasing number of entities in knowledge graphs, a fully automated synthesis of succinct textual descriptions from the underlying factual information is essential. To this end, we propose a novel fact-to-sequence encoder-decoder model with a suitable copy mechanism to generate concise and precise textual descriptions of entities. In an in-depth evaluation, we demonstrate that our method significantly outperforms state-of-the-art alternatives.