Taxonomies are fundamental to many real-world applications in various domains, serving as structural representations of knowledge. To cope with the growing volume of new concepts that must be organized into taxonomies, researchers have turned to automatically completing an existing taxonomy with new concepts. In this paper, we propose TaxoEnrich, a new taxonomy completion framework that leverages both semantic features and structural information in the existing taxonomy and offers a better representation of candidate positions to boost the performance of taxonomy completion. Specifically, TaxoEnrich consists of four components: (1) a taxonomy-contextualized embedding, which incorporates both the semantic meaning of concepts and their taxonomic relations using pretrained language models; (2) a taxonomy-aware sequential encoder, which learns candidate position representations by encoding the structural information of the taxonomy; (3) a query-aware sibling encoder, which adaptively aggregates candidate siblings to augment candidate position representations based on their importance to query-position matching; and (4) a query-position matching model, which extends existing work with our new candidate position representations. Extensive experiments on four large real-world datasets from different domains show that TaxoEnrich achieves the best performance on all evaluation metrics and outperforms previous state-of-the-art methods by a large margin.
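To make the idea behind component (3) concrete, the following is a minimal sketch of query-aware sibling aggregation. It rests on assumptions that the abstract does not confirm: importance is scored with a plain dot product between the query embedding and each sibling embedding, and the weights come from a softmax. The function names `softmax` and `aggregate_siblings` and the toy embeddings are hypothetical and serve only as illustration; this is not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_siblings(query, siblings):
    """Attention-weighted average of sibling embeddings.

    Assumption: importance is a dot product with the query; the scoring
    function actually used by TaxoEnrich is not given in the abstract.
    Returns (pooled_embedding, attention_weights).
    """
    dim = len(query)
    if not siblings:
        # A candidate position with no siblings contributes a zero vector.
        return [0.0] * dim, []
    scores = [sum(q * s for q, s in zip(query, sib)) for sib in siblings]
    weights = softmax(scores)
    pooled = [
        sum(w * sib[i] for w, sib in zip(weights, siblings))
        for i in range(dim)
    ]
    return pooled, weights


# Toy 2-d embeddings (made-up numbers, for illustration only).
query = [1.0, 0.0]
siblings = [[1.0, 0.0], [0.0, 1.0]]
pooled, weights = aggregate_siblings(query, siblings)
```

In this toy case the sibling aligned with the query receives the larger weight, so the pooled vector leans toward it. A learned scoring function would replace the dot product in practice.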