One of the strongest signals for the automated matching of knowledge graphs and ontologies is the textual description of concepts. With the rise of transformer-based language models, text comparison based on meaning (rather than lexical features) has become available to researchers. However, performing pairwise comparisons of all textual descriptions of concepts in two knowledge graphs is expensive and scales quadratically (or even worse if concepts have more than one description). To overcome this problem, we follow a two-step approach: we first generate matching candidates using a pre-trained sentence transformer (a so-called bi-encoder). In a second step, we use fine-tuned transformer cross-encoders to select the best candidates. We evaluate our approach on multiple datasets and show that it is feasible and produces competitive results.
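The two-step retrieve-then-rerank pattern described above can be sketched as follows. This is a minimal illustration only: the real system uses a pre-trained sentence-transformer bi-encoder and a fine-tuned cross-encoder, whereas here both scorers are stand-in bag-of-words functions (cosine similarity and Jaccard overlap) so that the sketch runs without any model downloads. All function names are hypothetical.

```python
# Sketch of the two-step candidate generation / re-ranking pattern.
# Stand-in scorers replace the bi-encoder and cross-encoder; in the
# actual approach these would be transformer models.
import math
from collections import Counter

def embed(text):
    # Stand-in for a bi-encoder embedding: a bag-of-words vector.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two bag-of-words vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cross_score(text_a, text_b):
    # Stand-in for a cross-encoder: Jaccard overlap of token sets.
    sa, sb = set(text_a.lower().split()), set(text_b.lower().split())
    return len(sa & sb) / len(sa | sb)

def match(descs_a, descs_b, top_k=2):
    # Step 1: cheap bi-encoder retrieval keeps only top_k candidates
    # per source concept instead of scoring all |A| x |B| pairs.
    embs_b = [embed(d) for d in descs_b]
    matches = {}
    for i, da in enumerate(descs_a):
        ea = embed(da)
        candidates = sorted(range(len(descs_b)),
                            key=lambda j: cosine(ea, embs_b[j]),
                            reverse=True)[:top_k]
        # Step 2: the expensive pairwise scorer runs only on the
        # surviving candidates, and the best one is selected.
        best = max(candidates, key=lambda j: cross_score(da, descs_b[j]))
        matches[i] = best
    return matches
```

The design point the sketch makes explicit: the quadratic pairwise cost is paid only with the cheap scorer, while the expensive pairwise model sees at most `top_k` pairs per source concept.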