The goal of entity matching in knowledge graphs is to identify entities that refer to the same real-world object using some similarity metric. The result of entity matching can be viewed as a set of entity pairs interpreted as the same-as relation. However, the identified set of pairs may fail to satisfy structural properties expected of the same-as relation, in particular transitivity. In this work, we show that an ad hoc enforcement of transitivity, i.e., taking the transitive closure of the identified set of entity pairs, may decrease precision dramatically. We therefore propose a methodology that starts with a given similarity measure, generates a set of entity pairs identified as referring to the same real-world object, and applies the cluster editing algorithm to enforce transitivity without adding many spurious links, leading to overall improved performance.
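To illustrate the contrast between transitive closure and cluster editing, the following is a minimal sketch on a toy input, not the implementation used in this work. The entity names, the gold clusters, and all function names are hypothetical. Cluster editing is solved here by brute force over all partitions, which is feasible only for a handful of entities and stands in for whatever solver is used in practice.

```python
from itertools import combinations


def transitive_closure(nodes, pairs):
    """Return every pair implied by treating `pairs` as an equivalence relation."""
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    return {frozenset(p) for p in combinations(nodes, 2)
            if find(p[0]) == find(p[1])}


def partitions(items):
    """Yield every partition of `items` (exponential; toy inputs only)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Place `first` into each existing block in turn ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + part


def cluster_editing(nodes, pairs):
    """Brute force: the transitive pair set reachable with the fewest
    pair insertions and deletions from `pairs`."""
    given = {frozenset(p) for p in pairs}
    best_cost, best_links = None, None
    for part in partitions(list(nodes)):
        links = {frozenset(p) for block in part
                 for p in combinations(block, 2)}
        cost = len(links ^ given)  # symmetric difference = number of edits
        if best_cost is None or cost < best_cost:
            best_cost, best_links = cost, links
    return best_links


def precision(predicted, gold):
    """Fraction of predicted pairs that are also gold pairs."""
    return len(predicted & gold) / len(predicted) if predicted else 1.0


# Hypothetical toy data: two true clusters and one spurious match (a3, b1).
nodes = ["a1", "a2", "a3", "b1", "b2", "b3"]
gold = {frozenset(p) for block in (nodes[:3], nodes[3:])
        for p in combinations(block, 2)}
matched = gold | {frozenset(("a3", "b1"))}

closure = transitive_closure(nodes, matched)
edited = cluster_editing(nodes, matched)

print("matched pairs :", len(matched), "precision", precision(matched, gold))
print("closure       :", len(closure), "precision", precision(closure, gold))
print("cluster edit  :", len(edited), "precision", precision(edited, gold))
```

In this toy input the single spurious link causes the closure to merge both groups, so 15 pairs are predicted and only 6 of them are correct. Cluster editing instead deletes the one offending link, since that is the cheapest way to make the relation transitive.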