Similarity between entities arises frequently in real-world scenarios. For over a century, researchers in different fields have proposed a range of approaches to measuring the similarity between entities. More recently, inspired by "Google Sets", significant academic and commercial effort has been devoted to expanding a given set of entities with similar ones. As a result, existing approaches can now take into account properties shared by entities, hereinafter called nexus of similarity. Machines are thus largely able to handle both similarity measurement and set expansion. To the best of our knowledge, however, there is no way to characterize nexus of similarity between entities, that is, to identify them in a formal and comprehensive way that is both machine- and human-readable. Moreover, there is no consensus on how to evaluate existing approaches on weakly similar entities. As a first step towards filling these gaps, we aim to complement the existing literature by developing a novel logic-based framework that formally and automatically characterizes nexus of similarity between tuples of entities within a knowledge base. Furthermore, we analyze the computational complexity of this framework.