While the similarity between two concept words has been studied and evaluated for decades, much less attention has been devoted to algorithms that compute the similarity of nodes in very large knowledge graphs such as Wikidata. To facilitate investigation and head-to-head comparison of similarity algorithms on Wikidata, we present a user-friendly interface that allows flexible computation of similarity between Qnodes in Wikidata. At present, the similarity interface supports four algorithms: two using graph embeddings (TransE and ComplEx), one using text embeddings (BERT), and one using class-based similarity. We demonstrate the behavior of the algorithms on representative examples of semantically similar, related, and entirely unrelated entity pairs. To support anticipated applications that require efficient similarity computation, such as entity linking and recommendation, we also provide a REST API that can compute the most similar neighbors for any Qnode in Wikidata.
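As a rough illustration of how an embedding-based similarity score and a most-similar-neighbors lookup can work, the following is a minimal sketch that assumes each entity is represented by a dense vector and that similarity is measured with cosine similarity. The abstract does not state which measure the interface uses, so the choice of cosine similarity, the function names `cosine_similarity` and `most_similar`, the placeholder entity labels, and the toy vectors are all assumptions made for illustration; they are not the interface's or the REST API's actual implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query, embeddings, k=2):
    """Return the k entities closest to `query`, sorted by descending similarity."""
    query_vec = embeddings[query]
    scores = [
        (other, cosine_similarity(query_vec, vec))
        for other, vec in embeddings.items()
        if other != query
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Hypothetical toy vectors; real graph or text embeddings have hundreds of
# dimensions and would be keyed by actual Qnode identifiers.
toy_embeddings = {
    "entity_a": [0.9, 0.1, 0.0],
    "entity_b": [0.8, 0.2, 0.1],
    "entity_c": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    for name, score in most_similar("entity_a", toy_embeddings):
        print(f"{name}\t{score:.3f}")
```

In this toy setup, `entity_b` ranks above `entity_c` as a neighbor of `entity_a` because its vector points in nearly the same direction. A brute-force scan like this is only practical for small collections; serving neighbors for every Qnode in Wikidata would presumably require an indexed nearest-neighbor search.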