The number of Knowledge Graphs (KGs) generated with automatic and manual approaches is constantly growing. For an integrated view and usage, an alignment between these KGs is necessary on the schema as well as on the instance level. While there are approaches that try to tackle this multi-source knowledge graph matching problem, large gold standards for evaluating their effectiveness and scalability are missing. We close this gap by presenting Gollum -- a gold standard for large-scale multi-source knowledge graph matching with over 275,000 correspondences between 4,149 different KGs. These KGs originate from applying the DBpedia extraction framework to a large wiki farm. Three variants of the gold standard are made available: (1) a version with all correspondences for evaluating unsupervised matching approaches, and two versions for evaluating supervised matching: (2) one where each KG appears in both the train and the test set, and (3) one where each KG appears exclusively in either the train or the test set.
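The two supervised variants differ only in how the correspondences are partitioned into train and test data. A minimal sketch of the two split strategies, using hypothetical toy correspondences (the identifiers, tuple format, and split ratio below are illustrative assumptions, not the actual Gollum data layout):

```python
import random

# Hypothetical correspondences: (source KG, target KG, entity 1, entity 2)
correspondences = [
    ("kg_a", "kg_b", "a:Berlin", "b:Berlin"),
    ("kg_a", "kg_c", "a:Paris", "c:Paris"),
    ("kg_b", "kg_c", "b:Rome", "c:Rome"),
    ("kg_a", "kg_b", "a:Oslo", "b:Oslo"),
]

random.seed(42)

# Variant (2): split on the correspondence level -- the same KG
# may contribute correspondences to both train and test.
shuffled = correspondences[:]
random.shuffle(shuffled)
cut = int(0.8 * len(shuffled))
train_2, test_2 = shuffled[:cut], shuffled[cut:]

# Variant (3): split on the KG level -- each KG belongs to exactly
# one side; correspondences spanning both sides are discarded.
kgs = sorted({kg for c in correspondences for kg in c[:2]})
random.shuffle(kgs)
train_kgs = set(kgs[: int(0.8 * len(kgs))])
train_3 = [c for c in correspondences
           if c[0] in train_kgs and c[1] in train_kgs]
test_3 = [c for c in correspondences
          if c[0] not in train_kgs and c[1] not in train_kgs]
```

The KG-level split of variant (3) is the stricter setting: a supervised matcher is evaluated on knowledge graphs it has never seen during training, so it cannot exploit KG-specific signals learned from the train set.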