We present CoDEx, a set of knowledge graph completion datasets extracted from Wikidata and Wikipedia that improve upon existing knowledge graph completion benchmarks in scope and level of difficulty. In terms of scope, CoDEx comprises three knowledge graphs varying in size and structure, multilingual descriptions of entities and relations, and tens of thousands of hard negative triples that are plausible but verified to be false. To characterize CoDEx, we contribute thorough empirical analyses and benchmarking experiments. First, we analyze each CoDEx dataset in terms of logical relation patterns. Next, we report baseline link prediction and triple classification results on CoDEx for five extensively tuned embedding models. Finally, we differentiate CoDEx from the popular FB15K-237 knowledge graph completion dataset by showing that CoDEx covers more diverse and interpretable content, and is a more difficult link prediction benchmark. Data, code, and pretrained models are available at https://bit.ly/2EPbrJs.