Knowledge graph (KG) errors introduce non-negligible noise that severely degrades KG-related downstream tasks. Detecting errors in KGs is challenging because error patterns are unknown and diverse, while ground-truth labels are rare or even unavailable. A traditional solution is to construct logical rules to verify triples, but it does not generalize: different KGs follow distinct rules that involve domain knowledge. Recent studies focus on designing tailored detectors or ranking triples by KG embedding loss. However, they all rely on negative samples for training, which are generated by randomly replacing the head or tail entity of existing triples. Such a negative sampling strategy is insufficient to prototype practical KG errors, e.g., (Bruce_Lee, place_of_birth, China), whose three elements are often mutually relevant despite being mismatched as a whole. We therefore desire a more effective unsupervised learning mechanism tailored for KG error detection. To this end, we propose a novel framework, ContrAstive knowledge Graph Error Detection (CAGED). It introduces contrastive learning into KG learning and provides a novel way of modeling KGs: instead of the traditional setting that treats entities as nodes and relations as semantic edges, CAGED augments a KG into different hyper-views by regarding each relational triple as a node. After joint training with a KG embedding loss and a contrastive learning loss, CAGED assesses the trustworthiness of each triple based on two learning signals, i.e., the consistency of triple representations across the multiple views and the self-consistency within the triple. Extensive experiments on three real-world KGs show that CAGED outperforms state-of-the-art methods in KG error detection. Our code and datasets are available at https://github.com/Qing145/CAGED.git.
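To make the two learning signals concrete, the following is a minimal illustrative sketch (not the authors' implementation): self-consistency is approximated by a TransE-style energy ||h + r - t||, and cross-view consistency by the cosine similarity between a triple's embeddings from two augmented views. The entity/relation vectors and the combination weight `alpha` are hypothetical stand-ins; in CAGED these would come from the joint KG-embedding and contrastive training.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Hypothetical toy embeddings; in practice these are learned jointly
# with the KG embedding loss and the contrastive loss.
entities = {name: rng.normal(size=dim)
            for name in ["Bruce_Lee", "San_Francisco", "China"]}
relations = {"place_of_birth": rng.normal(size=dim)}

def self_consistency(h, r, t):
    """TransE-style energy ||h + r - t||.
    Lower energy = the triple is more internally consistent."""
    return float(np.linalg.norm(entities[h] + relations[r] - entities[t]))

def cross_view_consistency(z1, z2):
    """Cosine similarity between a triple's representations in two
    augmented hyper-views. Higher = the views agree more."""
    return float(z1 @ z2 / (np.linalg.norm(z1) * np.linalg.norm(z2)))

def trust_score(h, r, t, z1, z2, alpha=0.5):
    """Combine both signals into a single trustworthiness score:
    low energy and high cross-view agreement both raise the score."""
    return alpha * (-self_consistency(h, r, t)) \
        + (1 - alpha) * cross_view_consistency(z1, z2)
```

Triples with low trust scores would then be ranked to the top as error candidates.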