Knowledge graph (KG) completion is a well-studied problem in AI. Rule-based methods and embedding-based methods are two of its prominent solution techniques. Rule-based methods learn first-order logic rules that capture existing facts in an input graph and then use these rules to reason about missing facts. A major drawback of such methods is their lack of scalability to large datasets. In this paper, we present a simple linear programming (LP) model that selects rules from a list of candidate rules and assigns weights to them. For smaller KGs, we use simple heuristics to create the candidate list. For larger KGs, we start with a small initial candidate list and then use standard column generation ideas to add rules that improve the LP objective value. To promote interpretability and generalizability, we limit the complexity of the chosen rule set via explicit constraints, and we tune the complexity hyperparameter for individual datasets. We show that our method obtains state-of-the-art results on three out of four widely used KG datasets, while taking significantly less computing time than other popular rule learners, including some based on neuro-symbolic methods. The improved scalability of our method allows us to tackle large datasets such as YAGO3-10.
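To make the rule-selection step concrete, the following is a minimal sketch of the kind of LP involved, written in Python with scipy.optimize.linprog. It is an illustration under assumed inputs, not the paper's exact formulation: the fact-coverage matrix A, the per-rule lengths, and the complexity budget max_complexity are hypothetical stand-ins. Each candidate rule j receives a weight w_j in [0, 1], a per-fact slack is paid whenever the weighted rules fail to cover a known fact, and a budget constraint caps the total complexity of the selected rules.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data: A[i, j] = 1 if candidate rule j derives known fact i.
rng = np.random.default_rng(0)
n_facts, n_rules = 50, 10
A = rng.integers(0, 2, size=(n_facts, n_rules))
length = rng.integers(1, 4, size=n_rules)   # assumed rule complexities
max_complexity = 6                          # assumed tunable budget

# Variables: rule weights w in [0, 1], followed by per-fact slacks s >= 0.
# Coverage: A @ w + s >= 1, i.e. each known fact is derived with total
# weight at least 1, or the slack absorbs the shortfall.
# Objective: minimize total slack (uncovered facts).
c = np.concatenate([np.zeros(n_rules), np.ones(n_facts)])
A_ub = np.hstack([-A, -np.eye(n_facts)])    # -(A w + s) <= -1
b_ub = -np.ones(n_facts)

# Complexity budget: sum_j length[j] * w[j] <= max_complexity.
A_ub = np.vstack([A_ub, np.concatenate([length, np.zeros(n_facts)])])
b_ub = np.append(b_ub, max_complexity)

bounds = [(0, 1)] * n_rules + [(0, None)] * n_facts
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
print("selected rule weights:", np.round(res.x[:n_rules], 2))
```

In a column generation loop over a large KG, one would then price candidate rules not yet in the LP against the duals of the coverage constraints and add only those with favorable reduced cost, keeping the candidate list small while improving the objective.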