Multi-hop reasoning is an effective approach for query answering (QA) over incomplete knowledge graphs (KGs). The problem can be formulated in a reinforcement learning (RL) setup, where a policy-based agent sequentially extends its inference path until it reaches a target. However, in an incomplete KG environment, the agent receives low-quality rewards corrupted by false negatives in the training data, which harms generalization at test time. Furthermore, since no golden action sequence is used for training, the agent can be misled by spurious search trajectories that incidentally lead to the correct answer. We propose two modeling advances to address both issues: (1) we reduce the impact of false negative supervision by adopting a pretrained one-hop embedding model to estimate the reward of unobserved facts; (2) we reduce the sensitivity of on-policy RL to spurious paths by forcing the agent to explore a diverse set of paths using randomly generated edge masks. Our approach significantly improves over existing path-based KGQA models on several benchmark datasets and performs comparably to or better than embedding-based models.
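To make the two advances concrete, the following is a minimal PyTorch sketch (all function and variable names are hypothetical, not from the paper's released code). It assumes a pretrained one-hop embedding model that scores a candidate fact in [0, 1]; that score replaces the hard zero reward when the agent stops at an unobserved entity, and a random mask over outgoing edges implements the diverse-path exploration.

```python
import torch

def shaped_reward(hit_target: torch.Tensor, embedding_score: torch.Tensor) -> torch.Tensor:
    """Reward shaping sketch: return 1 if the agent's final entity is an observed
    answer; otherwise fall back to the pretrained embedding model's score of the
    predicted fact instead of a hard 0, softening false-negative supervision."""
    return torch.where(hit_target.bool(), torch.ones_like(embedding_score), embedding_score)

def masked_action_probs(action_logits: torch.Tensor, keep_prob: float = 0.7) -> torch.Tensor:
    """Edge-mask sketch: randomly drop outgoing edges during training so the agent
    is forced to explore a more diverse set of paths (assumed keep probability)."""
    keep = (torch.rand_like(action_logits) < keep_prob).float()
    # Guarantee at least one action remains available in every state.
    keep[..., 0] = torch.maximum(keep[..., 0], (keep.sum(-1) == 0).float())
    masked_logits = action_logits + (1.0 - keep) * -1e9
    return torch.softmax(masked_logits, dim=-1)

# Toy usage: a batch of 2 rollouts, 4 candidate edges per state.
logits = torch.randn(2, 4)
probs = masked_action_probs(logits)
reward = shaped_reward(torch.tensor([1.0, 0.0]), torch.tensor([0.9, 0.3]))
```

This is only an illustration of the two ideas under the stated assumptions, not the authors' implementation; in practice the embedding model, masking rate, and policy network are tuned per dataset.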