Reasoning over commonsense knowledge bases (CSKBs) whose elements are free text is an important yet hard task in NLP. While CSKB completion only fills in missing links within the domain of the CSKB, CSKB population has been proposed as an alternative, with the goal of reasoning about unseen assertions drawn from external resources. In this task, CSKBs are grounded to a large-scale eventuality (activity, state, and event) graph, and a model must discriminate whether novel triples from the eventuality graph are plausible or not. However, existing evaluations of the population task are either inaccurate (automatic evaluation with randomly sampled negative examples) or small in scale (human annotation). In this paper, we benchmark the CSKB population task with a new large-scale dataset by first aligning four popular CSKBs, and then presenting a high-quality human-annotated evaluation set to probe neural models' commonsense reasoning ability. We also propose a novel inductive commonsense reasoning model that reasons over graphs. Experimental results show that generalizing commonsense reasoning to unseen assertions is inherently a hard task. Models that achieve high accuracy during training perform poorly on the evaluation set, leaving a large gap relative to human performance. Code and data are publicly available for future contributions at https://github.com/HKUST-KnowComp/CSKB-Population.