Populating Commonsense Knowledge Bases (CSKB) is an important yet challenging task in NLP, as it deals with knowledge from external sources containing unseen events and entities. Fang et al. (2021a) proposed a CSKB Population benchmark with an evaluation set, CKBP v1. However, CKBP v1 relies on crowdsourced annotations that contain a substantial fraction of incorrect answers, and its evaluation set is not well aligned with the external knowledge source because it was drawn by random sampling. In this paper, we introduce CKBP v2, a new high-quality CSKB Population benchmark that addresses these two problems by employing expert annotators instead of crowdsourcing and by adding diversified adversarial samples to make the evaluation set more representative. We conduct extensive experiments comparing state-of-the-art methods for CSKB Population on the new evaluation set to support future research comparisons. Empirical results show that the population task remains challenging, even for large language models (LLMs) such as ChatGPT. Code and data are available at https://github.com/HKUST-KnowComp/CSKB-Population.