Acquiring commonsense knowledge and reasoning is recognized as an important frontier in achieving general Artificial Intelligence (AI). Recent research in the Natural Language Processing (NLP) community has demonstrated significant progress on this problem. Despite this progress, which has come mainly on multiple-choice question answering tasks in limited settings, the nature of commonsense knowledge itself is still poorly understood, especially at scale. In this paper, we propose and conduct a systematic study that enables a deeper understanding of commonsense knowledge through an empirical and structural analysis of the ConceptNet knowledge base. ConceptNet is a freely available knowledge base containing millions of commonsense assertions presented in natural language. Detailed experimental results on three carefully designed research questions, obtained using state-of-the-art unsupervised graph representation learning ('embedding') and clustering techniques, reveal deep substructures in ConceptNet relations, allowing us to make data-driven and computational claims about the meaning of phenomena such as 'context' that are traditionally discussed only in qualitative terms. Furthermore, our methodology provides a case study in how to use data-science and computational methodologies to understand the nature of an everyday (yet complex) psychological phenomenon that is an essential feature of human intelligence.