Commonsense knowledge (CSK) about concepts and their properties is helpful for AI applications. Prior works, such as ConceptNet, have compiled large CSK collections. However, they are restricted in their expressiveness to subject-predicate-object (SPO) triples with simple concepts for S and strings for P and O. This paper presents a method called ASCENT++ to automatically build a large-scale knowledge base (KB) of CSK assertions, with refined expressiveness and both better precision and recall than prior works. ASCENT++ goes beyond SPO triples by capturing composite concepts with subgroups and aspects, and by refining assertions with semantic facets. The latter is essential to express the temporal and spatial validity of assertions and further qualifiers. Furthermore, ASCENT++ combines open information extraction (OpenIE) with judicious cleaning and ranking by typicality and saliency scores. For high coverage, our method taps into the large-scale crawl C4 with broad web contents. The evaluation with human judgments shows the superior quality of the ASCENT++ KB, and an extrinsic evaluation for QA-support tasks underlines the benefits of ASCENT++. A web interface, data, and code can be accessed at https://ascentpp.mpi-inf.mpg.de/.
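To make the contrast with plain SPO triples concrete, the following is a minimal sketch of how an assertion with a subgroup, an aspect, semantic facets, and typicality and saliency scores might be represented. All names here (`Assertion`, `full_subject`, `rank_by`, and the field names) are hypothetical and chosen for illustration; they are not taken from the ASCENT++ code or data schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Assertion:
    """Illustrative faceted assertion (hypothetical schema, not the ASCENT++ one)."""
    subject: str                     # primary concept, e.g. "elephant"
    predicate: str                   # free-text predicate string
    obj: str                         # free-text object string
    subgroup: Optional[str] = None   # refinement of the subject, e.g. "african elephant"
    aspect: Optional[str] = None     # part or facet of the subject, e.g. "trunk"
    # Semantic facets qualify the assertion, e.g. {"location": "in the savanna"}.
    facets: Dict[str, str] = field(default_factory=dict)
    typicality: float = 0.0          # assumed to lie in [0, 1]
    saliency: float = 0.0            # assumed to lie in [0, 1]

    def full_subject(self) -> str:
        """Compose the subject from subgroup (if any) and aspect (if any)."""
        base = self.subgroup if self.subgroup else self.subject
        return f"{base} {self.aspect}" if self.aspect else base


def rank_by(assertions: List[Assertion], score: str = "typicality") -> List[Assertion]:
    """Return assertions sorted by the named score attribute, highest first."""
    if score not in ("typicality", "saliency"):
        raise ValueError(f"unknown score: {score}")
    return sorted(assertions, key=lambda a: getattr(a, score), reverse=True)
```

A plain SPO triple corresponds to an `Assertion` with no subgroup, no aspect, and an empty `facets` dictionary; the extra fields are what allow temporal, spatial, and other qualifiers to be attached to a statement.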