Structured knowledge is important for many AI applications. Commonsense knowledge, which is crucial for robust human-centric AI, is covered by a small number of structured knowledge projects. However, these projects lack knowledge about human traits and behaviors conditioned on socio-cultural contexts, which is crucial for situative AI. This paper presents CANDLE, an end-to-end methodology for extracting high-quality cultural commonsense knowledge (CCSK) at scale. CANDLE extracts CCSK assertions from a huge web corpus and organizes them into coherent clusters, for three domains of subjects (geography, religion, occupation) and several cultural facets (food, drinks, clothing, traditions, rituals, behaviors). CANDLE includes judicious techniques for classification-based filtering and scoring of interestingness. Experimental evaluations show the superiority of the CANDLE CCSK collection over prior work, and an extrinsic use case demonstrates the benefits of CCSK for the GPT-3 language model. Code and data can be accessed at https://cultural-csk.herokuapp.com/.