Symbolic knowledge graphs (KGs) have been constructed either by expensive human crowdsourcing or with domain-specific, complex information extraction pipelines. The emerging large pretrained language models (LMs), such as BERT, have been shown to implicitly encode massive knowledge which can be queried with properly designed prompts. However, compared to explicit KGs, the implicit knowledge in black-box LMs is often difficult to access or edit and lacks explainability. In this work, we aim at harvesting symbolic KGs from LMs, a new framework for automatic KG construction empowered by the flexibility and scalability of neural LMs. Compared to prior works that often rely on large human-annotated data or existing massive KGs, our approach requires only the minimal definition of relations as inputs, and hence is suitable for extracting knowledge of rich new relations not available before. The approach automatically generates diverse prompts, and performs efficient knowledge search within a given LM for consistent and extensive outputs. The knowledge harvested with our approach is substantially more accurate than with previous methods, as shown in both automatic and human evaluation. As a result, we derive from diverse LMs a family of new KGs (e.g., BertNet and RoBERTaNet) that contain a richer set of commonsense relations, including complex ones (e.g., "A is capable of but not good at B"), than the human-annotated KGs (e.g., ConceptNet). Besides, the resulting KGs also serve as a vehicle to interpret the respective source LMs, leading to new insights into the varying knowledge capability of different LMs.