Symbolic knowledge graphs (KGs) have been constructed either through expensive human crowdsourcing or with complex text-mining pipelines. Emerging large pretrained language models (LMs), such as BERT, have been shown to implicitly encode massive amounts of knowledge that can be queried with properly designed prompts. However, compared to explicit KGs, the implicit knowledge in black-box LMs is often difficult to access or edit and lacks explainability. In this work, we aim to harvest symbolic KGs from LMs and propose a new framework for automatic KG construction empowered by the flexibility and scalability of neural LMs. Unlike prior works that often rely on large amounts of human-annotated data or existing massive KGs, our approach requires only a minimal definition of relations as input, and is hence suitable for extracting knowledge of rich new relations that are instantly assigned and were not available before. The framework automatically generates diverse prompts and performs an efficient knowledge search within a given LM for consistent outputs. The knowledge harvested with our approach shows competitive quality, diversity, and novelty. As a result, we derive from diverse LMs a family of new KGs (e.g., BertNet and RoBERTaNet) that contain a richer set of relations, including complex ones (e.g., "A is capable of, but not good at, B") that cannot be extracted with previous methods. In addition, the resulting KGs serve as a vehicle for interpreting the respective source LMs, leading to new insights into the varying knowledge capabilities of different LMs.
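The core idea of searching an LM with diverse prompt paraphrases and keeping only consistently supported outputs can be sketched as follows. This is a minimal, hypothetical illustration: `lm_score` here reads from a toy score table rather than querying a real LM such as BERT, and the prompt templates, threshold, and candidate pairs are invented for demonstration, not taken from the actual framework.

```python
# Hypothetical sketch: consistency-based knowledge harvesting from an LM.
# Several paraphrased prompts express one relation ("A is capable of B").
PROMPTS = [
    "{a} is capable of {b}",
    "{a} can {b}",
    "{a} has the ability to {b}",
]

# Toy per-prompt pseudo-likelihoods for candidate (head, tail) pairs.
# A real system would obtain these by scoring each filled-in prompt with
# a pretrained LM; the numbers below are invented for illustration.
TOY_SCORES = {
    ("bird", "fly"): [0.90, 0.85, 0.80],
    ("fish", "fly"): [0.70, 0.10, 0.15],  # scores well on one prompt only
}

def lm_score(pair, prompt_idx):
    """Stand-in for an LM's likelihood of the filled-in prompt."""
    return TOY_SCORES[pair][prompt_idx]

def harvest(candidates, threshold=0.5):
    """Keep pairs whose *minimum* score across all prompt paraphrases
    clears the threshold, i.e. knowledge the LM asserts consistently."""
    kept = []
    for pair in candidates:
        consistency = min(lm_score(pair, i) for i in range(len(PROMPTS)))
        if consistency >= threshold:
            kept.append(pair)
    return kept

print(harvest(list(TOY_SCORES)))  # [('bird', 'fly')]
```

Aggregating with `min` (rather than the mean) is one simple way to enforce consistency: a candidate that looks plausible under only one phrasing is rejected.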