The common practice for training commonsense models has gone from-human-to-corpus-to-machine: humans author commonsense knowledge graphs in order to train commonsense models. In this work, we investigate an alternative, from-machine-to-corpus-to-machine: general language models author these commonsense knowledge graphs to train commonsense models. Our study leads to a new framework, Symbolic Knowledge Distillation. As with prior art in Knowledge Distillation (Hinton et al., 2015), our approach uses larger models to teach smaller models. A key difference is that we distill knowledge symbolically, as text, in addition to the neural model. We also distill only one aspect, the commonsense, of a general language model teacher, allowing the student to be a different type of model, a commonsense model. Altogether, we show that careful prompt engineering and a separately trained critic model allow us to selectively distill high-quality causal commonsense from GPT-3, a general language model. Empirical results demonstrate that, for the first time, a human-authored commonsense knowledge graph is surpassed by our automatically distilled variant on all three criteria: quantity, quality, and diversity. In addition, it results in a neural commonsense model that surpasses the teacher model's commonsense capabilities despite being 100x smaller. We apply this to the ATOMIC resource, and share our new symbolic knowledge graph and commonsense models.