The common practice for training commonsense models has gone from-human-to-corpus-to-machine: humans author commonsense knowledge graphs in order to train commonsense models. In this work, we investigate an alternative, from-machine-to-corpus-to-machine: general language models author these commonsense knowledge graphs to train commonsense models. Our study leads to a new framework, Symbolic Knowledge Distillation. As with prior art in Knowledge Distillation (Hinton et al., 2015), our approach uses larger models to teach smaller models. A key difference is that we distill knowledge symbolically, as text, in addition to the neural model. We also distill only one aspect, the commonsense, of a general language model teacher, allowing the student to be a different type of model, a commonsense model. Altogether, we show that careful prompt engineering and a separately trained critic model allow us to selectively distill high-quality causal commonsense from GPT-3, a general language model. Empirical results demonstrate that, for the first time, a human-authored commonsense knowledge graph is surpassed by our automatically distilled variant on all three criteria: quantity, quality, and diversity. In addition, it results in a neural commonsense model that surpasses the teacher model's commonsense capabilities despite being 100x smaller. We apply this to the ATOMIC resource, and share our new symbolic knowledge graph and commonsense models.