Self-conditioning pre-trained language models

In this paper we investigate the mechanisms that guide text generation in pre-trained Transformer-based Language Models (TLMs). Building on the Product of Experts formulation by Hinton (1999), we describe a generative mechanism that exploits expert units which naturally exist in TLMs. Such units are responsible for detecting concepts in the input and for conditioning text generation on those concepts. We describe how to identify expert units and how to activate them during inference in order to induce any desired concept in the generated output. We find that activating a surprisingly small number of units is sufficient to steer text generation (as few as 3 units in a model with 345M parameters). While the objective of this work is to learn more about how TLMs work, we show that our method is effective for conditioning without fine-tuning or extra parameters, even on fine-grained homograph concepts. Additionally, we show that our method can be used to correct gender bias present in the output of TLMs, achieving gender parity for all evaluated contexts. We compare our method with FUDGE and PPLM-BoW, and show that our approach achieves gender parity at a lower perplexity. The proposed method is accessible to a wide audience thanks to its simplicity and minimal compute needs. The findings in this paper are a step forward in understanding the generative mechanisms of TLMs.
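
The identify-then-activate procedure described above can be illustrated with a small sketch. This is not the paper's implementation, and the abstract does not specify the scoring rule or the intervention value. The sketch assumes that a unit's expertise is scored by the average precision of its activation as a detector of the concept, and that activating a unit means overwriting its response with its mean value on sentences containing the concept. The function names (`average_precision`, `find_expert_units`, `intervene`) are hypothetical, and the activations are synthetic rather than taken from a real TLM.

```python
import numpy as np


def average_precision(scores, labels):
    """Average precision of `scores` as a ranking of binary `labels`
    (1 = concept present, 0 = concept absent)."""
    order = np.argsort(-scores, kind="stable")      # highest score first
    hits = labels[order]
    precision_at_rank = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float((precision_at_rank * hits).sum() / max(hits.sum(), 1))


def find_expert_units(activations, labels, k):
    """Rank units by how well their activation detects the concept.

    activations: (n_sentences, n_units) array, one response per unit.
    labels:      (n_sentences,) binary array.
    Returns the indices of the k best units and the per-unit scores.
    """
    n_units = activations.shape[1]
    ap = np.array([average_precision(activations[:, j], labels)
                   for j in range(n_units)])
    return np.argsort(-ap)[:k], ap


def intervene(hidden, expert_idx, target_values):
    """Return a copy of `hidden` with the expert units overwritten.

    hidden: (..., n_units) array of unit responses at inference time.
    """
    out = hidden.copy()
    out[..., expert_idx] = target_values
    return out


# Synthetic demo (assumption: 16 units, unit 5 responds to the concept).
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)
activations = rng.normal(size=(200, 16))
activations[:, 5] += 6.0 * labels

experts, scores = find_expert_units(activations, labels, k=3)
# Assumed intervention value: mean response on concept-bearing sentences.
targets = activations[labels == 1][:, experts].mean(axis=0)

hidden = rng.normal(size=(4, 16))                   # e.g. 4 token positions
steered = intervene(hidden, experts, targets)
print("expert units:", experts)
```

In a real model the same overwrite would be applied to the chosen units at every generation step, for instance through a forward hook on the relevant layers. The sketch leaves that wiring out because it depends on the framework and the model architecture.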
