Representation learning for text via pretraining a language model on a large corpus has become a standard starting point for building NLP systems. This approach stands in contrast to autoencoders, which are also trained on raw text but with the objective of learning to encode each input as a vector that permits full reconstruction. Autoencoders are attractive because of their latent space structure and generative properties. We therefore explore constructing a sentence-level autoencoder from a pretrained, frozen transformer language model. We adapt the masked language modeling objective into a generative, denoising one, while training only a sentence bottleneck and a single-layer modified transformer decoder. We demonstrate that the sentence representations discovered by our model achieve better quality than previous methods that extract representations from pretrained transformers, on text similarity tasks, style transfer (an example of controlled generation), and single-sentence classification tasks in the GLUE benchmark, while using fewer parameters than large pretrained models.
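To make the setup concrete, below is a minimal PyTorch sketch of the overall architecture described above: a frozen pretrained encoder, a learned sentence bottleneck, and a single-layer transformer decoder trained with a denoising reconstruction objective. The module names, the attention-pooling bottleneck, and the tensor shapes are illustrative assumptions for exposition, not the paper's exact implementation.

```python
import torch
import torch.nn as nn


class SentenceBottleneckAutoencoder(nn.Module):
    """Sketch: frozen pretrained encoder + sentence bottleneck + 1-layer decoder.

    Only the bottleneck and the decoder receive gradients; the pretrained
    transformer language model stays frozen. Names are hypothetical.
    """

    def __init__(self, pretrained_encoder: nn.Module, hidden_dim: int,
                 bottleneck_dim: int, vocab_size: int, num_heads: int = 8):
        super().__init__()
        # Pretrained transformer language model, kept frozen.
        self.encoder = pretrained_encoder
        for p in self.encoder.parameters():
            p.requires_grad = False

        # Sentence bottleneck: pool the encoder's token states into one vector.
        self.pool_query = nn.Parameter(torch.randn(1, 1, hidden_dim))
        self.pool_attn = nn.MultiheadAttention(hidden_dim, num_heads,
                                               batch_first=True)
        self.to_bottleneck = nn.Linear(hidden_dim, bottleneck_dim)
        self.from_bottleneck = nn.Linear(bottleneck_dim, hidden_dim)

        # Single-layer transformer decoder that reconstructs the input tokens
        # conditioned on the sentence vector (denoising, MLM-style training).
        self.decoder_layer = nn.TransformerDecoderLayer(
            d_model=hidden_dim, nhead=num_heads, batch_first=True)
        self.lm_head = nn.Linear(hidden_dim, vocab_size)

    def encode(self, token_states: torch.Tensor) -> torch.Tensor:
        # token_states: (batch, seq_len, hidden_dim) from the frozen encoder.
        pooled, _ = self.pool_attn(
            self.pool_query.expand(token_states.size(0), -1, -1),
            token_states, token_states)
        return self.to_bottleneck(pooled.squeeze(1))   # (batch, bottleneck_dim)

    def forward(self, masked_input_ids: torch.Tensor,
                decoder_input_embeds: torch.Tensor) -> torch.Tensor:
        # Assumption: the frozen encoder maps masked token ids to hidden
        # states of shape (batch, seq_len, hidden_dim).
        with torch.no_grad():
            token_states = self.encoder(masked_input_ids)
        z = self.encode(token_states)                       # sentence vector
        memory = self.from_bottleneck(z).unsqueeze(1)       # (batch, 1, hidden_dim)
        decoded = self.decoder_layer(decoder_input_embeds, memory)
        return self.lm_head(decoded)                        # reconstruction logits
```

In this sketch, training would apply a cross-entropy loss between the reconstruction logits and the original (uncorrupted) token ids, so that the bottleneck vector must carry enough information to denoise and regenerate the full sentence while the pretrained encoder's weights remain untouched.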