In recent years, pretrained models have been widely used in many fields, including natural language understanding, computer vision, and natural language generation. However, the performance of these language generation models is highly dependent on model size and dataset size. While larger models excel in some respects, they cannot incorporate up-to-date knowledge and are relatively difficult to retrain. In this paper, we introduce EvoText, a novel training method that enhances the performance of any natural language generation model without requiring additional datasets during the entire training process (although a prior dataset is necessary for pretraining). EvoText employs two models: $G$, a text generation model, and $D$, a model that determines whether the data generated by $G$ is legitimate. Initially, the fine-tuned $D$ model serves as the knowledge base. The text generated by $G$ is then fed to $D$, which judges whether it is legitimate, and $G$ is finally fine-tuned based on $D$'s output. EvoText enables the model to learn up-to-date knowledge through a self-escalation process that builds on a priori knowledge. When EvoText needs to learn something new, it suffices to fine-tune the $D$ model. Our approach applies to all Transformer-based autoregressive language models. With EvoText, eight models achieved stable improvements on seven natural language processing tasks without any changes to the model architecture.
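To make the generate-judge-fine-tune cycle concrete, the following is a minimal sketch of one EvoText round, assuming three black-box callables: a generator that samples text from $G$, a judge that returns $D$'s legitimacy decision, and a fine-tuning step for $G$. These names and the sampling budget are illustrative assumptions, not the paper's actual implementation.

```python
from typing import Callable, List

def evotext_round(
    generate: Callable[[int], List[str]],   # G: produce n candidate texts
    judge: Callable[[str], bool],           # D: True if the text is legitimate
    fine_tune: Callable[[List[str]], None], # one fine-tuning step of G
    num_samples: int = 64,
) -> int:
    """Run one self-escalation round: G proposes text, D filters it,
    and G is fine-tuned on the samples D accepts.
    Returns the number of accepted samples."""
    candidates = generate(num_samples)              # G generates candidate text
    accepted = [t for t in candidates if judge(t)]  # D keeps only legitimate text
    if accepted:
        fine_tune(accepted)                         # G learns from D's judgments
    return len(accepted)
```

Under this reading, teaching the system new knowledge only requires re-fine-tuning the component behind `judge` (the $D$ model); subsequent rounds then propagate that knowledge into $G$ without any additional dataset.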