This paper presents Z-Code++, a new pre-trained language model optimized for abstractive text summarization. The model extends the state-of-the-art encoder-decoder model using three techniques. First, we use a two-phase pre-training process to improve the model's performance on low-resource summarization tasks. The model is first pre-trained on text corpora for language understanding, and is then continually pre-trained on summarization corpora for grounded text generation. Second, we replace the self-attention layers in the encoder with disentangled attention layers, where each word is represented using two vectors that encode its content and position, respectively. Third, we use fusion-in-encoder, a simple yet effective method of encoding long sequences in a hierarchical manner. Z-Code++ creates a new state of the art on 9 of 13 text summarization tasks across 5 languages. Our model is parameter-efficient in that it outperforms the 600x larger PaLM-540B on XSum and the fine-tuned 200x larger GPT3-175B on SAMSum. In zero-shot and few-shot settings, our model substantially outperforms the competing models.
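To make the disentangled-attention idea concrete, the sketch below shows a minimal, single-head PyTorch module in which each token is described by a content vector and a relative-position embedding, and the attention score sums content-to-content, content-to-position, and position-to-content terms (DeBERTa-style). The class name, projection layers, and the relative-position range are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class DisentangledAttentionSketch(nn.Module):
    """Minimal sketch of single-head disentangled attention: tokens carry a
    content vector plus a relative-position embedding, and attention scores
    sum content-to-content, content-to-position, and position-to-content terms."""

    def __init__(self, d_model: int, max_rel_pos: int = 128):
        super().__init__()
        self.q_c = nn.Linear(d_model, d_model)   # content query projection
        self.k_c = nn.Linear(d_model, d_model)   # content key projection
        self.v_c = nn.Linear(d_model, d_model)   # content value projection
        self.q_r = nn.Linear(d_model, d_model)   # relative-position query projection
        self.k_r = nn.Linear(d_model, d_model)   # relative-position key projection
        # learned embeddings for relative offsets in [-max_rel_pos, max_rel_pos]
        self.rel_emb = nn.Embedding(2 * max_rel_pos + 1, d_model)
        self.max_rel_pos = max_rel_pos
        self.scale = d_model ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (batch, seq, d_model)
        b, n, d = x.shape
        qc, kc, vc = self.q_c(x), self.k_c(x), self.v_c(x)

        # relative position offsets, clipped to the embedding range
        pos = torch.arange(n, device=x.device)
        rel = (pos[None, :] - pos[:, None]).clamp(-self.max_rel_pos, self.max_rel_pos)
        r = self.rel_emb(rel + self.max_rel_pos)           # (n, n, d)

        # content-to-content term
        c2c = qc @ kc.transpose(-1, -2)                    # (b, n, n)
        # content-to-position term: content queries attend to position keys
        c2p = torch.einsum("bnd,nmd->bnm", qc, self.k_r(r))
        # position-to-content term: position queries attend to content keys
        p2c = torch.einsum("nmd,bmd->bnm", self.q_r(r), kc)

        attn = torch.softmax((c2c + c2p + p2c) * self.scale, dim=-1)
        return attn @ vc                                   # (b, n, d)
```

A full implementation would add multiple heads, head-count-aware scaling, and shared relative-position projections across layers; the sketch only illustrates how the separate content and position representations enter the attention score.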