Sequence-to-sequence (seq2seq) learning is a popular approach for large-scale language model pretraining. However, prior seq2seq pretraining models generally focus on reconstruction objectives on the decoder side and neglect the effect of encoder-side supervision, which we argue may lead to sub-optimal performance. To verify our hypothesis, we first empirically study the functionalities of the encoder and decoder in seq2seq pretrained language models, and find that the encoder plays an important but under-exploited role compared with the decoder in terms of downstream performance and neuron activation. Therefore, we propose an encoding-enhanced seq2seq pretraining strategy, namely E2S2, which improves seq2seq models by integrating more effective self-supervised information into the encoder. Specifically, E2S2 adopts two self-supervised objectives on the encoder side from two aspects: 1) locally denoising the corrupted sentence (denoising objective); and 2) globally learning better sentence representations (contrastive objective). With the help of both objectives, the encoder can effectively distinguish noised tokens and capture high-level (i.e., syntactic and semantic) knowledge, thus strengthening the ability of the seq2seq model to perform accurate conditional generation. On a wide range of downstream natural language understanding and generation tasks, E2S2 consistently improves the performance of its powerful backbone models, e.g., BART and T5. For example, upon the BART backbone, we achieve a +1.1% average gain on the General Language Understanding Evaluation (GLUE) benchmark and a +1.75% F_0.5 score improvement on the CoNLL-2014 dataset. We also provide in-depth analyses showing that the improvement stems from better linguistic representations. We hope that our work will foster future self-supervision research on seq2seq language model pretraining.
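To give a concrete picture of how such encoder-side supervision can be attached to a seq2seq backbone, the following is a minimal sketch in PyTorch with the Hugging Face transformers library, assuming a BART-base backbone. The `denoise_head` module, the dropout-based contrastive views, and the unweighted sum of the three losses are illustrative assumptions for exposition, not the exact recipe or released code of E2S2.

```python
# Minimal sketch: decoder-side reconstruction loss plus two encoder-side
# losses (token denoising + sentence-level contrastive). Corruption of the
# input and loss weighting are simplified assumptions.
import torch
import torch.nn.functional as F
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
model.train()  # dropout must be active so the two contrastive views differ

# Hypothetical head predicting the original token at each (possibly corrupted)
# encoder position; this provides the encoder-side denoising signal.
denoise_head = torch.nn.Linear(model.config.d_model, model.config.vocab_size)


def e2s2_style_loss(corrupted_ids, attention_mask, original_ids,
                    decoder_labels, temperature=0.05):
    # 1) Standard seq2seq reconstruction loss on the decoder side.
    out = model(input_ids=corrupted_ids, attention_mask=attention_mask,
                labels=decoder_labels)
    recon_loss = out.loss
    enc_hidden = out.encoder_last_hidden_state              # (B, L, d)

    # 2) Encoder-side denoising: predict the original token at every position.
    logits = denoise_head(enc_hidden)                        # (B, L, V)
    denoise_loss = F.cross_entropy(logits.view(-1, logits.size(-1)),
                                   original_ids.view(-1),
                                   ignore_index=tokenizer.pad_token_id)

    # 3) Encoder-side contrastive loss (InfoNCE): two dropout-perturbed
    #    forward passes of the same sentence form a positive pair; other
    #    sentences in the batch are negatives.
    def sent_emb(ids, mask):
        h = model.get_encoder()(input_ids=ids,
                                attention_mask=mask).last_hidden_state
        return (h * mask.unsqueeze(-1)).sum(1) / mask.sum(1, keepdim=True)

    z1 = F.normalize(sent_emb(corrupted_ids, attention_mask), dim=-1)
    z2 = F.normalize(sent_emb(corrupted_ids, attention_mask), dim=-1)
    sim = z1 @ z2.t() / temperature
    contrast_loss = F.cross_entropy(
        sim, torch.arange(sim.size(0), device=sim.device))

    # Unweighted sum for simplicity; a real setup would tune the weights.
    return recon_loss + denoise_loss + contrast_loss
```

The sketch combines the three signals in a single training step; in practice the corruption scheme (e.g., token masking or infilling) and the relative loss weights would follow the chosen backbone's pretraining setup.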