We propose to pre-train a unified language model for both autoencoding and partially autoregressive language modeling tasks using a novel training procedure, referred to as a pseudo-masked language model (PMLM). Given an input text with masked tokens, we rely on conventional masks to learn inter-relations between corrupted tokens and context via autoencoding, and pseudo masks to learn intra-relations between masked spans via partially autoregressive modeling. With well-designed position embeddings and self-attention masks, the context encodings are reused to avoid redundant computation. Moreover, conventional masks used for autoencoding provide global masking information, so that all the position embeddings are accessible in partially autoregressive language modeling. In addition, the two tasks pre-train a unified language model as a bidirectional encoder and a sequence-to-sequence decoder, respectively. Our experiments show that the unified language models pre-trained using PMLM achieve new state-of-the-art results on a wide range of natural language understanding and generation tasks across several widely used benchmarks.
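Below is a minimal sketch of the input construction idea described above, assuming in-place [MASK] replacement for the autoencoding objective and appended pseudo [P] tokens for the partially autoregressive objective. The token names, the helper function, and the exact layout are illustrative assumptions, not the authors' released implementation; the real model additionally adds learned position embeddings and applies the self-attention masks over the combined sequence.

```python
from typing import List, Tuple

MASK, PSEUDO = "[MASK]", "[P]"  # illustrative token names

def build_pmlm_input(tokens: List[str], masked_spans: List[Tuple[int, int]]):
    """Build one PMLM-style input (sketch).

    Conventional [MASK] tokens replace the masked spans in place (autoencoding),
    while pseudo [P] tokens are appended for the same spans. The appended tokens
    reuse the position ids of the spans they stand in for, so the encodings of
    the unchanged context tokens can be shared by both objectives.
    """
    corrupted = list(tokens)
    positions = list(range(len(tokens)))
    for start, end in masked_spans:                 # end is exclusive
        for i in range(start, end):
            corrupted[i] = MASK                     # conventional mask (autoencoding)
    appended, appended_pos = [], []
    for start, end in masked_spans:
        appended += [PSEUDO] * (end - start)        # pseudo masks (partially autoregressive)
        appended_pos += list(range(start, end))     # reuse the original position ids
    return corrupted + appended, positions + appended_pos

# Usage example: two masked spans, x2 and x4-x5.
tokens = ["x1", "x2", "x3", "x4", "x5", "x6"]
ext_tokens, position_ids = build_pmlm_input(tokens, masked_spans=[(1, 2), (3, 5)])
for tok, pos in zip(ext_tokens, position_ids):
    print(f"{tok}\tposition {pos}")
```

Because the appended pseudo tokens share position ids with the spans they replace, the context tokens are encoded once and reused, which is the computation-saving property the abstract refers to.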