In this paper, we take advantage of previous pre-trained models (PTMs) and propose a novel Chinese Pre-trained Unbalanced Transformer (CPT). Unlike previous Chinese PTMs, CPT is designed for both natural language understanding (NLU) and natural language generation (NLG) tasks. CPT consists of three parts: a shared encoder, an understanding decoder, and a generation decoder. The two task-specific decoders, built on top of the shared encoder, are pre-trained with masked language modeling (MLM) and denoising auto-encoding (DAE) objectives, respectively. With the partially shared architecture and multi-task pre-training, CPT can (1) learn task-specific knowledge for both NLU and NLG with the two decoders and (2) be fine-tuned flexibly to fully exploit the potential of the model. Moreover, the unbalanced Transformer reduces computational and storage costs, which makes CPT competitive and greatly accelerates inference for text generation. Experimental results on a wide range of Chinese NLU and NLG tasks demonstrate the effectiveness of CPT.
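The layout described above (a deep shared encoder feeding a shallow non-autoregressive understanding decoder and a shallow autoregressive generation decoder) can be sketched as follows. This is a minimal illustrative sketch, not the authors' released implementation: the class name `CPTSketch`, the layer counts, the vocabulary size, and the use of PyTorch's built-in Transformer modules are all assumptions made for clarity.

```python
import torch
import torch.nn as nn

class CPTSketch(nn.Module):
    """Illustrative sketch of an unbalanced shared-encoder / two-decoder model.

    Hyperparameters below (layer counts, d_model, vocab size) are placeholders,
    not the paper's actual configuration.
    """

    def __init__(self, vocab_size=50000, d_model=768, nhead=12,
                 num_encoder_layers=10, num_decoder_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        # Deep shared encoder whose output is consumed by both decoders.
        self.shared_encoder = nn.TransformerEncoder(enc_layer, num_encoder_layers)
        # Shallow understanding decoder: non-autoregressive, pre-trained with MLM.
        self.understanding_decoder = nn.TransformerEncoder(enc_layer, num_decoder_layers)
        # Shallow generation decoder: autoregressive, pre-trained with DAE.
        self.generation_decoder = nn.TransformerDecoder(dec_layer, num_decoder_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids=None):
        memory = self.shared_encoder(self.embed(src_ids))
        if tgt_ids is None:
            # NLU path: refine encoder output and predict masked tokens.
            return self.lm_head(self.understanding_decoder(memory))
        # NLG path: causal decoding over the (noised) source representation.
        tgt = self.embed(tgt_ids)
        seq_len = tgt.size(1)
        causal_mask = torch.triu(
            torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
        return self.lm_head(
            self.generation_decoder(tgt, memory, tgt_mask=causal_mask))
```

Because the generation decoder is shallow relative to the shared encoder, each autoregressive decoding step passes through only a few layers, which is consistent with the abstract's claim that the unbalanced design accelerates text-generation inference while the shared encoder still carries most of the model capacity.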