Language model pre-training on large corpora has achieved tremendous success in constructing rich contextual representations and has led to significant performance gains on a diverse range of Natural Language Understanding (NLU) tasks. Despite this success, most current pre-trained language models, such as BERT, are trained with single-grained tokenization, usually into fine-grained characters or sub-words, which makes it hard for them to learn the precise meaning of coarse-grained words and phrases. In this paper, we propose a simple yet effective pre-training method named LICHEE that efficiently incorporates multi-grained information of the input text. Our method can be applied to various pre-trained language models to improve their representation capability. Extensive experiments on CLUE and SuperGLUE demonstrate that our method achieves comprehensive improvements on a wide variety of NLU tasks in both Chinese and English while incurring little extra inference cost, and that our best ensemble model achieves state-of-the-art performance in the CLUE benchmark competition.
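To make "multi-grained information" concrete, the sketch below aligns each fine-grained token (a character) with the coarse-grained token (a word) that contains it and fuses their embeddings. It is a minimal illustration under stated assumptions, not the LICHEE implementation. The word segmentation is assumed to be given, the embedding tables are toy lookups, and element-wise max-pooling is an assumed fusion operator. The function names `align_fine_to_coarse` and `multi_grained_embeddings` are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def align_fine_to_coarse(coarse_tokens: Sequence[str]) -> List[int]:
    """For each fine-grained token (here: one character), return the index
    of the coarse-grained token (word) that contains it."""
    alignment: List[int] = []
    for word_idx, word in enumerate(coarse_tokens):
        alignment.extend([word_idx] * len(word))
    return alignment


def multi_grained_embeddings(
    coarse_tokens: Sequence[str],
    fine_table: Dict[str, Vector],
    coarse_table: Dict[str, Vector],
) -> List[Vector]:
    """Fuse each character embedding with the embedding of its enclosing
    word via element-wise max-pooling (an assumed fusion operator).
    The output keeps the fine-grained sequence length."""
    fine_tokens = [ch for word in coarse_tokens for ch in word]
    alignment = align_fine_to_coarse(coarse_tokens)
    fused: List[Vector] = []
    for ch, word_idx in zip(fine_tokens, alignment):
        fine_vec = fine_table[ch]
        coarse_vec = coarse_table[coarse_tokens[word_idx]]
        fused.append([max(a, b) for a, b in zip(fine_vec, coarse_vec)])
    return fused


# Toy example: hypothetical 2-dimensional embeddings.
words = ["自然", "语言"]  # "natural", "language"; segmentation assumed given
fine = {"自": [0.1, 0.9], "然": [0.4, 0.2], "语": [0.7, 0.1], "言": [0.3, 0.5]}
coarse = {"自然": [0.5, 0.3], "语言": [0.2, 0.6]}

print(multi_grained_embeddings(words, fine, coarse))
# → [[0.5, 0.9], [0.5, 0.3], [0.7, 0.6], [0.3, 0.6]]
```

Because the fused sequence has the same length as the fine-grained input, a scheme of this kind adds only an extra embedding lookup and an element-wise operation before the encoder. This is one plausible reason such a method could keep inference overhead small.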