Bidirectional Encoder Representations from Transformers (BERT) has shown remarkable improvements across various NLP tasks, and a series of successive variants have been proposed to further improve the performance of pre-trained language models. In this paper, we first introduce the whole word masking (wwm) strategy for Chinese BERT, along with a series of Chinese pre-trained language models. We then propose a simple but effective model called MacBERT, which improves upon RoBERTa in several ways. In particular, we propose a new masking strategy called MLM as correction (Mac). To demonstrate the effectiveness of these models, we create a series of Chinese pre-trained language models as our baselines, including BERT, RoBERTa, ELECTRA, RBT, etc. We carry out extensive experiments on ten Chinese NLP tasks to evaluate the created Chinese pre-trained language models as well as the proposed MacBERT. Experimental results show that MacBERT achieves state-of-the-art performance on many NLP tasks, and we also provide detailed ablation studies with several findings that may help future research. We open-source our pre-trained language models to further facilitate our research community. Resources are available at: https://github.com/ymcui/Chinese-BERT-wwm
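To make the whole word masking (wwm) idea concrete, the following is a minimal, hedged Python sketch that masks all characters of a selected Chinese word together, rather than masking individual characters independently. The hard-coded segmentation, the 15% masking ratio, and the helper function `whole_word_mask` are illustrative assumptions for exposition only, not the paper's actual pre-training code or data pipeline.

```python
# Illustrative sketch of whole word masking (wwm) for Chinese; assumptions only.
# Real pipelines obtain word boundaries from a Chinese word segmenter and then
# tokenize into characters, so a masked "word" spans several character tokens.
import random

random.seed(0)

# A toy sentence pre-segmented into words; each word is one or more characters.
words = ["使用", "语言", "模型", "来", "预测", "下一个", "词"]

MASK_RATE = 0.15  # standard MLM masking ratio, used here as an assumption


def whole_word_mask(words, mask_rate=MASK_RATE):
    """Mask every character of a chosen word together (wwm),
    instead of masking characters independently of each other."""
    tokens, labels = [], []
    for word in words:
        chars = list(word)
        if random.random() < mask_rate:
            # Whole word masking: all characters of this word become [MASK].
            tokens.extend(["[MASK]"] * len(chars))
            labels.extend(chars)                 # original characters to predict
        else:
            tokens.extend(chars)
            labels.extend([None] * len(chars))   # positions not predicted
    return tokens, labels


tokens, labels = whole_word_mask(words)
print(tokens)  # character tokens with whole words replaced by [MASK]
print(labels)  # prediction targets aligned to the masked positions
```

The design point this sketch illustrates is that the masking decision is made at the word level while the model still operates on character tokens, which is what distinguishes wwm from the original character-level masking in Chinese BERT.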