In natural language processing, pre-trained language models have become essential infrastructure. However, these models are often large, slow at inference, and difficult to deploy. Moreover, most mainstream pre-trained models focus on English, and small Chinese pre-trained models remain understudied. In this paper, we introduce MiniRBT, a small Chinese pre-trained model that aims to advance research in Chinese natural language processing. MiniRBT employs a narrow and deep student model and incorporates whole-word masking and two-stage distillation during pre-training, making it well-suited for most downstream tasks. Our experiments on machine reading comprehension and text classification tasks show that MiniRBT achieves 94% of RoBERTa's performance while providing a 6.8x speedup, demonstrating its effectiveness and efficiency.
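As a rough illustration only (not the authors' exact training objective), the soft-target loss typically used in teacher-student distillation of this kind can be sketched in PyTorch as below; the temperature value, vocabulary size, and function names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Soft-target distillation loss: KL divergence between the
    temperature-scaled teacher and student output distributions.
    (Hypothetical helper for illustration, not the MiniRBT code.)"""
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # Scale by t^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t ** 2)

# Example: a large teacher (e.g. RoBERTa) and a narrow-and-deep student both
# predict over the same vocabulary at a whole-word-masked position.
teacher_logits = torch.randn(8, 21128)   # batch of 8; 21128 is a common Chinese BERT vocab size
student_logits = torch.randn(8, 21128)
loss = distillation_loss(student_logits, teacher_logits)
```

In a two-stage setup, a loss of this form would be applied first during general pre-training distillation and then again when distilling on task data, with the student's smaller hidden size providing the inference speedup.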