Recently, the development of pre-trained language models has brought natural language processing (NLP) tasks to a new state of the art. In this paper we explore the efficiency of various pre-trained language models. We pre-train a list of transformer-based models with the same amount of text and the same number of training steps. The experimental results show that the largest improvement over the original BERT comes from adding an RNN layer to capture more contextual information for short-text understanding, while structures similar to BERT yield no remarkable improvement on short-text understanding.