Language model pre-training has proven useful in many language understanding tasks. In this paper, we investigate whether it is still helpful to add the task-specific loss during the pre-training step. In industrial NLP applications, we have a large amount of data produced by users. We use the fine-tuned model to assign pseudo-labels to this user-generated unlabeled data, and then pre-train with both the task-specific loss on the pseudo-labels and the masked language model loss. The experiments show that using the fine-tuned model's predictions for pseudo-labeled pre-training offers further gains on the downstream task. The improvement from our method is stable and substantial.
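The following is a minimal PyTorch sketch of the kind of joint objective described above: a masked language model loss on user-generated text combined with a task-specific cross-entropy loss against pseudo-labels produced by an already fine-tuned model. The model architecture, layer sizes, and equal loss weighting here are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PseudoLabelPretrainer(nn.Module):
    """Shared Transformer encoder with an MLM head and a task head trained jointly.

    Hypothetical sketch: the encoder, head names, and dimensions are assumptions.
    """

    def __init__(self, vocab_size=30522, hidden=256, num_labels=2, layers=4, heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)
        self.mlm_head = nn.Linear(hidden, vocab_size)    # predicts masked tokens
        self.task_head = nn.Linear(hidden, num_labels)   # predicts pseudo-labels

    def forward(self, input_ids, mlm_labels, pseudo_labels):
        hidden = self.encoder(self.embed(input_ids))     # (batch, seq_len, hidden)

        # Masked language model loss; positions labeled -100 are ignored.
        mlm_logits = self.mlm_head(hidden).transpose(1, 2)          # (batch, vocab, seq_len)
        mlm_loss = F.cross_entropy(mlm_logits, mlm_labels, ignore_index=-100)

        # Task-specific loss against pseudo-labels from the fine-tuned model,
        # using the first token as the sequence representation.
        task_logits = self.task_head(hidden[:, 0])                  # (batch, num_labels)
        task_loss = F.cross_entropy(task_logits, pseudo_labels)

        # Joint pre-training objective; equal weighting is an assumption.
        return mlm_loss + task_loss

# Example forward pass on random data (shapes only, for illustration).
model = PseudoLabelPretrainer()
input_ids = torch.randint(0, 30522, (8, 64))
mlm_labels = torch.full((8, 64), -100)
mlm_labels[:, 5] = input_ids[:, 5]                 # pretend position 5 was masked
pseudo_labels = torch.randint(0, 2, (8,))          # labels from the fine-tuned model
loss = model(input_ids, mlm_labels, pseudo_labels)
```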