Pre-trained language models (PrLMs) have achieved great success on a wide range of natural language processing tasks by virtue of the universal language representations obtained through self-supervised learning on large corpora. However, these models are pre-trained on standard plain text with general language model (LM) training objectives, which is insufficient for modeling dialogue-exclusive attributes such as specificity and informativeness, which are reflected in dialogue tasks but not explicitly captured by the pre-trained universal language representations. In this work, we propose dialogue-adaptive pre-training objectives (DAPO) derived from quality estimation to model dialogue-specific features, namely coherence, specificity, and informativeness. As the foundation for model pre-training, we synthesize a new dialogue corpus and build our training set with two unsupervised methods: 1) coherence-oriented context corruption, including utterance ordering, insertion, and replacement, which helps the model capture coherence within dialogue contexts; and 2) specificity-oriented automatic rescoring, which encourages the model to measure the quality of the synthesized data for dialogue-adaptive pre-training in terms of specificity and informativeness. Experimental results on widely used open-domain response selection and quality estimation benchmarks show that DAPO significantly improves the baseline models and achieves state-of-the-art performance on the MuTual leaderboard, verifying the effectiveness of incorporating quality evaluation factors into pre-training.
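The abstract names three corruption operations (utterance ordering, insertion, and replacement) but does not spell out how they are applied. The sketch below is a minimal illustration of coherence-oriented context corruption under assumed details: the function name corrupt_context, the distractor_pool argument, and the one-operation-per-call design are illustrative choices, not the authors' implementation.

import random
from typing import List

def corrupt_context(dialogue: List[str],
                    distractor_pool: List[str],
                    op: str = "reorder",
                    rng: random.Random = random.Random(0)) -> List[str]:
    """Return a corrupted copy of a dialogue (a list of utterances).

    op = "reorder": swap two randomly chosen utterances (utterance ordering).
    op = "insert":  insert an utterance drawn from another dialogue.
    op = "replace": replace one utterance with one from another dialogue.
    """
    corrupted = list(dialogue)
    if op == "reorder" and len(corrupted) >= 2:
        i, j = rng.sample(range(len(corrupted)), 2)
        corrupted[i], corrupted[j] = corrupted[j], corrupted[i]
    elif op == "insert" and distractor_pool:
        corrupted.insert(rng.randrange(len(corrupted) + 1),
                         rng.choice(distractor_pool))
    elif op == "replace" and distractor_pool:
        corrupted[rng.randrange(len(corrupted))] = rng.choice(distractor_pool)
    return corrupted

# Usage: the original (coherent) dialogue serves as a positive example,
# and each corrupted variant as a negative example for pre-training.
dialogue = ["Hi, how are you?",
            "Great, I just got back from a trip.",
            "Where did you go?"]
pool = ["The invoice is attached.", "It rains a lot in April."]
for op in ("reorder", "insert", "replace"):
    print(op, corrupt_context(dialogue, pool, op))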
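The abstract likewise does not define how specificity and informativeness are scored during automatic rescoring. As one plausible stand-in, the sketch below scores a response by the mean inverse frequency of its words (rarer words suggest a more specific, more informative response); the function names, the response-is-last-utterance convention, and the scoring formula are assumptions for illustration, not the paper's method.

import math
from collections import Counter
from typing import List, Tuple

def specificity_score(response: str, word_counts: Counter, total: int) -> float:
    """Mean inverse word frequency of the response tokens:
    rarer tokens yield a higher (more specific) score."""
    tokens = response.lower().split()
    if not tokens:
        return 0.0
    idf = [math.log(total / (1 + word_counts[t])) for t in tokens]
    return sum(idf) / len(tokens)

def rescore(corpus: List[List[str]]) -> List[Tuple[List[str], float]]:
    """Attach a specificity-style score to the final utterance (treated as
    the response) of every synthesized dialogue, so the score can be used
    as a soft target during dialogue-adaptive pre-training."""
    counts = Counter(t for dlg in corpus for utt in dlg for t in utt.lower().split())
    total = sum(counts.values())
    return [(dlg, specificity_score(dlg[-1], counts, total)) for dlg in corpus]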