Pre-trained language models (PrLMs) have been shown to be powerful in enhancing a broad range of downstream tasks, including various dialogue-related ones. However, PrLMs are usually trained on general plain text with common language model (LM) training objectives, which cannot sufficiently capture dialogue-exclusive features due to the limitations of such a training setting, so there is an immediate need to fill the gap between a specific dialogue task and the LM task. As it is unlikely to collect a huge amount of dialogue data for dialogue-oriented pre-training, in this paper we propose three strategies to simulate conversation features on general plain text. Our proposed method differs from existing post-training methods in that it may yield a general-purpose PrLM and is not specialized to any particular task, while retaining the ability to learn dialogue-related features including speaker awareness, continuity, and consistency. The resulting Dialog-PrLM is fine-tuned on three public multi-turn dialogue datasets and achieves significant and consistent improvements over the plain PrLMs.
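As a rough illustration of the idea of simulating conversational structure on plain text, the following Python sketch groups consecutive sentences into pseudo multi-turn dialogues with alternating speaker tags. The tag names ([SPK1], [SPK2], [EOT]) and the grouping scheme are assumptions for illustration only, not the paper's actual strategies or special tokens.

```python
# Minimal sketch (assumed format, not the paper's exact method): simulate
# multi-turn dialogue structure on general plain text by treating consecutive
# sentences as alternating speaker turns.

from typing import List


def simulate_dialogue(sentences: List[str], turns_per_sample: int = 4) -> List[str]:
    """Group consecutive plain-text sentences into pseudo multi-turn dialogues
    with alternating speaker tags, so a PrLM could be post-trained on
    dialogue-like structure (speaker awareness, continuity, consistency)."""
    samples = []
    for start in range(0, len(sentences) - turns_per_sample + 1, turns_per_sample):
        turns = []
        for i, sent in enumerate(sentences[start:start + turns_per_sample]):
            # Alternate two hypothetical speaker tokens to mimic turn-taking.
            speaker = "[SPK1]" if i % 2 == 0 else "[SPK2]"
            turns.append(f"{speaker} {sent} [EOT]")
        samples.append(" ".join(turns))
    return samples


if __name__ == "__main__":
    text = [
        "Nice weather today.",
        "Yes, perfect for a walk.",
        "Shall we go to the park?",
        "Sure, let's leave at noon.",
    ]
    for sample in simulate_dialogue(text):
        print(sample)
```

The resulting pseudo-dialogues could then be fed to a PrLM during post-training; the actual objectives targeting speaker awareness, continuity, and consistency are described in the paper itself.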