Neural dialogue response generation has gained much popularity in recent years, and the Maximum Likelihood Estimation (MLE) objective is widely adopted for training existing dialogue models. However, models trained with the MLE objective suffer from low response diversity in the open-domain conversational setting. Inspired by the observation that humans not only learn from positive signals but also benefit from correcting undesirable behaviors, in this work we introduce contrastive learning into dialogue generation, where the model explicitly perceives the difference between well-chosen positive and negative utterances. Specifically, we employ a pretrained baseline model as a reference. During contrastive learning, the target dialogue model is trained to assign higher conditional probabilities to positive samples, and lower conditional probabilities to negative samples, than the reference model does. To handle the multi-mapping relations prevalent in human conversations, we augment contrastive dialogue learning with group-wise dual sampling. Extensive experimental results show that the proposed group-wise contrastive learning framework is suitable for training a wide range of neural dialogue generation models, with favorable performance over baseline training approaches.
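The abstract does not spell out the loss, but the described behavior (raise the target model's conditional probability on positive samples and lower it on negative samples, relative to a frozen reference model) admits a simple log-ratio formulation. Below is a minimal PyTorch-style sketch under that assumption; the function name, the log-sigmoid shaping, and the use of sequence-level log-probabilities are illustrative choices, not details taken from the paper.

```python
# Minimal sketch (illustrative, not the paper's released code) of the
# contrastive objective described above: the target dialogue model should
# assign higher likelihood than a frozen reference model to positive
# context-response pairs, and lower likelihood to negative ones.
import torch
import torch.nn.functional as F

def contrastive_loss(tgt_logp_pos: torch.Tensor, ref_logp_pos: torch.Tensor,
                     tgt_logp_neg: torch.Tensor, ref_logp_neg: torch.Tensor) -> torch.Tensor:
    """Each argument holds sequence-level log-probabilities log p(y|x),
    shape (batch,), computed by the target or the reference model."""
    # Log-ratio between target and frozen reference; a positive value means
    # the target assigns relatively higher probability than the reference.
    pos_gap = tgt_logp_pos - ref_logp_pos
    neg_gap = tgt_logp_neg - ref_logp_neg
    # Push the gap up for positive samples and down for negative samples.
    return -(F.logsigmoid(pos_gap).mean() + F.logsigmoid(-neg_gap).mean())

# Toy usage with random log-probabilities:
# loss = contrastive_loss(torch.randn(8), torch.randn(8),
#                         torch.randn(8), torch.randn(8))
```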