Although exposure bias has been widely studied in some NLP tasks, it poses unique challenges in dialogue response generation, a representative one-to-various generation scenario. In real human dialogue, many responses are appropriate for the same context, differing not only in expression but also in topic. Because of the much larger gap between the various ground-truth responses and the generated synthetic response, exposure bias is therefore more challenging in the dialogue generation task. Moreover, since MLE encourages the model to learn only the words common to different ground-truth responses while ignoring the interesting and specific parts, exposure bias may further lead to the common response generation problem, e.g., "I don't know" and "HaHa". In this paper, we propose a novel adaptive switching mechanism, which learns to automatically transition between ground-truth learning and generated learning according to a word-level matching score, such as cosine similarity. Experimental results on both the Chinese STC dataset and the English Reddit dataset show that our adaptive method achieves significant improvements in both metric-based evaluation and human evaluation, compared with state-of-the-art exposure bias approaches. Further analysis on the NMT task also shows that our model achieves a significant improvement.
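To make the switching rule concrete, below is a minimal sketch assuming a PyTorch-style decoder. The function name, the `threshold` parameter, and the use of the decoder's embedding table for the similarity are hypothetical details introduced here for illustration, not the paper's released implementation. At each decoding step, the word-level cosine similarity between the generated word and the ground-truth word decides whether the next decoder input comes from the ground truth (ground-truth learning) or from the model's own prediction (generated learning).

```python
import torch
import torch.nn.functional as F

def next_decoder_input(gt_token, gen_token, embedding, threshold=0.5):
    """Adaptive switching sketch: choose the next decoder input token.

    gt_token, gen_token: LongTensors of shape (batch,) holding the
    ground-truth word and the model-generated word at this time step.
    embedding: the decoder's nn.Embedding table.
    threshold: hypothetical cut-off on the word-level matching score.
    """
    gt_vec = embedding(gt_token)                           # (batch, dim)
    gen_vec = embedding(gen_token)                         # (batch, dim)
    score = F.cosine_similarity(gt_vec, gen_vec, dim=-1)   # word-level matching score

    # High similarity: the model's own word is close enough to the ground
    # truth, so keep it (generated learning). Low similarity: fall back to
    # the ground-truth word, as in standard teacher forcing.
    return torch.where(score >= threshold, gen_token, gt_token)
```

Because the score is computed per word, the switch can flip independently at every position and every example in the batch, which is what makes the mechanism adaptive rather than a fixed scheduled-sampling curriculum.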