Many open-domain dialogue models pre-trained on social media comments can generate coherent replies but struggle to produce engaging responses when interacting with real users. This phenomenon likely stems mainly from the scarcity of annotated human-human conversations and the misalignment with human preference. In this paper, we propose Diamante, a novel and efficient approach to boosting open-domain chatbots, in which two kinds of human feedback (explicit demonstration and implicit preference) are collected and leveraged. By asking annotators to select or amend model-generated candidate responses, Diamante efficiently collects human-demonstrated responses and constructs a Chinese chit-chat dataset. To enhance alignment with human preference, Diamante leverages the implicit preference signal in the data collection process and introduces generation-evaluation joint training. Comprehensive experiments indicate that the Diamante dataset and the joint training paradigm significantly boost the performance of Chinese pre-trained dialogue models.
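As a rough illustration of the generation-evaluation joint training idea described above, the sketch below combines a standard language-modeling loss on the human-demonstrated response with a pairwise ranking loss that scores the preferred response above a model-generated candidate. This is a minimal, assumption-laden sketch, not the paper's released implementation: the model class, the `joint_loss` helper, the `alpha` weight, and all tensor shapes are hypothetical placeholders.

```python
# Minimal sketch (assumptions, not the authors' code) of generation-evaluation
# joint training: generation NLL on the human-demonstrated response plus a
# ranking loss preferring it over a model-generated candidate.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyDialogueModel(nn.Module):
    """Tiny stand-in for a pre-trained dialogue model (hypothetical)."""

    def __init__(self, vocab_size=1000, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.lm_head = nn.Linear(hidden, vocab_size)   # generation head
        self.value_head = nn.Linear(hidden, 1)         # evaluation (preference) head

    def forward(self, token_ids):
        h, _ = self.encoder(self.embed(token_ids))
        # Return per-token LM logits and a scalar preference score per sequence.
        return self.lm_head(h), self.value_head(h[:, -1])


def joint_loss(model, context_and_response, preferred, rejected, alpha=1.0):
    """Generation loss on the demonstrated response + pairwise ranking loss."""
    logits, _ = model(context_and_response[:, :-1])
    gen_loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        context_and_response[:, 1:].reshape(-1),
    )
    _, score_pos = model(preferred)   # human-preferred response
    _, score_neg = model(rejected)    # model-generated candidate
    rank_loss = -F.logsigmoid(score_pos - score_neg).mean()
    return gen_loss + alpha * rank_loss


# Toy usage with random token ids (batch=2, seq_len=16, vocab=1000).
model = ToyDialogueModel()
demo = torch.randint(0, 1000, (2, 16))
candidate = torch.randint(0, 1000, (2, 16))
loss = joint_loss(model, demo, demo, candidate)
loss.backward()
```

The key design point this sketch tries to convey is that one model serves both roles: the same backbone is trained to generate the demonstrated response and to assign it a higher score than the rejected candidate, so the implicit preference collected during annotation is used without a separate reward model.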