In recent years, dialogue systems have attracted significant interest in both academia and industry. In particular, the field of open-domain dialogue systems, also known as chatbots, has gained great momentum. Yet a long-standing challenge that troubles researchers is the lack of effective automatic evaluation metrics, which significantly impedes current research. The common practice in assessing the performance of open-domain dialogue models involves extensive human evaluation of the final deployed models, which is both time- and cost-intensive. Moreover, a recent trend in building open-domain chatbots involves pre-training dialogue models on large amounts of social media conversation data. However, the information contained in social media conversations may be offensive and inappropriate, and indiscriminate use of such data can result in insensitive and toxic generative models. This paper describes the data, baselines, and results obtained for Track 5 at the 10th Dialogue System Technology Challenge (DSTC10).