Dialogue systems are evaluated depending on their type and purpose. Two categories are often distinguished: (1) task-oriented dialogue systems (TDS), which are typically evaluated on utility, i.e., their ability to complete a specified task, and (2) open-domain chatbots, which are evaluated on user experience, i.e., their ability to engage a person. What is the influence of user experience on the user satisfaction rating of TDS as opposed to, or in addition to, utility? We collect data by providing an additional annotation layer for dialogues sampled from the ReDial dataset, a widely used conversational recommendation dataset. Unlike prior work, we annotate the sampled dialogues at both the turn and dialogue level on six dialogue aspects: relevance, interestingness, understanding, task completion, efficiency, and interest arousal. The annotations allow us to study how different dialogue aspects influence user satisfaction. We introduce a comprehensive set of user experience aspects derived from the annotators' open comments that can influence users' overall impression. We find that the concept of satisfaction varies across annotators and dialogues, and show that a relevant turn is significant for some annotators, while for others, an interesting turn is all they need. Our analysis indicates that the proposed user experience aspects provide a fine-grained analysis of user satisfaction that is not captured by a monolithic overall human rating.