Conversational recommender systems aim to interactively support online users in their information search and decision-making processes in an intuitive way. With the latest advances in voice-controlled devices, natural language processing, and AI in general, such systems have received increased attention in recent years. Technically, conversational recommenders are usually complex multi-component applications that often consist of multiple machine learning models and a natural language user interface. Evaluating such a complex system in a holistic way can therefore be challenging, as it requires assessing (i) the quality of the different learning components and (ii) users' quality perception of the system as a whole. Thus, a mixed-methods approach is often required, which may combine objective (computational) and subjective (perception-oriented) evaluation techniques. In this paper, we review common evaluation approaches for conversational recommender systems, identify possible limitations, and outline future directions towards more holistic evaluation practices.