Conversational recommender systems (CRS) are interactive agents that support their users in recommendation-related goals through multi-turn conversations. Generally, a CRS can be evaluated along various dimensions. Today's CRS mainly rely on offline (computational) measures to assess the performance of their algorithms in comparison to different baselines. However, offline measures can have limitations, for example, when the metrics for comparing a newly generated response with a ground truth do not correlate with human perceptions, because various alternative generated responses might also be suitable in a given dialog situation. Current research on machine learning-based CRS models therefore acknowledges the importance of humans in the evaluation process, knowing that pure offline measures may not be sufficient for evaluating a highly interactive system like a CRS.
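The mismatch between overlap-based offline metrics and human judgment can be illustrated with a minimal sketch. The unigram-F1 scorer below is a simplified stand-in for BLEU/ROUGE-style metrics, and all dialog responses are invented for illustration: a near-verbatim response scores high against the single ground truth, while a paraphrase a human would likely judge equally suitable scores low.

```python
from collections import Counter

def unigram_f1(candidate: str, reference: str) -> float:
    """Token-level F1 overlap between a candidate and a single reference."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # shared token count
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Hypothetical single ground-truth response from a CRS dialog corpus.
reference = "you might enjoy the movie inception it is a clever sci-fi thriller"

# Near-verbatim system response: scores high on overlap.
response_a = "you might enjoy the movie inception a clever sci-fi thriller"

# Paraphrased but equally suitable response: scores low on overlap,
# even though a human would likely rate it just as appropriate.
response_b = "based on your taste i would recommend inception a smart mind-bending film"

print(f"response_a F1: {unigram_f1(response_a, reference):.2f}")
print(f"response_b F1: {unigram_f1(response_b, reference):.2f}")
```

The large score gap between the two responses, despite their comparable quality from a user's perspective, is exactly the kind of divergence from human perception that motivates including humans in the evaluation loop.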