Conversational recommender systems (CRS) that interact with users in natural language are typically built on recommendation dialogs collected from paired humans, where one participant plays the role of a seeker and the other the role of a recommender. These dialogs contain items and entities that disclose the seeker's preferences in natural language. However, in order to model seekers' preferences precisely and respond consistently, CRS mainly rely on the items and entities that are explicitly annotated in the dialogs, and usually also leverage domain knowledge. In this work, we investigate INSPIRED, a dataset of recommendation dialogs for sociable conversational recommendation, in which items and entities were annotated using automatic keyword or pattern matching techniques. In doing so, we found a large number of cases where items and entities were either wrongly annotated or missing annotations altogether. The question therefore arises to what extent such automatic annotation techniques are effective. Moreover, it is unclear what the relative impact of poor versus improved annotations is on the overall effectiveness of a CRS in terms of the consistency and quality of its responses. To address this, we first manually fixed the annotations and removed the noise in the INSPIRED dataset. Second, we evaluated the performance of several benchmark CRS models using both versions of the dataset. Our analyses suggest that with the improved version of the dataset, i.e., INSPIRED2, various benchmark CRS models perform better and the dialogs are richer in knowledge concepts than with the original version. We release our improved dataset (INSPIRED2) publicly at https://github.com/ahtsham58/INSPIRED2.
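To illustrate why purely automatic keyword or pattern matching can both over- and under-annotate item mentions, the following is a minimal Python sketch (not the pipeline used for INSPIRED; the title list and function name are hypothetical). Exact title matching tags common words such as "It" as movie mentions while missing paraphrased references entirely.

```python
import re

# Hypothetical toy catalog of movie titles; the actual INSPIRED item catalog
# and annotation pipeline are not reproduced here.
MOVIE_TITLES = ["Titanic", "It", "Up", "The Godfather"]

def annotate_by_keyword(utterance, titles=MOVIE_TITLES):
    """Tag every exact (case-insensitive) title match in the utterance as an item mention."""
    mentions = []
    for title in titles:
        pattern = r"\b" + re.escape(title) + r"\b"
        for match in re.finditer(pattern, utterance, re.IGNORECASE):
            mentions.append((title, match.start(), match.end()))
    return mentions

# False positive: the pronoun "it" is tagged as the movie "It".
print(annotate_by_keyword("I watched it last night and loved it."))

# Missing annotation: a paraphrased reference to "Titanic" is not matched at all.
print(annotate_by_keyword("Have you seen the one about the sinking ship with DiCaprio?"))
```

Errors of both kinds are exactly what the manual revision behind INSPIRED2 aims to correct.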