Conversational recommendation systems (CRSs) let users steer their recommendations with natural language feedback, overcoming many of the challenges of traditional recommendation systems. However, the practical adoption of CRSs remains limited by a lack of rich and diverse conversational training data that pairs user utterances with recommendations. To address this problem, we introduce a new method that generates synthetic training data by transforming curated item collections, such as playlists or movie watch lists, into item-seeking conversations. First, we use a biased random walk to generate a sequence of slates, or sets of item recommendations; then, we use a language model to generate the corresponding user utterances. We demonstrate our approach by generating a conversational music recommendation dataset with over one million conversations, which a crowdsourced evaluation found to be consistent with the relevant recommendations. A CRS trained on the synthetic data significantly outperforms standard retrieval baselines in both offline and online evaluations.
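The abstract does not say how the random walk is biased or how each slate is assembled. The sketch below is therefore only a minimal illustration under stated assumptions. It assumes that every item in a curated collection has an embedding vector. It also assumes that each walk step moves to an unvisited item of the same collection, with probability proportional to exp(cosine similarity / temperature). Under these assumptions, the slate at each step is the current item plus its most similar unvisited neighbours. The function name `biased_walk_slates`, the `temperature` parameter, and the toy embeddings are all hypothetical. The second stage, generating user utterances with a language model, is not shown.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def biased_walk_slates(collection, embeddings, num_steps, slate_size,
                       temperature=0.1, seed=0):
    """Hypothetical sketch: turn one curated collection into a slate sequence.

    Assumption (not from the source): the walk is biased toward items whose
    embeddings are similar to the current item, via a softmax over cosine
    similarities; each slate = current item + nearest unvisited neighbours.
    """
    rng = random.Random(seed)
    unvisited = list(collection)
    # Start the walk at a uniformly random item of the collection.
    current = unvisited.pop(rng.randrange(len(unvisited)))
    slates = []
    for _ in range(num_steps):
        # Rank the remaining items by similarity to the current walk position.
        ranked = sorted(unvisited,
                        key=lambda i: cosine(embeddings[current], embeddings[i]),
                        reverse=True)
        slates.append([current] + ranked[:slate_size - 1])
        if not unvisited:
            break  # every item in the collection has been walked through
        # Biased step: sample the next item with softmax(similarity / temperature).
        weights = [math.exp(cosine(embeddings[current], embeddings[i]) / temperature)
                   for i in unvisited]
        current = rng.choices(unvisited, weights=weights, k=1)[0]
        unvisited.remove(current)
    return slates


# Toy usage: a five-track "playlist" with made-up 2-D embeddings.
playlist = ["a", "b", "c", "d", "e"]
emb = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0],
       "d": [0.1, 0.9], "e": [0.7, 0.7]}
slates = biased_walk_slates(playlist, emb, num_steps=3, slate_size=3)
for slate in slates:
    print(slate)
```

Drawing slates only from the collection itself is one way to keep consecutive slates coherent with what the curator grouped together. The temperature controls how strongly the walk favours nearby items over a uniform hop.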