Recommendation systems have advanced significantly and have been widely deployed over the past decades. However, most traditional recommendation methods are task-specific and therefore generalize poorly. Recently, the emergence of ChatGPT has significantly advanced NLP tasks by enhancing the capabilities of conversational models. Nonetheless, the application of ChatGPT in the recommendation domain has not been thoroughly investigated. In this paper, we employ ChatGPT as a general-purpose recommendation model to explore its potential for transferring the extensive linguistic and world knowledge acquired from large-scale corpora to recommendation scenarios. Specifically, we design a set of prompts and evaluate ChatGPT's performance on five recommendation scenarios. Unlike traditional recommendation methods, we do not fine-tune ChatGPT at any point during the evaluation, relying only on the prompts themselves to convert recommendation tasks into natural language tasks. Further, we explore few-shot prompting to inject interaction information that reflects users' potential interests, helping ChatGPT better understand user needs. Comprehensive experimental results on the Amazon Beauty dataset show that ChatGPT achieves promising results in certain tasks and reaches the baseline level in others. We conduct human evaluations on two explainability-oriented tasks to more accurately assess the quality of the content generated by different models. The human evaluations show that ChatGPT can truly understand the provided information and generate clearer and more reasonable results. We hope that our study can inspire researchers to further explore the potential of language models like ChatGPT to improve recommendation performance and contribute to the advancement of the recommendation systems field.
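To make the idea of few-shot prompting with interaction information concrete, the following is a minimal sketch of how a recommendation task might be rendered as a natural language prompt. It is not the prompt format used in the paper: the function names (`format_history`, `build_few_shot_prompt`), the prompt wording, and the item titles are all hypothetical and chosen only for illustration. No model is called; the sketch only builds the prompt string.

```python
from typing import List, Sequence, Tuple


def format_history(items: Sequence[str]) -> str:
    """Render an interaction history as a numbered list, one item per line."""
    return "\n".join(f"{i}. {title}" for i, title in enumerate(items, start=1))


def build_few_shot_prompt(
    instruction: str,
    demonstrations: List[Tuple[Sequence[str], str]],
    query_history: Sequence[str],
) -> str:
    """Assemble a few-shot prompt (hypothetical layout, not the paper's).

    Each demonstration pairs a user's interaction history with the item that
    user interacted with next. The query history is appended last with the
    answer slot left empty for the language model to complete.
    """
    blocks = [instruction.strip()]
    for history, next_item in demonstrations:
        blocks.append(
            "Interaction history:\n"
            + format_history(history)
            + f"\nNext item: {next_item}"
        )
    # The final block is the actual query; the model fills in the next item.
    blocks.append(
        "Interaction history:\n" + format_history(query_history) + "\nNext item:"
    )
    return "\n\n".join(blocks)


# Illustrative data only; these are not items from the Amazon Beauty dataset.
prompt = build_few_shot_prompt(
    instruction="Given a user's interaction history, recommend the next item.",
    demonstrations=[(["shampoo", "conditioner"], "hair mask")],
    query_history=["face wash", "toner"],
)
print(prompt)
```

Because the model is not fine-tuned, everything task-specific lives in this string: changing the instruction and the demonstration format is what would adapt the same model to a different recommendation scenario.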