The use of natural language (NL) user profiles in recommender systems offers greater transparency and user control compared to traditional representations. However, there is a scarcity of large-scale, publicly available test collections for evaluating NL profile-based recommendation. To address this gap, we introduce SciNUP, a novel synthetic dataset for scholarly recommendation that leverages authors' publication histories to generate NL profiles and corresponding ground truth items. We use this dataset to conduct a comparison of baseline methods, ranging from sparse and dense retrieval approaches to state-of-the-art LLM-based rerankers. Our results show that while baseline methods achieve comparable performance, they often retrieve different items, indicating complementary behaviors. At the same time, considerable headroom for improvement remains, highlighting the need for effective NL-based recommendation approaches. The SciNUP dataset thus serves as a valuable resource for fostering future research and development in this area.