Are Large Language Models Really Effective for Training-Free Cold-Start Recommendation?

Recommender systems usually rely on large-scale interaction data to learn from users' past behaviors and make accurate predictions. However, real-world applications often face situations where no training data is available, such as when launching a new service or serving entirely new users. In such cases, conventional approaches cannot be applied. This study focuses on training-free recommendation, where no task-specific training is performed, and in particular on \textit{training-free cold-start recommendation} (TFCSR), the more challenging case in which the target user has no interactions at all. Large language models (LLMs) have recently been explored as a promising solution, and numerous LLM-based methods have been proposed. As text embedding models (TEMs) become more capable, they are increasingly recognized as applicable to training-free recommendation, yet no prior work has directly compared LLMs and TEMs under identical conditions. We present the first controlled experiments that systematically evaluate the two approaches in the same setting. The results show that TEMs outperform LLM rerankers, and that this trend holds not only in cold-start settings but also in warm-start settings with rich interactions. These findings indicate that, contrary to common belief, direct LLM ranking is not the only viable option, and that TEM-based approaches provide a stronger and more scalable basis for training-free recommendation.
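To make "TEM-based training-free recommendation" concrete, here is a minimal sketch of the general idea. It is not the paper's exact pipeline. We assume the user side is represented by some text, for example a stated preference, since a cold-start user has no interactions to summarize. We also assume that this text and each candidate item's text have already been encoded by a frozen TEM. Items are then ranked by cosine similarity, and nothing is trained. The vectors below are hypothetical toy values standing in for real embeddings, and `cosine` and `rank_items` are illustrative helper names.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length, non-zero vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_items(query_vec, item_vecs):
    # Return item ids sorted by descending similarity to the user-side embedding.
    # No parameters are learned: the ranking uses only the frozen embeddings.
    scores = {item_id: cosine(query_vec, vec) for item_id, vec in item_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy embeddings standing in for TEM outputs (assumption).
query = [1.0, 0.0, 0.5]
items = {
    "item_a": [0.9, 0.1, 0.4],
    "item_b": [0.0, 1.0, 0.0],
    "item_c": [0.5, 0.5, 0.5],
}
print(rank_items(query, items))  # ['item_a', 'item_c', 'item_b']
```

Because scoring is a single similarity computation per item, candidate item embeddings can be precomputed and reused across users. This is one reason embedding-based ranking tends to scale better than prompting an LLM to rerank each candidate list.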