Rec-R1: Bridging Generative Large Language Models and User-Centric Recommendation Systems via Reinforcement Learning

We propose Rec-R1, a general reinforcement learning framework that bridges large language models (LLMs) with recommendation systems through closed-loop optimization. Unlike prompting and supervised fine-tuning (SFT), Rec-R1 directly optimizes LLM generation using feedback from a fixed black-box recommendation model. It does not rely on synthetic SFT data from proprietary models such as GPT-4o, which avoids the substantial cost and effort of data distillation. To evaluate Rec-R1, we test it on two representative tasks: product search and sequential recommendation. Experimental results show that Rec-R1 not only consistently outperforms prompting- and SFT-based methods but also achieves significant gains over strong discriminative baselines, even when paired with simple retrievers such as BM25. Moreover, Rec-R1 preserves the general-purpose capabilities of the LLM, unlike SFT, which often impairs instruction following and reasoning. These findings suggest that Rec-R1 is a promising foundation for continual task-specific adaptation without catastrophic forgetting.
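The closed loop described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the paper's implementation: the LLM's output is taken to be a rewritten search query, the frozen black-box recommender is a toy Okapi BM25 retriever, the reward is binary-relevance NDCG@k of the ground-truth item, and rewards for several sampled rewrites are normalized within the group (a GRPO-style choice). The function names `bm25_scores`, `ndcg_at_k`, `reward`, and `group_advantages` are hypothetical, and the policy-gradient update of the LLM itself is omitted.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Toy Okapi BM25: score every document in `docs` against `query`."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency of each term across the corpus.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        s = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            s += idf * tf[term] * (k1 + 1) / denom
        scores.append(s)
    return scores


def ndcg_at_k(ranking, relevant, k=10):
    """Binary-relevance NDCG@k for a ranked list of document indices."""
    dcg = sum(1.0 / math.log2(i + 2) for i, d in enumerate(ranking[:k]) if d in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0


def reward(generated_query, docs, relevant, k=10):
    """Hypothetical reward: the frozen retriever ranks `docs` for the LLM's
    generated query, and the ranking quality becomes the RL reward."""
    scores = bm25_scores(generated_query, docs)
    ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return ndcg_at_k(ranking, relevant, k)


def group_advantages(rewards, eps=1e-8):
    """Assumed GRPO-style normalization of rewards within one sampled group."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]


# Usage: three candidate rewrites sampled from the policy for one user need.
docs = [
    "wireless noise cancelling headphones",
    "running shoes for trail",
    "bluetooth over-ear headphones with noise cancelling",
]
relevant = {2}  # index of the ground-truth item
candidates = ["headphones", "bluetooth over-ear noise cancelling headphones", "trail shoes"]
rewards = [reward(q, docs, relevant, k=3) for q in candidates]
advantages = group_advantages(rewards)
print(rewards, advantages)
```

Because the retriever is only queried and never updated, any ranking function could be substituted for the toy BM25 here; the LLM receives a scalar signal and no gradients flow through the recommender.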
