Recommender Systems (RS) are fundamental to modern online services. While most existing approaches optimize for short-term engagement, recent work has begun to explore reinforcement learning (RL) to model long-term user value. However, these efforts face significant challenges: the vast, dynamic action spaces inherent in recommendation hinder stable policy learning. To resolve this bottleneck, we introduce Hierarchical Semantic RL (HSRL), which reframes RL-based recommendation as decision-making over a fixed Semantic Action Space (SAS). HSRL encodes items as Semantic IDs (SIDs) for policy learning and maps SIDs back to their original items via a fixed, invertible lookup during execution. To align decision-making with SID generation, the Hierarchical Policy Network (HPN) operates in a coarse-to-fine manner, employing hierarchical residual state modeling to refine each level's context from the previous level's residual, thereby stabilizing training and reducing the representation-decision mismatch. In parallel, a Multi-level Critic (MLC) provides token-level value estimates, enabling fine-grained credit assignment. Across public benchmarks and a large-scale production dataset from a leading Chinese short-video advertising platform, HSRL consistently surpasses state-of-the-art baselines. In a seven-day online A/B test, it delivers an 18.421% CVR lift with only a 1.251% increase in cost, establishing HSRL as a scalable paradigm for RL-based recommendation. Our code is released at https://github.com/MinmaoWang/HSRL.
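To make the pipeline concrete, the minimal PyTorch sketch below illustrates the three components named above: a fixed, invertible SID lookup, a coarse-to-fine policy with hierarchical residual state refinement, and a per-level critic. All module names, dimensions, the toy catalog, and the residual-subtraction update rule are illustrative assumptions, not the released implementation (see the repository above for the authors' code).

```python
# Minimal sketch of the HSRL components described above. Assumed details
# throughout: module names, sizes, the toy SID catalog, and the
# residual-subtraction update are illustrative, not the released code.
import torch
import torch.nn as nn

NUM_LEVELS = 3        # depth of the SID hierarchy (assumed)
CODEBOOK_SIZE = 256   # tokens per level (assumed)
STATE_DIM = 64        # user-state dimensionality (assumed)

# Fixed, invertible lookup: each item owns exactly one SID (one token per
# level), so decisions made over SIDs execute on the original item catalog.
item_to_sid = {0: (3, 17, 201), 1: (3, 17, 202), 2: (8, 5, 40)}  # toy catalog
sid_to_item = {sid: item for item, sid in item_to_sid.items()}

class HierarchicalPolicyNetwork(nn.Module):
    """Coarse-to-fine token selection; each level's context is refined from
    the previous level's residual (hierarchical residual state modeling)."""
    def __init__(self):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Linear(STATE_DIM, CODEBOOK_SIZE) for _ in range(NUM_LEVELS))
        self.token_emb = nn.ModuleList(
            nn.Embedding(CODEBOOK_SIZE, STATE_DIM) for _ in range(NUM_LEVELS))

    def forward(self, state):
        residual, tokens, logps, contexts = state, [], [], []
        for level in range(NUM_LEVELS):
            contexts.append(residual)
            dist = torch.distributions.Categorical(
                logits=self.heads[level](residual))
            token = dist.sample()
            tokens.append(token)
            logps.append(dist.log_prob(token))
            # Subtract what this level explained to form the next context
            # (one plausible residual update; the exact rule is assumed).
            residual = residual - self.token_emb[level](token)
        return tokens, logps, contexts

class MultiLevelCritic(nn.Module):
    """One value head per SID level: token-level value estimates enabling
    fine-grained credit assignment along the hierarchy."""
    def __init__(self):
        super().__init__()
        self.value_heads = nn.ModuleList(
            nn.Linear(STATE_DIM, 1) for _ in range(NUM_LEVELS))

    def forward(self, contexts):
        return [head(ctx) for head, ctx in zip(self.value_heads, contexts)]

# Toy rollout: sample a SID token by token, then execute it as a concrete
# item via the fixed lookup.
policy, critic = HierarchicalPolicyNetwork(), MultiLevelCritic()
state = torch.randn(1, STATE_DIM)
tokens, logps, contexts = policy(state)
values = critic(contexts)               # one value estimate per SID level
sid = tuple(int(t) for t in tokens)
print("sampled SID:", sid, "-> item:", sid_to_item.get(sid))  # None if unused
```

In this sketch, execution simply returns None when a sampled SID has no corresponding item; a production system would instead constrain sampling to valid SIDs so every policy decision maps back to the catalog.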