Reinforcement learning (RL) is increasingly applied to real-world problems involving complex and structured decisions, such as routing, scheduling, and assortment planning. These settings challenge standard RL algorithms, which struggle to scale, generalize, and exploit structure in the presence of combinatorial action spaces. We propose Structured Reinforcement Learning (SRL), a novel actor-critic paradigm that embeds combinatorial optimization layers into the actor neural network. We enable end-to-end learning of the actor via Fenchel-Young losses and provide a geometric interpretation of SRL as a primal-dual algorithm in the dual of the moment polytope. Across six environments with exogenous and endogenous uncertainty, SRL matches or surpasses the performance of unstructured RL and imitation learning on static tasks and improves over these baselines by up to 92% on dynamic problems, with improved stability and convergence speed.
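To make the core idea concrete, here is a minimal sketch of an actor with a combinatorial optimization layer trained through a Monte-Carlo estimate of a Fenchel-Young loss gradient (perturbation-based). All names, the top-k solver, the linear actor, and the fixed target action are illustrative assumptions for exposition, not the paper's implementation; in SRL the target direction would come from the critic rather than being fixed.

```python
import numpy as np

rng = np.random.default_rng(0)

def co_layer(scores, k=3):
    """Combinatorial optimization layer: pick the k highest-scoring items
    (a simple stand-in for a general linear-objective combinatorial solver)."""
    action = np.zeros_like(scores)
    action[np.argsort(scores)[-k:]] = 1.0
    return action

def fy_grad(scores, target_action, k=3, n_samples=32, eps=0.5):
    """Monte-Carlo estimate of the Fenchel-Young loss gradient w.r.t. scores:
    mean of perturbed solver outputs minus the target action."""
    perturbed = [co_layer(scores + eps * rng.standard_normal(scores.shape), k)
                 for _ in range(n_samples)]
    return np.mean(perturbed, axis=0) - target_action

# Toy usage: a linear "actor" maps a state to item scores, the CO layer turns
# scores into a structured (top-k) action, and the Fenchel-Young gradient
# pushes the scores toward a target action (in SRL, one favored by the critic).
state = rng.standard_normal(8)
W = rng.standard_normal((5, 8)) * 0.1          # actor parameters (illustrative)
target = np.array([1.0, 1.0, 0.0, 1.0, 0.0])   # illustrative target action
for step in range(200):
    scores = W @ state                          # actor forward pass
    g = fy_grad(scores, target)                 # gradient w.r.t. scores
    W -= 0.1 * np.outer(g, state)               # chain rule through the linear actor
print(co_layer(W @ state))                      # selected items now match the target support
```

The key property this illustrates is that the solver itself is never differentiated: the Fenchel-Young gradient only needs solver calls on perturbed scores, which is what allows end-to-end training of an actor that outputs combinatorial actions.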