We propose an algorithm for tabular episodic reinforcement learning with constraints. We provide a modular analysis with strong theoretical guarantees for settings with concave rewards and convex constraints, and for settings with hard constraints (knapsacks). Most of the previous work in constrained reinforcement learning is limited to linear constraints, and the remaining work focuses on either the feasibility question or settings with a single episode. Our experiments demonstrate that the proposed algorithm significantly outperforms these approaches in existing constrained episodic environments.