Reinforcement Learning (RL) agents have achieved great success in solving tasks with large observation and action spaces from limited feedback. Still, training these agents is data-intensive, and there are no guarantees that the learned behavior is safe and does not violate the rules of the environment, which limits their practical deployment in real-world scenarios. This paper discusses the engineering of reliable agents via the integration of deep RL with constraint-based augmentation models that guide the RL agent towards safe behavior. Within the set of constraints, the RL agent is free to adapt and explore, so that its effectiveness in solving the given problem is not hindered. However, once the RL agent leaves the space defined by the constraints, the outside models provide guidance so that it still behaves reliably. We discuss integration points for constraint guidance within the RL process and perform experiments on two case studies: a strictly constrained card game and a grid-world environment with additional combinatorial subgoals. Our results show that constraint guidance provides both reliability improvements and safer behavior, as well as accelerated training.
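One common way such constraint guidance can be integrated at action-selection time is by masking or overriding actions that fall outside the permitted space. The following is a minimal illustrative sketch, not the paper's implementation; the `constraint_mask` rule and all names are hypothetical stand-ins for a domain-specific constraint model.

```python
import numpy as np

# Hypothetical constraint model: returns a boolean mask of actions that
# satisfy the environment's rules in the given state. The toy rule below
# (action index must not exceed the state value) stands in for a
# domain-specific rule set such as the legal moves of a card game.
def constraint_mask(state, n_actions):
    return np.array([a <= state for a in range(n_actions)], dtype=bool)

def constrained_action(q_values, state):
    """Pick the greedy action among those permitted by the constraint model.

    If the agent's preferred action violates a constraint, the constraint
    model overrides it with the best-valued permitted action instead.
    """
    mask = constraint_mask(state, len(q_values))
    if not mask.any():
        # No permitted action in this state: fall back to the unconstrained choice.
        return int(np.argmax(q_values))
    masked_q = np.where(mask, q_values, -np.inf)
    return int(np.argmax(masked_q))

if __name__ == "__main__":
    q = np.array([0.1, 0.2, 0.3, 0.9])      # agent's action-value estimates
    print(constrained_action(q, state=2))    # unsafe action 3 is masked out -> picks 2
```

Inside the constrained region the agent's own preferences are untouched; the constraint model only intervenes when the agent would otherwise step outside the permitted space.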