Robustness and safety are critical for the trustworthy deployment of deep reinforcement learning in real-world decision-making applications. In particular, we require algorithms that can guarantee robust, safe performance in the presence of general environment disturbances, while making only limited assumptions about the data collection process during training. In this work, we propose a safe reinforcement learning framework with robustness guarantees through the use of an optimal transport cost uncertainty set. We provide an efficient, theoretically supported implementation based on Optimal Transport Perturbations, which can be applied in a completely offline fashion using only data collected in a nominal training environment. We demonstrate the robust, safe performance of our approach on a variety of continuous control tasks with safety constraints in the Real-World Reinforcement Learning Suite.
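The abstract names an optimal transport cost uncertainty set but does not specify the transport cost or how Optimal Transport Perturbations are computed. As a generic illustration of the set itself, and not the authors' algorithm, the sketch below makes three assumptions: a squared Euclidean transport cost, an empirical distribution of nominal next states, and a hypothetical linear safety cost g(s) = w . s. All function names are illustrative. The sketch checks whether a perturbation map keeps the perturbed distribution within a transport budget epsilon, and it computes the worst-case shift for the linear cost.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def sq_dist(x: Sequence[float], y: Sequence[float]) -> float:
    """Assumed transport cost c(x, y) = ||x - y||^2 (illustrative choice)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def map_transport_cost(samples: Sequence[Vector],
                       perturb: Callable[[Vector], Vector]) -> float:
    """Average cost of moving each nominal sample x to perturb(x).

    The map induces a coupling between the nominal empirical distribution and
    the perturbed one, so this value upper-bounds their optimal transport cost.
    """
    return sum(sq_dist(x, perturb(x)) for x in samples) / len(samples)


def in_ot_ball(samples: Sequence[Vector],
               perturb: Callable[[Vector], Vector], epsilon: float) -> bool:
    """Sufficient (not necessary) test that the perturbed distribution lies in
    the uncertainty set {Q : OT_c(P_hat, Q) <= epsilon}."""
    return map_transport_cost(samples, perturb) <= epsilon


def mean_linear_cost(samples: Sequence[Vector], w: Sequence[float]) -> float:
    """Empirical mean of a hypothetical linear safety cost g(s) = w . s."""
    return sum(sum(wi * si for wi, si in zip(w, s)) for s in samples) / len(samples)


def worst_case_shift(w: Sequence[float], epsilon: float) -> Vector:
    """Worst-case perturbation for the linear cost under budget epsilon.

    By Cauchy-Schwarz, no distribution within squared-Euclidean transport cost
    epsilon of the samples can raise the mean of w . s by more than
    sqrt(epsilon) * ||w||; shifting every sample by sqrt(epsilon) * w / ||w||
    spends the whole budget and attains that bound.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    scale = math.sqrt(epsilon) / norm
    return [scale * wi for wi in w]


# Toy usage with made-up nominal next states (not data from the paper).
samples = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]]
w, epsilon = [3.0, 4.0], 25.0
delta = worst_case_shift(w, epsilon)  # [3.0, 4.0]


def perturb(s: Vector) -> Vector:
    """Shift a state by the worst-case displacement delta."""
    return [si + di for si, di in zip(s, delta)]


print(in_ot_ball(samples, perturb, epsilon))               # True (cost 25.0)
print(mean_linear_cost(samples, w))                        # 7.0
print(mean_linear_cost([perturb(s) for s in samples], w))  # 32.0 = 7.0 + sqrt(25) * 5
```

The linear cost is used only because its worst case has a closed form; for the nonlinear value and safety-cost functions of an actual policy, the worst case over the set would have to be found numerically.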