The deployment of autonomous robots in safety-critical applications requires safety guarantees. Provably safe reinforcement learning is an active field of research that aims to provide such guarantees using safeguards. These safeguards should be integrated during training to reduce the sim-to-real gap. While there are several approaches for safeguarding sampling-based reinforcement learning, analytic gradient-based reinforcement learning often achieves superior performance from fewer environment interactions. However, there is no safeguarding approach for this learning paradigm yet. Our work addresses this gap by developing the first effective safeguard for analytic gradient-based reinforcement learning. We analyse existing differentiable safeguards, adapt them through modified mappings and gradient formulations, and integrate them into a state-of-the-art learning algorithm and a differentiable simulation. Using numerical experiments on three control tasks, we evaluate how different safeguards affect learning. The results demonstrate that training can be safeguarded without compromising performance. Additional visuals are provided at \href{https://timwalter.github.io/safe-agb-rl.github.io}{timwalter.github.io/safe-agb-rl.github.io}.