Deep reinforcement learning (DRL) has achieved tremendous success in many complex decision-making tasks for autonomous systems with high-dimensional state and/or action spaces. However, safety and stability remain major concerns that hinder the application of DRL to safety-critical autonomous systems. To address these concerns, we propose Phy-DRL: a physical deep reinforcement learning framework. The Phy-DRL is novel in two architectural designs: i) a Lyapunov-like reward, and ii) residual control (i.e., the integration of physics-model-based control and data-driven control). Together, the Lyapunov-like reward and residual control endow the Phy-DRL with mathematically provable guarantees of safety and stability. Through experiments on an inverted pendulum, we show that the Phy-DRL delivers guaranteed safety and stability and enhanced robustness, while offering remarkably accelerated training and higher reward.
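To make the two architectural designs concrete, the following is a minimal sketch of how a residual controller and a Lyapunov-like reward could be wired together. All names (`K`, `P`, `residual_action`, `lyapunov_like_reward`) and the specific gain/weight values are illustrative assumptions for a cart-pole-style system, not the paper's actual implementation.

```python
import numpy as np

# Physics-model-based controller: e.g., a feedback gain K obtained from a
# linearized model of the plant (values here are placeholders).
K = np.array([[1.0, 2.5, 20.0, 3.0]])  # assumed gain for a 4-dim state

def residual_action(state, drl_action):
    """Residual control: model-based action plus the learned DRL correction."""
    a_phy = -K @ state             # physics-model-based component
    return a_phy + drl_action      # data-driven residual stacked on top

# Lyapunov-like reward: reward the decrease of a quadratic Lyapunov
# candidate V(s) = s^T P s along the transition s -> s_next.
P = np.eye(4)  # assumed positive-definite matrix (e.g., from a Lyapunov equation)

def lyapunov_like_reward(state, next_state):
    v_now = state @ P @ state
    v_next = next_state @ P @ next_state
    return v_now - v_next  # positive when V decreases, steering toward stability
```

Under these assumptions, the reward is positive exactly when the Lyapunov candidate decreases along a transition, which is what lets stability arguments be attached to the learned policy; the residual structure keeps the model-based controller as a baseline that the DRL component only corrects.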