Deep reinforcement learning (DRL) has been successfully used to solve various robotic manipulation tasks. However, most existing works do not address the issue of control stability. This is in sharp contrast to the control theory community, where the well-established norm is to prove stability whenever a control law is synthesized. What makes traditional stability analysis difficult for DRL is the uninterpretable nature of neural network policies and the unknown system dynamics. In this work, unconditional stability is obtained by deriving an interpretable deep policy structure based on the $\textit{energy shaping}$ control of Lagrangian systems. Then, stability during physical interaction with an unknown environment is established based on $\textit{passivity}$. The result is a stability-guaranteeing, model-free DRL framework that is general enough for contact-rich manipulation tasks. With an experiment on a peg-in-hole task, we demonstrate, to the best of our knowledge, the first DRL with a stability guarantee on a real robotic manipulator.
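As a minimal illustration of the kind of structure referenced above, consider the standard energy-shaping-with-damping-injection form for a Lagrangian system $M(q)\ddot{q} + C(q,\dot{q})\dot{q} + g(q) = u + \tau_{\mathrm{ext}}$; here $V_\theta$ (a learned potential) and $D_\theta$ (a learned damping matrix) are illustrative placeholders and not necessarily the exact parameterization used in this work:
\begin{align*}
u &= g(q) - \nabla_q V_\theta(q) - D_\theta(q)\,\dot{q}, \qquad V_\theta \ge 0,\ \ D_\theta \succ 0,\\
H &= \tfrac{1}{2}\dot{q}^\top M(q)\,\dot{q} + V_\theta(q), \qquad \dot{H} = -\dot{q}^\top D_\theta(q)\,\dot{q} + \dot{q}^\top \tau_{\mathrm{ext}} \le \dot{q}^\top \tau_{\mathrm{ext}},
\end{align*}
so the closed loop is passive with respect to the external interaction force $\tau_{\mathrm{ext}}$, which is the property exploited to guarantee stability during physical interaction with an unknown environment.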