Reinforcement learning (RL) has been successfully used to solve various robotic control tasks. However, most existing works do not address the issue of control stability. This is in sharp contrast to the control theory community, where the well-established norm is to prove stability whenever a control law is synthesized. Guaranteeing stability during RL is difficult for three reasons: non-interpretable neural network policies, unknown system dynamics, and random exploration. We contribute towards solving the stable RL problem in the context of robotic manipulation that may involve physical contact with the environment. Our solution is derived from a physics-based prior that originates from Lagrangian mechanics and does not involve learning any dynamics model. We show how to parameterize the resulting $\textit{energy shaping}$ policy as a deep neural network that consists of a convex potential function and a velocity-dependent damping component. Our experiments, which include a real-world peg insertion task on a 7-DOF robot, validate the proposed policy structure and demonstrate the benefits of stability in RL.
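To make the policy structure concrete, below is a minimal PyTorch sketch of an energy-shaping policy built from a convex potential and a velocity-dependent damping term, i.e. a control law of the form $u = -\nabla_q \psi(q - q^\ast) - D(q,\dot q)\,\dot q$ with $\psi$ convex and $D \succeq 0$. The class names, layer sizes, the input-convex network construction, and the Cholesky parameterization of the damping matrix are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvexPotential(nn.Module):
    """Input-convex network psi(q): convexity in q is enforced by keeping the
    weights on the z-path non-negative and using convex, non-decreasing activations."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.Wy0 = nn.Linear(dim, hidden)              # first layer may be unconstrained
        self.Wz1 = nn.Linear(hidden, hidden, bias=False)
        self.Wy1 = nn.Linear(dim, hidden)
        self.Wz2 = nn.Linear(hidden, 1, bias=False)
        self.Wy2 = nn.Linear(dim, 1)

    def forward(self, q):
        z = F.softplus(self.Wy0(q))
        # clamp z-path weights to be non-negative so psi stays convex in q
        z = F.softplus(F.linear(z, self.Wz1.weight.clamp(min=0)) + self.Wy1(q))
        return F.linear(z, self.Wz2.weight.clamp(min=0)) + self.Wy2(q)


class EnergyShapingPolicy(nn.Module):
    """u = -grad_q psi(q - q_goal) - D(q, qdot) qdot, with D positive semi-definite."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.dim = dim
        self.potential = ConvexPotential(dim, hidden)
        self.damping_net = nn.Sequential(
            nn.Linear(2 * dim, hidden), nn.Tanh(),
            nn.Linear(hidden, dim * dim),
        )

    def forward(self, q, qdot, q_goal):
        q = q.requires_grad_(True)
        psi = self.potential(q - q_goal).sum()
        grad_psi = torch.autograd.grad(psi, q, create_graph=True)[0]
        # D = L L^T >= 0 so the velocity-dependent term can only dissipate energy
        L = torch.tril(self.damping_net(torch.cat([q, qdot], dim=-1))
                       .view(-1, self.dim, self.dim))
        D = L @ L.transpose(-1, -2)
        return -grad_psi - (D @ qdot.unsqueeze(-1)).squeeze(-1)
```

Under these assumptions, the potential has a minimum that the gradient term drives the robot toward, and the damping term removes kinetic energy, which is what yields the stability argument independently of any learned dynamics model.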