Many state-of-the-art robotic applications utilize series elastic actuators (SEAs) with closed-loop force control to achieve complex tasks such as walking, lifting, and manipulation. Model-free PID control methods are more prone to instability due to nonlinearities in the SEA, whereas cascaded model-based robust controllers can remove these effects to achieve stable force control. However, these model-based methods require detailed investigations to characterize the system accurately. Deep reinforcement learning (DRL) has proved to be an effective model-free method for continuous control tasks, though few works address learning directly on hardware. This paper describes the training process of a DRL policy on the hardware of an SEA pendulum system for tracking force control trajectories from 0.05 to 0.35 Hz at 50 N amplitude using the Proximal Policy Optimization (PPO) algorithm. Safety mechanisms are developed and utilized to train the policy for 12 hours (overnight) without an operator present within the full 21-hour training period. The tracking performance is evaluated, showing an improvement of $25$ N in mean absolute error when comparing the first 18 min of training to the full 21 hours for a 50 N amplitude, 0.1 Hz sinusoidal desired force trajectory. Finally, the DRL policy exhibits better tracking and stability margins than a model-free PID controller for a 50 N chirp force trajectory.
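The tracking metric referenced above is the mean absolute error between the measured SEA force and the sinusoidal desired force trajectory. The following is a minimal sketch of that evaluation, assuming a logged force signal and a 50 N, 0.1 Hz sinusoidal reference; the function and variable names are hypothetical and are not taken from the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): mean absolute error (MAE) of a
# measured SEA force signal against a sinusoidal desired force trajectory.
import numpy as np

def tracking_mae(measured_force: np.ndarray, t: np.ndarray,
                 amplitude: float = 50.0, freq_hz: float = 0.1) -> float:
    """MAE between measured force and a sinusoidal reference, in newtons."""
    desired_force = amplitude * np.sin(2.0 * np.pi * freq_hz * t)
    return float(np.mean(np.abs(measured_force - desired_force)))

# Example usage on an assumed 60 s evaluation window sampled at 100 Hz.
t = np.arange(0.0, 60.0, 0.01)
measured = 50.0 * np.sin(2.0 * np.pi * 0.1 * t) + np.random.normal(0.0, 5.0, t.size)
print(f"Tracking MAE: {tracking_mae(measured, t):.2f} N")
```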