Deep reinforcement learning (RL) has shown promising results in motion planning for manipulators. However, no existing method for RL-based manipulator control guarantees the safety of highly dynamic obstacles such as humans. This lack of formal safety assurances prevents the application of RL to manipulators in real-world human environments. We therefore propose a shielding mechanism that ensures ISO-verified human safety while training and deploying RL algorithms on manipulators. We use a fast reachability analysis of humans and manipulators to guarantee that the manipulator comes to a complete stop before a human is within its range. Our proposed method guarantees safety and significantly improves RL performance by preventing episode-ending collisions. We demonstrate the performance of our method in simulation using human motion capture data.
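The shielding idea described above can be illustrated with a deliberately simplified, one-dimensional sketch. It is not the authors' implementation: the names `ShieldParams`, `is_action_safe`, and `shield`, the numeric bounds, and the assumption that the robot holds the commanded speed for one control step and then brakes at constant deceleration are all hypothetical choices made for illustration. The check over-approximates how far the robot and the human can each travel until the robot is at standstill, and accepts the RL action only if those reachable intervals cannot meet.

```python
from dataclasses import dataclass


@dataclass
class ShieldParams:
    # All values below are illustrative assumptions, not taken from the paper.
    human_max_speed: float = 1.6   # m/s, assumed upper bound on human speed
    robot_max_decel: float = 2.0   # m/s^2, assumed constant braking deceleration
    step: float = 0.01             # s, assumed control period


def is_action_safe(distance: float, robot_speed: float, p: ShieldParams) -> bool:
    """Check that the robot can stop before the human can reach it.

    The robot is assumed to move at `robot_speed` for one control step
    and then brake to standstill. Both agents are assumed to move
    straight toward each other, which is the worst case in 1D.
    """
    brake_time = robot_speed / p.robot_max_decel
    brake_dist = robot_speed ** 2 / (2.0 * p.robot_max_decel)

    # Time until the robot is guaranteed to be at rest.
    t_stop = p.step + brake_time

    # Over-approximated distances covered within t_stop.
    robot_reach = robot_speed * p.step + brake_dist
    human_reach = p.human_max_speed * t_stop

    # Safe if the two reachable intervals stay disjoint.
    return robot_reach + human_reach < distance


def shield(distance: float, rl_speed: float, p: ShieldParams) -> float:
    """Pass the RL action through if verified safe, else command a stop.

    Returning 0.0 stands in for a failsafe braking maneuver.
    """
    return rl_speed if is_action_safe(distance, rl_speed, p) else 0.0


if __name__ == "__main__":
    params = ShieldParams()
    print(shield(2.0, 0.5, params))  # human far away
    print(shield(0.3, 0.5, params))  # human close
```

A real system would replace the scalar distance with reachable sets of the full human body and manipulator links, and would verify the failsafe maneuver itself rather than assume an ideal stop. The sketch only shows the structure of the decision: verify the intended action together with a stopping maneuver, otherwise fall back to stopping.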