This study addresses the problem of fault tolerance to actuator failure in quadruped robots, which is critical for robots operating in remote or extreme environments. In particular, an adaptive curriculum reinforcement learning algorithm with dynamics randomization (ACDR) is developed. The ACDR algorithm adaptively trains a quadruped robot under random actuator failure conditions and formulates a single robust policy for fault-tolerant robot control. Notably, the hard2easy curriculum is more effective than the easy2hard curriculum for quadruped robot locomotion. The ACDR algorithm can be used to build a robot system that does not require additional modules for detecting actuator failures and switching policies. Experimental results show that the ACDR algorithm outperforms conventional algorithms in terms of average reward and walking distance.
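The following is a minimal, illustrative sketch of how a hard2easy actuator-failure curriculum with dynamics randomization could be organized: early in training, severe failures are sampled, and the allowed severity relaxes as training progresses, so that a single policy is exposed to the full range of failure conditions. The environment interface, the failure model (torque scaling on one randomly chosen joint), and the linear schedule are assumptions made here for illustration; the paper's actual adaptive scheduling, reward, and randomization ranges are not reproduced.

```python
# Illustrative hard2easy curriculum for actuator-failure randomization.
# All names and the linear schedule are assumptions, not the paper's ACDR details.
import numpy as np

NUM_JOINTS = 12  # typical quadruped: 3 actuated joints per leg (assumption)

def sample_failure(progress, rng):
    """Sample an actuator-failure configuration for one episode.

    progress in [0, 1] is the fraction of training completed.
    hard2easy: early episodes draw severe failures (torque scaled toward 0),
    later episodes relax toward mild or no failure.
    """
    max_severity = 1.0 - progress          # severity bound shrinks over training
    severity = rng.uniform(0.0, max_severity)
    failed_joint = rng.integers(NUM_JOINTS)
    torque_scale = np.ones(NUM_JOINTS)
    torque_scale[failed_joint] = 1.0 - severity
    return torque_scale

def train(total_episodes=10_000, seed=0):
    rng = np.random.default_rng(seed)
    for episode in range(total_episodes):
        progress = episode / total_episodes
        torque_scale = sample_failure(progress, rng)
        # env.reset(torque_scale=torque_scale)  # apply randomized dynamics (hypothetical API)
        # ... run the RL update (e.g., PPO) on the single shared policy ...
```

Because the failure configuration is resampled every episode and no failure indicator is given to the policy, the trained controller needs no separate failure-detection or policy-switching module, consistent with the system design described above.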