Sim-to-real transfer is a mainstream approach to coping with the large number of trials that typical deep reinforcement learning methods require. However, transferring a policy trained in simulation to real hardware remains an open challenge because of the reality gap. In particular, the characteristics of the actuators in legged robots strongly influence sim-to-real transfer. We address two challenges: 1) high-reduction-ratio gears are widely used in actuators, and the reality gap becomes especially pronounced when backdrivability must be taken into account to control the joints compliantly; 2) because stable bipedal locomotion is difficult to achieve, typical system identification methods are insufficient for transferring the policy. To address these challenges, we propose 1) a new simulation model of gears and 2) a system identification method that can make use of failed attempts. We verify the effectiveness of the proposed methods on a biped robot, the ROBOTIS-OP3: the sim-to-real transferred policy stabilizes the robot under severe disturbances and walks on uneven surfaces without using force or torque sensors.