Sim-to-real transfer is a mainstream approach to coping with the large number of trials required by typical deep reinforcement learning. However, transferring a policy trained in simulation to real hardware remains challenging because of the reality gap. In legged robots in particular, actuator characteristics have a considerable influence on sim-to-real transfer. Actuators commonly use gears with high reduction ratios, and the reality gap becomes especially pronounced when their backdrivability is also exploited to control joints compliantly. We propose a new simulation model of gears to address this gap. Additionally, because stable bipedal locomotion is difficult to achieve, typical methods fail to tune the simulation's physical parameters to match the behavior of the transferred policy. We therefore propose a system identification method that can make use of failed attempts. The method's effectiveness is verified on a biped robot, the ROBOTIS-OP3: the sim-to-real transferred policy can stabilize the robot under severe disturbances and walk on uneven surfaces without force or torque sensors.