Understanding the gap between simulation and reality is critical for reinforcement learning with legged robots, which are largely trained in simulation. However, recent work has reached sometimes conflicting conclusions about which factors are important for success, including the role of dynamics randomization. In this paper, we aim to clarify the role of dynamics randomization in learning robust locomotion policies for the Laikago quadruped robot. Surprisingly, in contrast to prior work with the same robot model, we find that direct sim-to-real transfer is possible without dynamics randomization or on-robot adaptation schemes. We conduct extensive ablation studies in a sim-to-sim setting to understand the key issues underlying successful policy transfer, including other design decisions that can impact policy robustness. We further ground our conclusions via sim-to-real experiments with various gaits, speeds, and stepping frequencies. Additional details: https://www.pair.toronto.edu/understanding-dr/.