Humanoid robots promise transformative capabilities for industrial and service applications. While recent advances in Reinforcement Learning (RL) have yielded impressive results in locomotion, manipulation, and navigation, the proposed methods typically require enormous numbers of simulation samples to account for real-world variability. This work proposes a novel one-stage training framework, Learn to Teach (L2T), which unifies teacher and student policy learning. Our approach recycles simulator samples and synchronizes the learning trajectories through shared dynamics, significantly reducing sample complexity and training time while achieving state-of-the-art performance. Furthermore, we validate the RL variant (L2T-RL) through extensive simulations and hardware tests on the Digit robot, demonstrating zero-shot sim-to-real transfer and robust performance over 12+ challenging terrains without depth estimation modules.