Deep reinforcement learning has recently emerged as an appealing alternative for legged locomotion over multiple terrains: a policy is trained in physics simulation and then transferred to the real world (i.e., sim-to-real transfer). Despite considerable progress, the capacity and scalability of traditional neural networks are still limited, which may hinder their application in more complex environments. In contrast, the Transformer architecture has shown its superiority in a wide range of large-scale sequence modeling tasks, including natural language processing and decision-making problems. In this paper, we propose Terrain Transformer (TERT), a high-capacity Transformer model for quadrupedal locomotion control on various terrains. Furthermore, to better leverage Transformers in sim-to-real scenarios, we present a novel two-stage training framework consisting of an offline pretraining stage and an online correction stage, which naturally integrates the Transformer with privileged training. Extensive experiments in simulation demonstrate that TERT outperforms state-of-the-art baselines on different terrains in terms of return, energy consumption, and control smoothness. In further real-world validation, TERT successfully traverses nine challenging terrains, including a sand pit and descending stairs, which strong baselines fail to traverse.
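The two-stage framework named in the abstract (offline pretraining followed by online correction, combined with privileged training) can be illustrated with a toy sketch. This is not the TERT implementation: the abstract gives no algorithmic details, so everything here is an assumption made for illustration. The "teacher" stands in for a policy trained with privileged information, the "student" is a one-parameter linear policy rather than a Transformer, the environment is a one-dimensional toy system, and the online stage is written as a generic relabel-and-refit loop in the style of dataset aggregation. All function names are hypothetical.

```python
import random


def teacher_action(state, gain=0.5):
    # Hypothetical privileged teacher: a fixed linear feedback law.
    return -gain * state


def fit_linear(pairs):
    # Least-squares fit of a single weight w so that action ~= w * state.
    num = sum(s * a for s, a in pairs)
    den = sum(s * s for s, _ in pairs)
    return num / den if den > 0 else 0.0


def rollout(policy, steps=20, seed=0):
    # Toy 1-D dynamics (an assumption): next state = state + action + noise.
    rng = random.Random(seed)
    state = rng.uniform(-1.0, 1.0)
    states = []
    for _ in range(steps):
        states.append(state)
        state = state + policy(state) + rng.uniform(-0.05, 0.05)
    return states


def two_stage_training(rounds=3):
    # Stage 1 (offline pretraining): imitate trajectories collected by the teacher.
    data = [(s, teacher_action(s)) for s in rollout(teacher_action, seed=0)]
    w = fit_linear(data)

    # Stage 2 (online correction): run the student, relabel the states it
    # actually visits with teacher actions, aggregate, and refit.
    for r in range(rounds):
        visited = rollout(lambda s: w * s, seed=r + 1)
        data += [(s, teacher_action(s)) for s in visited]
        w = fit_linear(data)
    return w


if __name__ == "__main__":
    print("learned student weight:", two_stage_training())
```

In this toy setting the student can represent the teacher exactly, so the second stage changes little; the point of the sketch is only the data flow, in which the second stage trains on states drawn from the student's own rollouts rather than the teacher's.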