In this paper, we propose a locomotion training framework in which a control policy and a state estimator are trained concurrently. The framework consists of a policy network, which outputs the desired joint positions, and a state estimation network, which outputs estimates of robot states such as the base linear velocity, foot height, and contact probability. We exploit a fast simulation environment to train the networks, and the trained networks are then transferred to the real robot. With the trained policy and state estimator, the robot can traverse diverse terrains such as a hill, a slippery plate, and a bumpy road. We also demonstrate that the learned policy can run at up to 3.75 m/s on normal flat ground and 3.54 m/s on a slippery plate with a coefficient of friction of 0.22.
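The two-network structure described above can be illustrated with a minimal forward-pass sketch. It is not the authors' implementation: the layer sizes, the observation dimension, the quadruped layout (4 legs, 12 joints), and the choice to feed the estimator's outputs into the policy are all assumptions made for illustration, and the function names are hypothetical. The training loop is omitted, and the weights are random.

```python
import math
import random

# All sizes below are illustrative assumptions, not values from the paper.
OBS_DIM = 30        # hypothetical proprioceptive observation size
NUM_LEGS = 4        # assumes a quadruped
NUM_JOINTS = 12     # assumes 3 actuated joints per leg
EST_DIM = 3 + NUM_LEGS + NUM_LEGS  # base velocity, foot heights, contact logits
HIDDEN = 16


def init_layer(rng, n_in, n_out):
    """Return (weights, biases) for one dense layer with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(layer, x, activation=None):
    """Apply one dense layer to vector x, with an optional elementwise activation."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def estimate_state(est_net, obs):
    """Estimator forward pass: observation -> estimated robot states."""
    raw = dense(est_net[1], dense(est_net[0], obs, math.tanh))
    return {
        "base_lin_vel": raw[:3],
        "foot_height": raw[3:3 + NUM_LEGS],
        # Squash the last NUM_LEGS outputs into (0, 1) to read them as probabilities.
        "contact_prob": [sigmoid(v) for v in raw[3 + NUM_LEGS:]],
    }


def act(policy_net, obs, est):
    """Policy forward pass: observation plus estimates -> desired joint positions.

    Concatenating the estimates with the observation is an assumption of this sketch.
    """
    x = obs + est["base_lin_vel"] + est["foot_height"] + est["contact_prob"]
    return dense(policy_net[1], dense(policy_net[0], x, math.tanh))


rng = random.Random(0)
est_net = [init_layer(rng, OBS_DIM, HIDDEN), init_layer(rng, HIDDEN, EST_DIM)]
policy_net = [init_layer(rng, OBS_DIM + EST_DIM, HIDDEN), init_layer(rng, HIDDEN, NUM_JOINTS)]

obs = [rng.uniform(-1.0, 1.0) for _ in range(OBS_DIM)]
est = estimate_state(est_net, obs)
joint_targets = act(policy_net, obs, est)
```

In this sketch a single control step calls `estimate_state` and then `act`, so the policy sees both the raw observation and the estimated velocity, foot heights, and contact probabilities. How the two networks are actually coupled and optimized is specified in the body of the paper rather than in the abstract.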