We present a learning algorithm for training a single policy that imitates multiple gaits of a walking robot. To achieve this, we use and extend MPC-Net, which is an Imitation Learning approach guided by Model Predictive Control (MPC). The strategy of MPC-Net differs from many other approaches since its objective is to minimize the control Hamiltonian, which derives from the principle of optimality. To represent the policies, we employ a mixture-of-experts network (MEN) and observe that the performance of a policy improves if each expert of a MEN specializes in controlling exactly one mode of a hybrid system, such as a walking robot. We introduce new loss functions for single- and multi-gait policies to achieve this kind of expert selection behavior. Moreover, we benchmark our algorithm against Behavioral Cloning and the original MPC implementation on various rough terrain scenarios. We validate our approach on hardware and show that a single learned policy can replace its teacher to control multiple gaits.
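As a rough illustration of the mixture-of-experts idea mentioned above, the following minimal sketch blends the outputs of several expert policies using softmax gating weights. It is a generic sketch, not the architecture or loss used in this work: the function names, the toy experts, and the convention that the second state entry acts as a mode indicator are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_of_experts_policy(
    state: Vector,
    gating_logits: Callable[[Vector], Vector],
    experts: Sequence[Callable[[Vector], Vector]],
) -> Tuple[Vector, Vector]:
    """Return the gating-weighted sum of expert outputs and the weights."""
    weights = softmax(gating_logits(state))
    outputs = [expert(state) for expert in experts]
    dim = len(outputs[0])
    control = [
        sum(w * out[j] for w, out in zip(weights, outputs)) for j in range(dim)
    ]
    return control, weights


# Hypothetical toy experts: each one is meant to handle a single mode.
def stance_expert(state: Vector) -> Vector:
    return [1.0 * state[0], 0.0]


def swing_expert(state: Vector) -> Vector:
    return [0.0, 2.0 * state[0]]


def toy_gating(state: Vector) -> Vector:
    # Assumption: state[1] is a mode indicator (+1 for stance, -1 for swing).
    return [10.0 * state[1], -10.0 * state[1]]


if __name__ == "__main__":
    control, weights = mixture_of_experts_policy(
        [0.5, 1.0], toy_gating, [stance_expert, swing_expert]
    )
    print("weights:", weights)
    print("control:", control)
```

When the gating weights are close to one-hot, as in this toy setup, each expert effectively controls exactly one mode, which is the kind of expert selection behavior the abstract describes.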