This work developed a meta-learning approach that adapts the control policy on the fly to changing conditions for robust locomotion. The proposed method constantly updates the interaction model, samples feasible action sequences to estimate the resulting state-action trajectories, and then applies the optimal actions to maximize the reward. To achieve online model adaptation, our method learns a distinct latent vector for each training condition and selects the appropriate one online from newly collected data. Our work designs appropriate state spaces and reward functions and optimizes feasible actions in an MPC fashion; the actions are sampled directly in joint space subject to constraints, hence requiring no prior design of specific walking gaits. We further demonstrate the robot's capability to detect unexpected changes during interaction and to adapt its control policy quickly. Extensive validation on the SpotMicro robot in a physics simulation shows adaptive and robust locomotion skills under varying ground friction, external pushes, and different robot models, including hardware faults and changes.
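The overall loop described above (a learned dynamics model conditioned on a per-condition latent vector, online selection of that latent from newly collected data, and sampling-based MPC directly in joint space) can be illustrated with a minimal sketch. This is not the authors' implementation; all names (predict_next_state, reward_fn, select_latent, sample_mpc_action, latent_bank) and the placeholder model and reward are hypothetical assumptions for illustration only.

```python
# Minimal sketch of latent-conditioned, sampling-based MPC for locomotion.
# The dynamics model and reward below are placeholders, not the paper's models.
import numpy as np

rng = np.random.default_rng(0)

def predict_next_state(state, action, latent):
    """Placeholder learned dynamics model f(s, a, z) -> s'. Hypothetical."""
    return state + 0.01 * np.tanh(action[: state.shape[0]]) + 0.001 * latent[: state.shape[0]]

def reward_fn(state, action):
    """Placeholder locomotion reward, e.g. forward progress minus control effort."""
    return state[0] - 0.001 * np.sum(action ** 2)

def select_latent(latent_bank, recent_transitions):
    """Pick the training-condition latent that best explains newly collected data."""
    errors = []
    for z in latent_bank:
        err = sum(np.linalg.norm(predict_next_state(s, a, z) - s_next)
                  for s, a, s_next in recent_transitions)
        errors.append(err)
    return latent_bank[int(np.argmin(errors))]

def sample_mpc_action(state, latent, horizon=10, n_samples=64,
                      joint_low=-1.0, joint_high=1.0, action_dim=12):
    """Sample joint-space action sequences within limits, roll out the model,
    and return the first action of the highest-reward sequence."""
    best_return, best_action = -np.inf, None
    for _ in range(n_samples):
        seq = rng.uniform(joint_low, joint_high, size=(horizon, action_dim))
        s, total = state.copy(), 0.0
        for a in seq:
            s = predict_next_state(s, a, latent)
            total += reward_fn(s, a)
        if total > best_return:
            best_return, best_action = total, seq[0]
    return best_action
```

In this sketch, constraints enter only through the joint-space sampling bounds, and online adaptation amounts to re-running select_latent on a sliding window of recent transitions before each planning step; the actual method may differ in how the latent is trained and selected.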