Robot design is a complex task, and most design decisions are still based on human intuition or tedious manual tuning. A more informed approach is offered by computational design methods, in which design parameters are optimized concurrently with the corresponding controllers. Existing approaches, however, are strongly influenced by predefined control rules or motion templates and cannot provide end-to-end solutions. In this paper, we present a design optimization framework using model-free meta reinforcement learning and apply it to optimizing the kinematic and actuator parameters of quadrupedal robots. We use meta reinforcement learning to train a locomotion policy that can quickly adapt to different designs. This policy is then used to evaluate each design instance during design optimization. We demonstrate that the policy can control robots of different designs to track random velocity commands over various rough terrains. With controlled experiments, we show that the meta policy achieves close-to-optimal performance for each design instance after adaptation. Lastly, we compare our results against a model-based baseline and show that our approach achieves higher performance while not being constrained by predefined motions or gait patterns.
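The loop described above, in which a shared meta policy is adapted to each candidate design and the adapted policy's performance scores that design, can be illustrated with a deliberately small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the design is a single scalar, `rollout_return` is a toy quadratic stand-in for a simulated rollout, adaptation is a few gradient steps from a shared initialization, and the outer optimizer is plain random search. All function names are hypothetical.

```python
import random


def rollout_return(design, policy_param):
    """Toy stand-in for a simulated rollout; higher is better.

    Hypothetical model: the best policy parameter is 2 * design, and the
    design itself has an intrinsic quality that peaks at design == 0.6.
    """
    tracking_error = (policy_param - 2.0 * design) ** 2
    design_penalty = (design - 0.6) ** 2
    return -(tracking_error + design_penalty)


def adapt(design, meta_param, steps=5, lr=0.3):
    """Take a few gradient-ascent steps from the shared meta initialization."""
    param = meta_param
    for _ in range(steps):
        grad = -2.0 * (param - 2.0 * design)  # d(return) / d(param)
        param += lr * grad
    return param


def evaluate_design(design, meta_param):
    """Score a design by the return of the policy after adaptation."""
    return rollout_return(design, adapt(design, meta_param))


def optimize_design(meta_param, n_candidates=200, seed=0):
    """Random search over a scalar design in [0, 1] (an assumed optimizer)."""
    rng = random.Random(seed)
    candidates = [rng.uniform(0.0, 1.0) for _ in range(n_candidates)]
    return max(candidates, key=lambda d: evaluate_design(d, meta_param))


if __name__ == "__main__":
    best = optimize_design(meta_param=1.0)
    print(f"best design found: {best:.3f}")
```

The point of the sketch is the structure: because each design is scored only after the policy has adapted to it, a design is not penalized merely for being far from the one the shared initialization happens to suit.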