A robot's ability to complete a task depends heavily on its physical design. However, identifying an optimal physical design and its corresponding control policy is inherently challenging. The freedom to choose the number of links, their type, and how they are connected results in a combinatorial design space, and evaluating any design in that space requires deriving its optimal controller. In this work, we present N-LIMB, an efficient approach to optimizing the design and control of a robot over large sets of morphologies. Central to our framework is a universal, design-conditioned control policy capable of controlling a diverse set of designs. This policy greatly improves the sample efficiency of our approach by allowing the transfer of experience across designs and reducing the cost of evaluating new designs. We train this policy to maximize expected return over a distribution of designs, which is simultaneously updated towards designs that perform better under the universal policy. In this way, our approach converges towards a design distribution peaked around high-performing designs and a controller that is effectively fine-tuned for those designs. We demonstrate the potential of our approach on a series of locomotion tasks across varying terrains and show the discovery of novel and high-performing design-control pairs.
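The alternating scheme described in the abstract (one shared policy trained over sampled designs, while the design distribution shifts towards designs that score well under that policy) can be illustrated with a toy sketch. This is not the N-LIMB implementation: the scalar "designs", the linear policy `action = w * design + b`, the analytic `toy_return` function, and the score-function (REINFORCE-style) update of a categorical design distribution are all assumptions made here purely for illustration, as are the names `co_optimize`, `ideal_action`, and `toy_return`.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ideal_action(design):
    # Hypothetical: the best control signal depends on the design.
    return 0.5 * design


def toy_return(design, action):
    # Hypothetical stand-in for a simulated rollout: design 3 is the best
    # morphology, and return drops when the action mismatches the design.
    design_quality = -0.1 * (design - 3) ** 2
    return design_quality - (action - ideal_action(design)) ** 2


def co_optimize(num_designs=6, steps=3000, batch=16, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * num_designs   # categorical design distribution
    w, b = 0.0, 0.0                # shared, design-conditioned policy
    lr_policy, lr_design = 0.01, 0.1

    for _ in range(steps):
        probs = softmax(logits)
        designs = rng.choices(range(num_designs), weights=probs, k=batch)

        # 1) Improve the shared policy on the sampled designs. The exact
        #    gradient of toy_return is used in place of an RL update.
        returns, grad_w, grad_b = [], 0.0, 0.0
        for d in designs:
            action = w * d + b
            returns.append(toy_return(d, action))
            err = action - ideal_action(d)
            grad_w += -2.0 * err * d / batch
            grad_b += -2.0 * err / batch
        w += lr_policy * grad_w
        b += lr_policy * grad_b

        # 2) Shift the design distribution towards designs with above-average
        #    return, using a score-function gradient with a mean baseline.
        baseline = sum(returns) / batch
        for d, r in zip(designs, returns):
            advantage = r - baseline
            for k in range(num_designs):
                indicator = 1.0 if k == d else 0.0
                logits[k] += lr_design * advantage * (indicator - probs[k]) / batch

    return softmax(logits), (w, b)


if __name__ == "__main__":
    final_probs, (w, b) = co_optimize()
    print("design probabilities:", [round(p, 3) for p in final_probs])
    print("policy parameters:", round(w, 3), round(b, 3))
```

In this toy setting, early returns are dominated by how poorly the untrained policy controls each design, so the design distribution only becomes informative once the shared policy has improved. That coupling is the reason for training the two together rather than evaluating each design with a separately optimized controller.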