An agent's functionality is largely determined by its design, i.e., skeletal structure and joint attributes (e.g., length, size, strength). However, finding the optimal agent design for a given function is extremely challenging since the problem is inherently combinatorial and the design space is prohibitively large. Additionally, it can be costly to evaluate each candidate design, which requires solving for its optimal controller. To tackle these problems, our key idea is to incorporate the design procedure of an agent into its decision-making process. Specifically, we learn a conditional policy that, in an episode, first applies a sequence of transform actions to modify an agent's skeletal structure and joint attributes, and then applies control actions under the new design. To handle a variable number of joints across designs, we use a graph-based policy where each graph node represents a joint and uses message passing with its neighbors to output joint-specific actions. Using policy gradient methods, our approach enables first-order optimization of agent design and control as well as experience sharing across different designs, which improves sample efficiency tremendously. Experiments show that our approach, Transform2Act, outperforms prior methods significantly in terms of convergence speed and final performance. Notably, Transform2Act can automatically discover plausible designs similar to giraffes, squids, and spiders. Our project website is at https://sites.google.com/view/transform2act.
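The abstract does not give implementation details, so the following is only a minimal toy sketch of the episode structure it describes: transform actions first, then control actions under the resulting design, with one set of shared weights handling any number of joints. It assumes a tree skeleton stored as a parent list, one scalar attribute per joint, scalar message passing with separate hand-picked weights per stage, and arbitrary thresholds for the discrete add/delete/no-op joint actions. All names (`gnn`, `skeleton_step`, `attribute_step`, `run_episode`) are hypothetical, the joint-angle update is a placeholder for a physics simulator, and no policy-gradient training is shown.

```python
import math

# Hypothetical toy sketch, not the authors' code. A design is a tree of joints:
# parent[i] is the parent index of joint i (root = -1), attrs[i] is one scalar
# attribute of joint i (e.g. a limb length).


def adjacency(parent):
    """Undirected neighbour lists of the skeleton tree."""
    adj = [[] for _ in parent]
    for child, par in enumerate(parent):
        if par >= 0:
            adj[child].append(par)
            adj[par].append(child)
    return adj


def gnn(parent, feats, w_self, w_nbr, rounds=2):
    """Shared-weight message passing: each joint mixes its own feature with the
    mean of its neighbours' features, so one policy fits any skeleton size."""
    adj = adjacency(parent)
    h = list(feats)
    for _ in range(rounds):
        h = [
            math.tanh(w_self * h[i]
                      + w_nbr * (sum(h[j] for j in adj[i]) / len(adj[i]) if adj[i] else 0.0))
            for i in range(len(h))
        ]
    return h  # one output per joint


def skeleton_step(parent, attrs, h, thresh=0.5):
    """Per-joint discrete action: add a child (h > thresh), delete itself if it
    is a non-root leaf (h < -thresh), otherwise no-op."""
    has_child = {p for p in parent if p >= 0}
    delete = {i for i in range(1, len(parent)) if h[i] < -thresh and i not in has_child}
    kept = [i for i in range(len(parent)) if i not in delete]
    remap = {old: new for new, old in enumerate(kept)}
    # A kept joint's parent has a child, so it is never a deleted leaf.
    new_parent = [remap[parent[i]] if parent[i] >= 0 else -1 for i in kept]
    new_attrs = [attrs[i] for i in kept]
    for i in kept:
        if h[i] > thresh:
            new_parent.append(remap[i])
            new_attrs.append(attrs[i])  # new child copies its parent's attribute
    return new_parent, new_attrs


def attribute_step(attrs, h, scale=0.1, min_attr=0.05):
    """Per-joint continuous action: nudge each attribute, keeping it positive."""
    return [max(min_attr, a + scale * x) for a, x in zip(attrs, h)]


def run_episode(weights, n_skel=3, n_attr=2, n_ctrl=4):
    parent, attrs = [-1], [1.0]  # start from a single root joint
    for _ in range(n_skel):      # 1) skeleton-transform stage
        h = gnn(parent, attrs, *weights["skel"])
        parent, attrs = skeleton_step(parent, attrs, h)
    for _ in range(n_attr):      # 2) attribute-transform stage
        h = gnn(parent, attrs, *weights["attr"])
        attrs = attribute_step(attrs, h)
    q = [0.0] * len(parent)      # toy joint angles (stand-in for a simulator state)
    controls = []
    for _ in range(n_ctrl):      # 3) execution stage: the design is now frozen
        u = gnn(parent, [a + qi for a, qi in zip(attrs, q)], *weights["ctrl"])
        q = [qi + 0.1 * ui for qi, ui in zip(q, u)]  # toy integration step
        controls.append(u)       # one control value per joint
    return parent, attrs, controls


if __name__ == "__main__":
    weights = {"skel": (1.5, -0.2), "attr": (0.8, 0.3), "ctrl": (1.0, 0.5)}
    parent, attrs, controls = run_episode(weights)
    print("skeleton:", parent)
    print("attributes:", [round(a, 3) for a in attrs])
    print("control steps:", len(controls), "joints:", len(controls[0]))
```

Because the same two weights per stage are applied at every joint, the number of parameters does not depend on how many joints the transform stages create. In the actual method these outputs would parameterize action distributions trained with policy gradients rather than being thresholded deterministically.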