Modular robots can be rearranged into a new design, perhaps each day, to handle a wide variety of tasks by forming a customized robot for each new task. However, reconfiguring just the mechanism is not sufficient: each design also requires its own unique control policy. One could craft a policy from scratch for each new design, but such an approach is not scalable, especially given the large number of designs that can be generated from even a small set of modules. Instead, we create a modular policy framework in which the policy structure is conditioned on the hardware arrangement, and use just one training process to create a policy that controls a wide variety of designs. Our approach leverages the fact that the kinematics of a modular robot can be represented as a design graph, with nodes as modules and edges as connections between them. Given a robot, its design graph is used to create a policy graph with the same structure, where each node contains a deep neural network, and modules of the same type share knowledge via shared parameters (e.g., all legs on a hexapod share the same network parameters). We developed a model-based reinforcement learning algorithm, interleaving model learning and trajectory optimization, to train the policy. We show that the modular policy generalizes, without any additional learning, to a large number of designs that were not seen during training. Finally, we demonstrate the policy controlling a variety of designs to locomote with both simulated and real robots.
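To make the shared-parameter policy graph concrete, the following is a minimal PyTorch sketch (not the authors' implementation) of the core idea: one subnetwork per module type, reused by every instance of that type, with the network structure mirroring the design graph. The class name `ModularPolicy`, the dimensions, and the neighbor-averaging message-passing scheme are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class ModularPolicy(nn.Module):
    """One network per module TYPE; all instances of a type share parameters."""

    def __init__(self, module_types, obs_dim=8, msg_dim=16, act_dim=1):
        super().__init__()
        self.act_dim, self.msg_dim = act_dim, msg_dim
        # Shared subnetwork per type (e.g., every "leg" reuses the same weights).
        self.nets = nn.ModuleDict({
            t: nn.Sequential(nn.Linear(obs_dim + msg_dim, 64), nn.Tanh(),
                             nn.Linear(64, act_dim + msg_dim))
            for t in module_types})

    def forward(self, design_graph, obs, rounds=2):
        # design_graph: {node: (type_name, [neighbor nodes])}
        # obs: {node: tensor of shape (obs_dim,)}
        msgs = {n: torch.zeros(self.msg_dim) for n in design_graph}
        actions = {}
        for _ in range(rounds):  # a few rounds of message passing along edges
            new_msgs = {}
            for n, (t, nbrs) in design_graph.items():
                incoming = (torch.stack([msgs[m] for m in nbrs]).mean(0)
                            if nbrs else torch.zeros(self.msg_dim))
                y = self.nets[t](torch.cat([obs[n], incoming]))
                actions[n], new_msgs[n] = y[:self.act_dim], y[self.act_dim:]
            msgs = new_msgs
        return actions

# Usage: a hexapod-like design graph, one body node connected to six legs.
graph = {"body": ("body", [f"leg{i}" for i in range(6)]),
         **{f"leg{i}": ("leg", ["body"]) for i in range(6)}}
policy = ModularPolicy(module_types=["body", "leg"])
obs = {n: torch.zeros(8) for n in graph}
acts = policy(graph, obs)  # all six legs are driven by the SAME "leg" weights
```

Because parameters are tied to module types rather than module instances, rearranging the modules changes only the graph passed to `forward`, not the parameter set, which is what lets one trained policy transfer to unseen designs.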