To make a modular robotic system both capable and scalable, the controller must be as modular as the mechanism. Because even a small set of modules can generate a large number of designs, creating a new system-wide controller for each design is impractical. Instead, we construct a modular control policy that handles a broad class of designs. We take the view that a module is both form and function, i.e., both mechanism and controller. As the modules are physically re-configured, the policy automatically re-configures to match the kinematic structure. This policy is trained with a new model-based reinforcement learning algorithm, which interleaves model learning and trajectory optimization to guide policy learning for multiple designs simultaneously. Training the policy on a varied set of designs teaches it to adapt its behavior to the design. We show that the policy then generalizes to a larger set of designs not seen during training. We demonstrate a single policy that controls many designs, each with a different combination of legs and wheels, to locomote both in simulation and on real robots.
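The idea that the policy re-configures along with the modules can be illustrated with a minimal sketch. It is not the authors' architecture or training algorithm. It assumes only that each module type owns one set of parameters, reused wherever that type appears in a design. The module types, dimensions, and the names `ModularPolicy`, `OBS_DIM`, `ACTION_DIM`, and `MSG_DIM` are hypothetical, and the weights are random rather than trained.

```python
import math
import random

# Hypothetical sizes chosen for illustration only (not taken from the source).
OBS_DIM = {"leg": 6, "wheel": 4}      # local observation size per module type
ACTION_DIM = {"leg": 3, "wheel": 2}   # action size per module type
MSG_DIM = 4                           # size of the message each module shares


def _matrix(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def _matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class ModularPolicy:
    """One parameter set per module type, shared by every instance of that type."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.encode = {t: _matrix(MSG_DIM, OBS_DIM[t], rng) for t in OBS_DIM}
        self.decode = {
            t: _matrix(ACTION_DIM[t], OBS_DIM[t] + MSG_DIM, rng) for t in OBS_DIM
        }

    def act(self, design, observations):
        """design: module type names; observations: one local vector per module."""
        messages = [_matvec(self.encode[t], o) for t, o in zip(design, observations)]
        # Averaging keeps the shared context a fixed size for any module count.
        context = [sum(m[i] for m in messages) / len(messages) for i in range(MSG_DIM)]
        actions = []
        for t, o in zip(design, observations):
            raw = _matvec(self.decode[t], o + context)
            actions.append([math.tanh(a) for a in raw])  # bound actions to [-1, 1]
        return actions


if __name__ == "__main__":
    policy = ModularPolicy()
    # The same policy object is applied to two different (hypothetical) designs.
    for design in (["leg"] * 4, ["wheel", "leg", "wheel", "leg", "leg", "leg"]):
        obs = [[0.0] * OBS_DIM[t] for t in design]
        print(design, [len(a) for a in policy.act(design, obs)])
```

Because the parameters are indexed by module type rather than by design, adding or swapping modules changes only the `design` list passed to `act`. No new controller has to be written for each design.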