C^2: Co-design of Robots via Concurrent Networks Coupling Online and Offline Reinforcement Learning

With growing computing power, data-driven approaches have made it feasible to co-design a robot's morphology and controller. Nevertheless, evaluating the fitness of the controller under each morphology is time-consuming. As a pioneering data-driven method, Co-adaptation uses a double-network mechanism to learn a Q function conditioned on morphology parameters, which replaces the traditional evaluation of a diverse set of candidates and thereby speeds up optimization. In this paper, we find that Co-adaptation ignores the exploration error that arises during training and the state-action distribution shift that arises during parameter transfer, both of which hurt performance. We propose a concurrent-network framework that couples online and offline RL methods. By flexibly leveraging a behavior cloning term, we mitigate the impact of these issues on the results. Simulation and physical experiments demonstrate that our method outperforms baseline algorithms, indicating that it is an effective way to discover the optimal combination of morphology and controller.
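The abstract does not specify the loss functions, so the following is only a rough sketch of the two ideas it names: a behavior cloning term whose weight can be adjusted, and a Q function conditioned on morphology parameters that scores candidates without rolling each one out. The function names, the `bc_weight` knob, and the `1 / mean|Q|` scaling (borrowed from TD3+BC-style objectives) are assumptions made for illustration, not the paper's implementation.

```python
from statistics import fmean


def bc_regularized_actor_loss(q_values, policy_actions, dataset_actions, bc_weight):
    """Actor loss = -lam * mean(Q) + bc_weight * mean squared BC error.

    bc_weight is a hypothetical knob: 0 gives a purely online-style objective,
    larger values pull the policy toward the actions stored in the data.
    """
    # Assumed TD3+BC-style scaling so the Q term and BC term are comparable.
    lam = 1.0 / (fmean([abs(q) for q in q_values]) + 1e-8)
    q_term = -lam * fmean(q_values)
    # Behavior cloning term: squared distance between policy and dataset actions.
    sq_errors = [
        (p - d) ** 2
        for pa, da in zip(policy_actions, dataset_actions)
        for p, d in zip(pa, da)
    ]
    return q_term + bc_weight * fmean(sq_errors)


def select_morphology(q_fn, policy_fn, candidates, start_states):
    """Return the candidate morphology with the highest mean Q over start states.

    q_fn(state, action, morphology) and policy_fn(state, morphology) are
    placeholders for learned networks conditioned on morphology parameters.
    """
    def score(xi):
        return fmean([q_fn(s, policy_fn(s, xi), xi) for s in start_states])

    return max(candidates, key=score)
```

In this sketch, setting `bc_weight` to zero leaves only the Q-maximization term, while a positive value penalizes actions that drift from the logged data, which is one simple way to limit the effect of distribution shift.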
