We introduce a novel co-design method for the shape attributes and locomotion of autonomous moving agents, combining deep reinforcement learning and evolution with user control. Our main inspiration comes from evolution, which has produced wide variability and adaptation in Nature and has the potential to improve design and behavior simultaneously. Our method takes an input agent with optional simple constraints, such as leg parts that should not evolve or allowed ranges of change. It uses physics-based simulation to determine the agent's locomotion and finds a behavior policy for the input design, which later serves as a baseline for comparison. The agent is then randomly modified within the allowed ranges, creating a new generation of several hundred agents. Each generation is trained by transferring the previous policy, which significantly speeds up training. The best-performing agents are selected, and a new generation is formed through their crossover and mutation. Subsequent generations are trained until satisfactory results are reached. We show a wide variety of evolved agents, and our results show that even with only 10% of changes allowed, the overall performance of the evolved agents improves by 50%. If more significant changes to the initial design are allowed, the improvement rises to 150% in our experiments. In contrast to related work, our co-design runs on a single GPU and provides satisfactory results by training thousands of agents within one hour.
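The evolutionary loop described above can be sketched as follows. This is a minimal illustrative stand-in, not the paper's implementation: all names, population sizes, and the toy fitness function are hypothetical, and the real method scores each agent via physics-based locomotion simulation and trains it with deep RL warm-started from the transferred policy.

```python
import random

random.seed(0)  # deterministic for this sketch

POP_SIZE = 20          # the paper evolves several hundred agents per generation
MUTATION_RANGE = 0.1   # allowed per-attribute change (e.g. 10%)
N_GENERATIONS = 5
TARGET = [1.0, 1.0, 1.0, 1.0]  # toy optimum for the shape attributes

def fitness(shape):
    # Stand-in for the simulated locomotion reward of a trained agent.
    return -sum((s - t) ** 2 for s, t in zip(shape, TARGET))

def mutate(shape):
    # Randomly modify shape attributes within the allowed range.
    return [s + random.uniform(-MUTATION_RANGE, MUTATION_RANGE) for s in shape]

def crossover(a, b):
    # Uniform crossover of two parent shapes.
    return [random.choice(genes) for genes in zip(a, b)]

def evolve(initial_shape):
    # Start from random modifications of the input design.
    population = [mutate(initial_shape) for _ in range(POP_SIZE)]
    for _ in range(N_GENERATIONS):
        # Select the best-performing agents...
        elites = sorted(population, key=fitness, reverse=True)[: POP_SIZE // 4]
        # ...and form the next generation by crossover and mutation.
        population = [
            mutate(crossover(random.choice(elites), random.choice(elites)))
            for _ in range(POP_SIZE)
        ]
    return max(population, key=fitness)

initial = [0.5, 0.5, 0.5, 0.5]
best = evolve(initial)
print(round(fitness(best), 3))
```

Selection pressure drives the shape attributes toward the toy optimum, so the evolved design scores higher than the input baseline, mirroring the improvement the abstract reports for the full RL-based pipeline.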