The intertwined processes of learning and evolution in complex environmental niches have resulted in a remarkable diversity of morphological forms. Moreover, many aspects of animal intelligence are deeply embodied in these evolved morphologies. However, the principles governing relations between environmental complexity, evolved morphology, and the learnability of intelligent control remain elusive, partially due to the substantial challenge of performing large-scale in silico experiments on evolution and learning. We introduce Deep Evolutionary Reinforcement Learning (DERL): a novel computational framework that can evolve diverse agent morphologies to learn challenging locomotion and manipulation tasks in complex environments using only low-level egocentric sensory information. Leveraging DERL, we demonstrate several relations between environmental complexity, morphological intelligence, and the learnability of control. First, environmental complexity fosters the evolution of morphological intelligence as quantified by the ability of a morphology to facilitate the learning of novel tasks. Second, evolution rapidly selects morphologies that learn faster, thereby enabling behaviors learned late in the lifetime of early ancestors to be expressed early in the lifetime of their descendants. In agents that learn and evolve in complex environments, this result constitutes the first demonstration of a long-conjectured morphological Baldwin effect. Third, our experiments suggest a mechanistic basis for both the Baldwin effect and the emergence of morphological intelligence through the evolution of morphologies that are more physically stable and energy efficient, and can therefore facilitate learning and control.
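To make the coupling between evolution and lifetime learning concrete, the following is a minimal toy sketch, not the DERL implementation. It assumes a morphology can be reduced to a single number, replaces reinforcement learning with random hill climbing on a scalar controller, and uses a simple tournament scheme for selection. All names (`learn`, `mutate`, `evolve`) and all parameter values are hypothetical choices made for illustration.

```python
import random


def learn(morphology, steps, rng):
    """Toy lifetime learning: hill-climb a scalar controller toward a target.

    `morphology` is a single number in [0.01, 1.0] that sets the size of each
    exploratory step, standing in for how a body shape affects learnability.
    Returns the final reward (higher is better, maximum 0.0).
    """
    controller, target = 0.0, 1.0
    reward = -(controller - target) ** 2
    for _ in range(steps):
        candidate = controller + rng.gauss(0.0, morphology)
        candidate_reward = -(candidate - target) ** 2
        if candidate_reward > reward:  # keep only improvements
            controller, reward = candidate, candidate_reward
    return reward


def mutate(morphology, rng, scale=0.1):
    """Perturb the morphology parameter and clip it to [0.01, 1.0]."""
    return min(1.0, max(0.01, morphology + rng.gauss(0.0, scale)))


def evolve(pop_size=16, generations=200, tournament=4, steps=20, seed=0):
    """Outer evolutionary loop; fitness is the reward reached after learning.

    Assumption for this sketch: in each generation a small tournament is
    drawn, the fittest member is mutated, and the child replaces the least
    fit member of that same tournament.
    """
    rng = random.Random(seed)
    population = []  # list of (fitness, morphology) pairs
    for _ in range(pop_size):
        m = rng.uniform(0.01, 1.0)
        population.append((learn(m, steps, rng), m))
    history = [max(f for f, _ in population)]  # best fitness per generation
    for _ in range(generations):
        idx = rng.sample(range(pop_size), tournament)
        winner = max(idx, key=lambda i: population[i][0])
        loser = min(idx, key=lambda i: population[i][0])
        child = mutate(population[winner][1], rng)
        # The child learns from scratch; only its morphology is inherited.
        population[loser] = (learn(child, steps, rng), child)
        history.append(max(f for f, _ in population))
    return population, history
```

Calling `evolve()` returns the final population and the best fitness after each generation. Because the child inherits only the morphology and never the learned controller, any improvement across generations in this toy comes from morphologies that make the same learning budget more effective, which is the flavor of effect the abstract describes.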