Neuroevolution (NE) has recently proven to be a competitive alternative to learning by gradient descent in reinforcement learning tasks. However, the majority of NE methods and associated simulation environments differ crucially from biological evolution: the environment is reset to initial conditions at the end of each generation, whereas natural environments are continuously modified by their inhabitants; agents reproduce based on their ability to maximize rewards within a population, whereas biological organisms reproduce and die based on internal physiological variables that depend on their resource consumption; and simulation environments are primarily single-agent, whereas the biological world is inherently multi-agent and evolves alongside the population. In this work we present a method for continuously evolving adaptive agents without any environment or population reset. The environment is a large grid world with complex spatiotemporal resource generation, containing many agents that are each controlled by an evolvable recurrent neural network and that reproduce locally based on their internal physiology. The entire system is implemented in JAX, allowing very fast simulation on a GPU. We show that NE can operate in an ecologically valid, non-episodic, multi-agent setting, finding sustainable collective foraging strategies in the presence of a complex interplay between ecological and evolutionary dynamics.
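As a rough illustration of the non-episodic setting described above, the following JAX sketch runs a small grid world in which agents move, eat, die when their energy runs out, and reproduce locally once their energy crosses a threshold, with no environment or population reset. It is a minimal sketch under stated assumptions, not the implementation from this work: every name and constant (`GRID`, `MAX_AGENTS`, `COST`, `E_REPRO`, `REGROW`, `SIGMA`, `init_state`, `step`, `run`) is hypothetical, a fixed vector of action logits stands in for the evolvable recurrent network, and uniform random regrowth stands in for the spatiotemporal resource generation.

```python
import jax
import jax.numpy as jnp

# All constants below are hypothetical values chosen for illustration.
GRID, MAX_AGENTS = 16, 32   # grid side; fixed agent capacity keeps shapes static
COST, E_REPRO = 0.1, 3.0    # energy spent per step; energy needed to reproduce
REGROW, SIGMA = 0.02, 0.1   # per-cell regrowth probability; mutation scale
MOVES = jnp.array([[0, 0], [1, 0], [-1, 0], [0, 1], [0, -1]])

def init_state(key, n_alive=8):
    k_pos, k_gen, k_res = jax.random.split(key, 3)
    return {
        "pos": jax.random.randint(k_pos, (MAX_AGENTS, 2), 0, GRID),
        "energy": jnp.ones(MAX_AGENTS),
        "alive": jnp.arange(MAX_AGENTS) < n_alive,
        # Assumption: the "genome" is just 5 action logits per agent,
        # standing in for the weights of a recurrent network.
        "genome": jax.random.normal(k_gen, (MAX_AGENTS, 5)),
        "resources": jax.random.bernoulli(k_res, 0.3, (GRID, GRID)).astype(jnp.float32),
    }

@jax.jit
def step(state, key):
    k_act, k_mut, k_grow = jax.random.split(key, 3)
    pos, energy, alive = state["pos"], state["energy"], state["alive"]
    genome, resources = state["genome"], state["resources"]

    # Act: sample a move from each agent's logits; the grid wraps around.
    action = jax.random.categorical(k_act, genome)
    pos = (pos + MOVES[action]) % GRID

    # Eat: living agents gain the resource under them and pay a metabolic cost.
    # Simplification: agents sharing a cell each receive the full resource.
    food = resources[pos[:, 0], pos[:, 1]]
    energy = jnp.where(alive, energy + food - COST, energy)
    resources = resources.at[pos[:, 0], pos[:, 1]].min(jnp.where(alive, 0.0, 1.0))
    regrown = jax.random.bernoulli(k_grow, REGROW, resources.shape)
    resources = jnp.maximum(resources, regrown.astype(jnp.float32))

    # Die: agents whose energy is exhausted are removed.
    alive = alive & (energy > 0.0)

    # Reproduce: pair the k-th eligible parent with the k-th free slot.
    parents, free = alive & (energy >= E_REPRO), ~alive
    parent_ids = jnp.nonzero(parents, size=MAX_AGENTS, fill_value=0)[0]
    free_rank = jnp.cumsum(free) - 1
    born = free & (free_rank < parents.sum())
    src = parent_ids[jnp.clip(free_rank, 0, MAX_AGENTS - 1)]
    gave_birth = parents & (jnp.cumsum(parents) - 1 < free.sum())
    energy = jnp.where(gave_birth, 0.5 * energy, energy)  # parent keeps half
    noise = SIGMA * jax.random.normal(k_mut, genome.shape)
    genome = jnp.where(born[:, None], genome[src] + noise, genome)  # mutated copy
    pos = jnp.where(born[:, None], pos[src], pos)  # offspring appears at parent's cell
    energy = jnp.where(born, energy[src], energy)  # offspring gets the other half
    return {"pos": pos, "energy": energy, "alive": alive | born,
            "genome": genome, "resources": resources}

def run(key, n_steps=200):
    """Simulate n_steps with no reset; returns the final state and population sizes."""
    k_init, k_run = jax.random.split(key)
    def body(state, k):
        state = step(state, k)
        return state, state["alive"].sum()
    return jax.lax.scan(body, init_state(k_init), jax.random.split(k_run, n_steps))
```

Calling `final_state, population = run(jax.random.PRNGKey(0))` yields the population size at every step of a single uninterrupted run. The fixed-capacity arrays with an `alive` mask are a design choice that keeps shapes static so the whole loop can be compiled with `jax.jit` and `jax.lax.scan`; selection here emerges only from energy-driven birth and death, with no explicit fitness ranking.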