Here we explore a new algorithmic framework for multi-agent reinforcement learning, called Malthusian reinforcement learning, which extends self-play to include fitness-linked population size dynamics that drive ongoing innovation. In Malthusian RL, increases in a subpopulation's average return drive subsequent increases in its size, mirroring the relationship between preindustrial income levels and population growth that Thomas Malthus posited in 1798. Malthusian reinforcement learning harnesses the competitive pressures arising from growing and shrinking population sizes to drive agents to explore regions of the state and policy spaces that they could not otherwise reach. Furthermore, in environments with potential gains from specialization and division of labor, we show that Malthusian reinforcement learning is better positioned than self-play-based algorithms to exploit such synergies.
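To make the core mechanism concrete, here is a minimal sketch of one fitness-linked resizing step, in which a subpopulation's average return determines its share of the next generation. It assumes a fixed total capacity shared by all subpopulations, an exponential (softmax-style) weighting of average return, and a guaranteed minimum size per subpopulation. The function `update_population_sizes` and its parameters `total_capacity`, `beta`, and `min_size` are hypothetical illustrations, not the update rule used by the authors.

```python
import math
from typing import Dict


def update_population_sizes(
    sizes: Dict[str, int],
    mean_returns: Dict[str, float],
    total_capacity: int,
    beta: float = 1.0,
    min_size: int = 1,
) -> Dict[str, int]:
    """Hypothetical fitness-linked resizing of subpopulations.

    Assumption: each subpopulation's next size is proportional to
    current_size * exp(beta * mean_return), within a fixed total capacity.
    """
    # Subtract the max return before exponentiating for numerical stability;
    # this does not change the relative weights.
    max_r = max(mean_returns.values())
    weights = {
        name: sizes[name] * math.exp(beta * (mean_returns[name] - max_r))
        for name in sizes
    }
    total_w = sum(weights.values())

    # Reserve min_size slots per subpopulation, then split the spare capacity
    # in proportion to the fitness-weighted sizes.
    spare = total_capacity - min_size * len(sizes)
    if spare < 0:
        raise ValueError("total_capacity is too small for min_size")
    raw = {name: spare * w / total_w for name, w in weights.items()}
    new_sizes = {name: min_size + math.floor(x) for name, x in raw.items()}

    # Hand out slots lost to flooring, largest fractional remainder first,
    # so the sizes always sum to total_capacity.
    leftover = total_capacity - sum(new_sizes.values())
    by_remainder = sorted(
        raw, key=lambda n: raw[n] - math.floor(raw[n]), reverse=True
    )
    for name in by_remainder[:leftover]:
        new_sizes[name] += 1
    return new_sizes


# Two equal-sized subpopulations; "a" earns a higher average return.
print(update_population_sizes({"a": 5, "b": 5}, {"a": 1.0, "b": 0.0}, 10))
# → {'a': 7, 'b': 3}
```

Because the total is fixed in this sketch, growth of a higher-return subpopulation necessarily shrinks the others, which is one simple way to model the competitive pressure described above; with equal returns, sizes stay proportional to their current values.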