Inspired by organisms evolving through cooperation and competition between different populations on Earth, we study the emergence of artificial collective intelligence through massive-agent reinforcement learning. To this end, we propose a new massive-agent reinforcement learning environment, Lux, where dynamic and massive agents in two teams scramble for limited resources and fight off the darkness. In Lux, we build our agents with a standard reinforcement learning algorithm in curriculum learning phases and leverage centralized control via a pixel-to-pixel policy network. As agents co-evolve through self-play, we observe several stages of intelligence, from the acquisition of atomic skills to the development of group strategies. Since these learned group strategies arise from individual decisions without an explicit coordination mechanism, we claim that artificial collective intelligence emerges from massive-agent cooperation and competition. We further analyze the emergence of various learned strategies through metrics and ablation studies, aiming to provide insights for reinforcement learning implementations in massive-agent environments.
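To make the centralized-control idea concrete, below is a minimal sketch of what a pixel-to-pixel policy might look like: a fully-convolutional network that maps the map-shaped observation to per-cell action logits, so each unit reads its action distribution from the output pixel at its own coordinates. This is an illustrative assumption written in PyTorch; all layer sizes, channel counts, and names (PixelToPixelPolicy, in_channels, n_actions) are placeholders, not the paper's actual architecture.

```python
# Hedged sketch of a pixel-to-pixel policy for centralized control.
# All hyperparameters and names are illustrative assumptions.
import torch
import torch.nn as nn

class PixelToPixelPolicy(nn.Module):
    """Maps a (C, H, W) map observation to per-cell action logits,
    preserving spatial resolution so every unit's action distribution
    lives at its own map coordinates in the output tensor."""
    def __init__(self, in_channels: int = 17, n_actions: int = 12, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            # 1x1 conv keeps the spatial layout: one logit vector per map cell.
            nn.Conv2d(hidden, n_actions, kernel_size=1),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, C, H, W) -> logits: (batch, n_actions, H, W)
        return self.net(obs)

# Usage: sample an action for the unit standing at map cell (y, x).
policy = PixelToPixelPolicy()
obs = torch.randn(1, 17, 32, 32)   # one 32x32 map with 17 feature planes
logits = policy(obs)               # shape (1, 12, 32, 32)
y, x = 5, 9
dist = torch.distributions.Categorical(logits=logits[0, :, y, x])
action = dist.sample()
```

Because one forward pass yields actions for every cell at once, this design scales to massive agent counts without per-agent networks, which is the appeal of centralized pixel-to-pixel control.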