ForMIC: Foraging via Multiagent RL with Implicit Communication

Multi-agent foraging (MAF) involves distributing a team of agents to search an environment and extract resources from it. Nature provides several examples of highly effective foragers, where individuals within the foraging collective use biological markers (e.g., pheromones) to communicate critical information to others via the environment. In this work, we propose ForMIC, a distributed reinforcement learning MAF approach that endows agents with implicit communication abilities via their shared environment. However, we show that learning efficient policies with stigmergic interactions is highly nontrivial, since agents must already perform well to send each other useful signals, yet must also sense others' signals to perform well. To handle this circular dependency, we develop several key learning techniques for training policies with stigmergic interactions. By relying on careful curriculum learning design, action filtering, and the introduction of non-learning agents to artificially increase the agent density at training time at low computational cost, we develop a minimal learning framework that leads to the stable training of efficient stigmergic policies. We present simulation results demonstrating that our learned policy outperforms existing state-of-the-art MAF algorithms in a set of experiments that vary team size, number and placement of resources, and key environmental dynamics not seen at training time.
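
To make the idea of communication "via the environment" concrete, the following is a minimal illustrative sketch of stigmergy on a grid: agents deposit a pheromone value in a shared map, the map decays over time, and other agents read it to choose a move. It is not the ForMIC implementation. The grid representation, the constants `EVAPORATION_RATE` and `DEPOSIT_AMOUNT`, the function names, and the greedy `follow_gradient` rule (a stand-in for a learned policy) are all assumptions made for illustration. Likewise, `valid_actions` shows only one plausible reading of action filtering (masking moves that leave the grid); the filter used in the paper may differ.

```python
from typing import Dict, List, Tuple

Grid = List[List[float]]
Position = Tuple[int, int]

# Hypothetical constants chosen for illustration; not taken from the paper.
EVAPORATION_RATE = 0.1
DEPOSIT_AMOUNT = 1.0

# Four-connected moves as (row delta, column delta).
MOVES: Dict[str, Position] = {
    "up": (-1, 0),
    "down": (1, 0),
    "left": (0, -1),
    "right": (0, 1),
}


def make_grid(rows: int, cols: int) -> Grid:
    """Create a shared pheromone map initialised to zero."""
    return [[0.0] * cols for _ in range(rows)]


def deposit(grid: Grid, pos: Position, amount: float = DEPOSIT_AMOUNT) -> None:
    """An agent marks its current cell; this is the implicit 'message'."""
    r, c = pos
    grid[r][c] += amount


def evaporate(grid: Grid, rate: float = EVAPORATION_RATE) -> None:
    """Decay every cell so that stale signals fade over time."""
    for row in grid:
        for c in range(len(row)):
            row[c] *= 1.0 - rate


def valid_actions(grid: Grid, pos: Position) -> List[str]:
    """Assumed form of action filtering: drop moves that leave the grid."""
    rows, cols = len(grid), len(grid[0])
    r, c = pos
    return [
        name
        for name, (dr, dc) in MOVES.items()
        if 0 <= r + dr < rows and 0 <= c + dc < cols
    ]


def follow_gradient(grid: Grid, pos: Position) -> str:
    """Greedy stand-in for a learned policy: step toward the strongest signal."""
    r, c = pos
    return max(
        valid_actions(grid, pos),
        key=lambda a: grid[r + MOVES[a][0]][c + MOVES[a][1]],
    )
```

In this sketch, one agent calling `deposit(grid, (0, 1))` followed by `evaporate(grid)` leaves a decayed mark that a second agent at `(0, 0)` can pick up through `follow_gradient`, without any direct message passing between the two. The circular dependency described in the abstract shows up here as well: the mark is only useful if the depositing agent was somewhere worth visiting.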
