ForMIC: Foraging via Multiagent RL with Implicit Communication

From arXiv. © 20XX IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Multi-agent foraging (MAF) involves distributing a team of agents to search an environment and extract resources from it. Nature provides several examples of highly effective foragers, in which individuals within the foraging collective use biological markers (e.g., pheromones) to communicate critical information to others via the environment. In this work, we propose ForMIC, a distributed reinforcement learning MAF approach that endows agents with implicit communication abilities via their shared environment. However, learning efficient policies with such stigmergic interactions is highly nontrivial: agents need to perform well to send each other useful signals, but they also need to sense others' signals to perform well. We develop several key learning techniques for training policies in the presence of this circular dependency. By relying on a careful curriculum learning design, action filtering, and the introduction of non-learning agents that increase agent density at training time at low computational cost, we develop a minimal learning framework that leads to stable training of efficient stigmergic policies. We present simulation results demonstrating that our learned policy outperforms existing state-of-the-art MAF algorithms in a set of experiments that vary team size, the number and placement of resources, and key environmental dynamics not seen at training time.
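To make the idea of stigmergic (environment-mediated) communication more concrete, the following is a minimal Python sketch of a pheromone-style marker grid with evaporation and local sensing, together with a simple action filter. Everything in it is a hypothetical illustration, not ForMIC's actual environment or filtering scheme, which the abstract does not specify: the class `PheromoneGrid`, the evaporation rate, the sensing radius, the move set `MOVES`, and the function `valid_actions` with its rule of masking off-grid or blocked moves are all assumptions.

```python
# Hypothetical sketch of stigmergic markers on a grid; not the paper's implementation.

# Assumed discrete move set (row offset, column offset).
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1), "stay": (0, 0)}


class PheromoneGrid:
    """Illustrative 2-D marker field that agents write to and read from."""

    def __init__(self, height: int, width: int, evaporation: float = 0.05) -> None:
        self.height = height
        self.width = width
        self.evaporation = evaporation  # assumed fraction of marker lost per step
        self.field = [[0.0] * width for _ in range(height)]

    def deposit(self, row: int, col: int, amount: float = 1.0) -> None:
        # An agent leaves a marker in its current cell for others to sense later.
        self.field[row][col] += amount

    def step(self) -> None:
        # Markers decay every time step, so stale information gradually fades.
        keep = 1.0 - self.evaporation
        for r in range(self.height):
            for c in range(self.width):
                self.field[r][c] *= keep

    def sense(self, row: int, col: int, radius: int = 1) -> list[list[float]]:
        # Local (2*radius+1) x (2*radius+1) observation, zero outside the grid.
        window = []
        for r in range(row - radius, row + radius + 1):
            line = []
            for c in range(col - radius, col + radius + 1):
                inside = 0 <= r < self.height and 0 <= c < self.width
                line.append(self.field[r][c] if inside else 0.0)
            window.append(line)
        return window


def valid_actions(row: int, col: int, height: int, width: int,
                  obstacles: frozenset = frozenset()) -> list[str]:
    # Hypothetical action filter: drop moves that leave the grid or enter an obstacle.
    allowed = []
    for name, (dr, dc) in MOVES.items():
        r, c = row + dr, col + dc
        if 0 <= r < height and 0 <= c < width and (r, c) not in obstacles:
            allowed.append(name)
    return allowed


if __name__ == "__main__":
    grid = PheromoneGrid(5, 5, evaporation=0.1)
    grid.deposit(2, 2)   # one agent marks the center cell
    grid.step()          # the marker evaporates by 10%
    print(grid.sense(2, 2)[1][1])
    print(valid_actions(0, 0, 5, 5))
```

In this sketch, a neighboring agent that calls `sense` sees the decayed marker left by another agent, which is the sense in which information flows "via the environment" rather than through explicit messages. Restricting the policy's choice to `valid_actions` is one common way to filter actions; whether ForMIC filters on this criterion is not stated in the abstract.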
