We consider communication in a fully cooperative multi-agent system in which the agents have only partial observations of the environment and must act jointly to maximize the overall reward. The setting is a discrete-time queueing network in which agents route packets to queues based only on partial information about the current queue lengths. The queues have limited buffer capacity, so a packet is dropped whenever it is sent to a full queue. In this work, we implement a communication channel that lets the agents share information in order to reduce the packet drop rate. For efficient information sharing, we use an attention-based communication model, called ATVC, to select informative messages from other agents. The agents then infer the state of the queues using a combination of a variational auto-encoder (VAE) and a product-of-experts (PoE) model. Ultimately, the agents learn what they need to communicate and with whom, instead of communicating with everyone all the time. We also show empirically that ATVC is able to infer the true state of the queues and leads to a policy that outperforms existing baselines.
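To make the packet-drop mechanism concrete, the following is a minimal sketch of one time step in a finite-buffer queueing network. It is not the paper's environment: the class name `QueueNetwork`, the equal buffer size for all queues, the rule that arrivals are processed before departures, and the externally supplied `served` flags (standing in for an unspecified service process) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class QueueNetwork:
    """Toy discrete-time network of finite-buffer queues (hypothetical)."""

    capacity: int  # buffer size, assumed identical for every queue
    lengths: list[int] = field(default_factory=list)  # current queue lengths

    def step(self, routes: list[int], served: list[bool]) -> int:
        """Advance one time step and return the number of dropped packets.

        routes[i] is the queue index chosen by agent i for its packet.
        served[q] states whether queue q completes one packet this step
        (assumption: at most one departure per queue per step).
        """
        drops = 0
        # Arrivals first (assumed ordering): a packet sent to a full queue is lost.
        for q in routes:
            if self.lengths[q] >= self.capacity:
                drops += 1
            else:
                self.lengths[q] += 1
        # Departures: a non-empty queue that is served loses one packet.
        for q, done in enumerate(served):
            if done and self.lengths[q] > 0:
                self.lengths[q] -= 1
        return drops


# Two agents route to the full queue 0 and are dropped; the third reaches queue 1.
net = QueueNetwork(capacity=2, lengths=[2, 0])
drops = net.step(routes=[0, 0, 1], served=[False, False])
```

In this toy step, an agent that knew queue 0 was full could have routed to queue 1 instead, which is the kind of information the communication channel is meant to supply.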