Stigmergic Independent Reinforcement Learning for Multi-Agent Collaboration

With the rapid evolution of wireless mobile devices, there emerges an increased need to design effective collaboration mechanisms between intelligent agents, so as to gradually approach the final collective objective through continuously learning from the environment based on their individual observations. In this regard, independent reinforcement learning (IRL) is often deployed in multi-agent collaboration to alleviate the problem of a non-stationary learning environment. However, behavioral strategies of intelligent agents in IRL can only be formulated upon their local individual observations of the global environment, and appropriate communication mechanisms must be introduced to reduce their behavioral localities. In this paper, we address the problem of communication between intelligent agents in IRL by jointly adopting mechanisms with two different scales. For the large scale, we introduce the stigmergy mechanism as an indirect communication bridge between independent learning agents, and carefully design a mathematical method to indicate the impact of digital pheromone. For the small scale, we propose a conflict-avoidance mechanism between adjacent agents by implementing an additionally embedded neural network to provide more opportunities for participants with higher action priorities. In addition, we present a federal training method to effectively optimize the neural network of each agent in a decentralized manner. Finally, we establish a simulation scenario in which a number of mobile agents in a certain area move automatically to form a specified target shape. Extensive simulations demonstrate the effectiveness of our proposed method.
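To make the large-scale mechanism concrete, the following is a minimal sketch of stigmergy as indirect communication: agents never message each other, they only write to and read from a shared digital pheromone field. The class name `PheromoneField`, the grid representation, the multiplicative evaporation rule, and every parameter are illustrative assumptions; this is not the mathematical method designed in the paper.

```python
import numpy as np


class PheromoneField:
    """Hypothetical shared medium for indirect (stigmergic) communication."""

    def __init__(self, height, width, evaporation=0.1):
        # One pheromone value per grid cell; the grid layout is an assumption.
        self.grid = np.zeros((height, width))
        self.evaporation = evaporation

    def deposit(self, row, col, amount=1.0):
        # An agent leaves a trace at its current cell.
        self.grid[row, col] += amount

    def evaporate(self):
        # Traces decay each step, so stale information fades away.
        self.grid *= 1.0 - self.evaporation

    def sense(self, row, col, radius=1):
        # An agent reads only the local window around itself, clipped at the
        # borders, which mirrors the local-observation constraint of IRL.
        r0 = max(row - radius, 0)
        r1 = min(row + radius + 1, self.grid.shape[0])
        c0 = max(col - radius, 0)
        c1 = min(col + radius + 1, self.grid.shape[1])
        return self.grid[r0:r1, c0:c1].copy()
```

In a sketch like this, each independent learner could append the output of `sense` to its own observation, so that other agents' past behavior influences its policy without any direct message exchange.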
