Multi-agent Reinforcement Learning Improvement in a Dynamic Environment Using Knowledge Transfer

Cooperative multi-agent systems are widely used in a variety of areas. Interaction between agents brings several benefits, including lower operating costs, high scalability, and easier parallel processing. These systems pave the way for handling large-scale, unknown, and dynamic environments. However, learning in such environments remains a prominent challenge across applications. The difficulties include the effect of search-space size on learning time, ineffective cooperation among agents, and a lack of proper coordination among the agents' decisions. Moreover, reinforcement learning algorithms may suffer from long convergence times on these problems. In this paper, a communication framework based on knowledge transfer concepts is introduced to address these challenges in the herding problem with a large state space. To handle the convergence problem, knowledge transfer is used, which can significantly increase the efficiency of reinforcement learning algorithms. Coordination among the agents is carried out through a head agent in each group of agents and a coordinator agent. The results demonstrate that this framework can indeed speed up learning and reduce convergence time.
