Reinforcement learning (RL) algorithms have been around for decades and have been employed to solve various sequential decision-making problems. These algorithms, however, have faced great challenges when dealing with high-dimensional environments. The recent development of deep learning has enabled RL methods to derive optimal policies for sophisticated and capable agents, which can perform efficiently in these challenging environments. This paper addresses an important aspect of deep RL: situations that require multiple agents to communicate and cooperate to solve complex tasks. A survey of different approaches to problems in multi-agent deep RL (MADRL) is presented, covering non-stationarity, partial observability, continuous state and action spaces, multi-agent training schemes, and multi-agent transfer learning. The merits and demerits of the reviewed methods are analyzed and discussed, and their corresponding applications are explored. It is envisaged that this review provides insights into various MADRL methods and can lead to the future development of more robust and highly useful multi-agent learning methods for solving real-world problems.