When multiple agents perform their separate reinforcement learning tasks cooperatively, the learning process of each agent can benefit substantially. The tasks need not be exactly the same: because they are similar, each agent can still benefit from communicating with the others. This learning scenario is not yet well understood or well formulated. As a first effort, we discuss the scenario in detail and propose group-agent reinforcement learning as a formulation of the reinforcement learning problem under it, and as a third type of reinforcement learning problem alongside single-agent and multi-agent reinforcement learning. We propose that the problem can be solved with the help of modern deep reinforcement learning techniques, and we provide a distributed deep reinforcement learning algorithm called DDA3C (Decentralised Distributed Asynchronous Advantage Actor-Critic), the first framework designed for group-agent reinforcement learning. Experiments in the CartPole-v0 game environment show that DDA3C achieves desirable performance and scales well.