Extending deep reinforcement learning (DRL) to the multi-agent setting has solved many complex problems and produced notable results. However, almost all of these studies consider only discrete or only continuous action spaces, and few works have applied multi-agent deep reinforcement learning to real-world problems, most of which have hybrid action spaces. In this paper, we therefore propose two algorithms to fill this gap: deep multi-agent hybrid soft actor-critic (MAHSAC) and multi-agent hybrid deep deterministic policy gradients (MAHDDPG). Both algorithms follow the centralized training and decentralized execution (CTDE) paradigm and can handle hybrid action space problems. Our experiments are run on the multi-agent particle environment, a simple multi-agent particle world with some basic simulated physics. The experimental results show that both algorithms perform well.
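To make the term "hybrid action space" concrete, the following is a minimal sketch of what a single decentralized execution step might look like when each agent emits one discrete choice together with a bounded continuous vector. It is an illustration only and not the MAHSAC or MAHDDPG architecture: the function name `sample_hybrid_action`, the softmax-plus-tanh-Gaussian parameterization, and the example policy outputs are all assumptions introduced here.

```python
import numpy as np

def sample_hybrid_action(logits, mean, log_std, rng):
    """Sample one hybrid action: a discrete index plus a bounded continuous vector.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Discrete part: softmax over the logits, then a categorical draw.
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    probs = np.exp(shifted) / np.sum(np.exp(shifted))
    discrete = int(rng.choice(len(probs), p=probs))
    # Continuous part: Gaussian sample squashed into [-1, 1] by tanh.
    noise = rng.standard_normal(mean.shape)
    continuous = np.tanh(mean + np.exp(log_std) * noise)
    return discrete, continuous

rng = np.random.default_rng(0)
# Decentralized execution: each agent acts only on its own policy outputs.
# The (logits, mean, log_std) values below are made-up stand-ins for network outputs.
agent_outputs = [
    (np.array([0.1, 2.0, -1.0]), np.array([0.3, -0.2]), np.array([-1.0, -1.0])),
    (np.array([1.5, 0.0, 0.0]), np.array([0.0, 0.8]), np.array([-0.5, -0.5])),
]
joint_action = [sample_hybrid_action(l, m, s, rng) for l, m, s in agent_outputs]
```

Under CTDE, a centralized critic would see the joint observations and the joint action only during training, while at execution time each agent needs just its own sampling step as above.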