This paper introduces Team-Attention-Actor-Critic (TAAC), a reinforcement learning algorithm designed to enhance multi-agent collaboration in cooperative environments. TAAC employs a Centralized Training/Centralized Execution scheme that incorporates multi-head attention mechanisms in both the actor and the critic. This design enables dynamic inter-agent communication, allowing agents to explicitly query teammates, and thereby manages the exponential growth of the joint-action space efficiently while preserving a high degree of collaboration. We further introduce a penalized loss function that promotes diverse yet complementary roles among agents. We evaluate TAAC in a simulated soccer environment against benchmark algorithms representing other multi-agent paradigms, including Proximal Policy Optimization and Multi-Agent Actor-Attention-Critic. We find that TAAC exhibits superior performance and enhanced collaborative behaviors across a variety of metrics (win rates, goal differentials, Elo ratings, inter-agent connectivity, balanced spatial distributions, and frequent tactical interactions such as ball possession swaps).
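The inter-agent querying described above can be illustrated with a minimal multi-head attention sketch in plain NumPy. This is an assumption-laden illustration of the general mechanism (scaled dot-product attention over per-agent feature vectors, split across heads), not the paper's actual learned architecture; all weight matrices, dimensions, and the function name are hypothetical.

```python
import numpy as np

def multi_head_agent_attention(agent_feats, num_heads=2, seed=0):
    """Sketch: each agent queries all teammates via scaled dot-product
    attention, split across heads. agent_feats: (n_agents, d_model).
    Weights are random placeholders; in TAAC they would be learned."""
    n, d = agent_feats.shape
    assert d % num_heads == 0, "d_model must divide evenly across heads"
    d_h = d // num_heads
    rng = np.random.default_rng(seed)
    # Hypothetical query/key/value projections (illustrative only).
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    Q, K, V = agent_feats @ Wq, agent_feats @ Wk, agent_feats @ Wv
    outs = []
    for h in range(num_heads):
        s = slice(h * d_h, (h + 1) * d_h)
        # (n, n) matrix: how strongly each agent attends to each teammate.
        scores = Q[:, s] @ K[:, s].T / np.sqrt(d_h)
        weights = np.exp(scores - scores.max(axis=1, keepdims=True))
        weights /= weights.sum(axis=1, keepdims=True)  # softmax over teammates
        outs.append(weights @ V[:, s])
    return np.concatenate(outs, axis=1)  # (n_agents, d_model)
```

Each row of the returned array is one agent's representation after attending to every teammate, which is the kind of explicit querying the actor and critic described above rely on.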


