In this work, we consider the problem of computing optimal actions for Reinforcement Learning (RL) agents in a cooperative setting, where the objective is to optimize a common goal. In many real-life applications, however, in addition to optimizing the goal, the agents are required to satisfy certain constraints specified on their actions. In this setting, the agents must not only learn actions that optimize the common objective but also meet the specified constraints. Recently, the Actor-Critic algorithm with an attention mechanism has been successfully applied to obtain optimal actions for RL agents in multi-agent environments. In this work, we extend this algorithm to the constrained multi-agent RL setting. The key idea is that optimizing the common goal and satisfying the constraints may require different modes of attention. By incorporating separate attention modes, the agents can select the information needed for optimizing the objective and for satisfying the constraints independently, thereby yielding better actions. Through experiments on benchmark multi-agent environments, we demonstrate the effectiveness of our proposed algorithm.
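To make the idea of separate attention modes concrete, the sketch below shows one way a centralized critic with distinct attention modules for the reward objective and the constraint costs could be structured. This is a minimal illustration under assumed names and dimensions (DualAttentionCritic, reward_attn, cost_attn, hidden_dim) using PyTorch's nn.MultiheadAttention; it is not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualAttentionCritic(nn.Module):
    """Hypothetical centralized critic with two attention modes:
    one for estimating the reward value and one for the constraint-cost
    value. All names and sizes are illustrative assumptions."""

    def __init__(self, n_agents, obs_dim, act_dim, hidden_dim=64):
        super().__init__()
        self.n_agents = n_agents
        # Per-agent encoder for (observation, action) pairs.
        self.encoder = nn.Linear(obs_dim + act_dim, hidden_dim)
        # Separate attention parameters for the reward and cost critics,
        # so each can attend to different agents' information.
        self.reward_attn = nn.MultiheadAttention(hidden_dim, num_heads=4, batch_first=True)
        self.cost_attn = nn.MultiheadAttention(hidden_dim, num_heads=4, batch_first=True)
        # Separate value heads: Q_reward and Q_cost for each agent.
        self.reward_head = nn.Linear(2 * hidden_dim, 1)
        self.cost_head = nn.Linear(2 * hidden_dim, 1)

    def forward(self, obs, acts):
        # obs: (batch, n_agents, obs_dim), acts: (batch, n_agents, act_dim)
        e = F.relu(self.encoder(torch.cat([obs, acts], dim=-1)))
        # Each agent attends over all agents' encodings, once per mode.
        r_ctx, _ = self.reward_attn(e, e, e)  # attention mode for the objective
        c_ctx, _ = self.cost_attn(e, e, e)    # attention mode for the constraints
        q_reward = self.reward_head(torch.cat([e, r_ctx], dim=-1)).squeeze(-1)
        q_cost = self.cost_head(torch.cat([e, c_ctx], dim=-1)).squeeze(-1)
        return q_reward, q_cost  # per-agent reward and constraint-cost values


if __name__ == "__main__":
    critic = DualAttentionCritic(n_agents=3, obs_dim=8, act_dim=2)
    obs = torch.randn(16, 3, 8)
    acts = torch.randn(16, 3, 2)
    q_r, q_c = critic(obs, acts)
    print(q_r.shape, q_c.shape)  # torch.Size([16, 3]) torch.Size([16, 3])
```

The two value estimates would then feed an actor update in which the constraint-cost value enters through, for example, a Lagrangian penalty term; the point of the sketch is only that the objective and the constraints use independent attention parameters.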