We consider the problem of learning fair policies in (deep) cooperative multi-agent reinforcement learning (MARL). We formalize it in a principled way as the problem of optimizing a welfare function that explicitly encodes two important aspects of fairness: efficiency and equity. As a solution method, we propose a novel neural network architecture composed of two sub-networks specifically designed to account for these two aspects of fairness. In experiments, we demonstrate the importance of both sub-networks for fair optimization. Our overall approach is general, as it can accommodate any (sub)differentiable welfare function; it is therefore compatible with various notions of fairness proposed in the literature (e.g., lexicographic maximin, the generalized Gini social welfare function, proportional fairness). Our solution method is also generic and can be implemented in various MARL settings: centralized training with decentralized execution, or fully decentralized. Finally, we experimentally validate our approach in various domains and show that it can perform much better than previous methods.
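For concreteness, here is a minimal sketch of one of the welfare functions named above, the generalized Gini social welfare function (GGF), showing why (sub)differentiability matters: the GGF is a weighted sum of the per-agent utilities sorted in ascending order with non-increasing weights, so it is concave and piecewise-linear, and a subgradient flows through the sort. The PyTorch framing, the function name `ggf`, and the toy weights and utilities are our own illustration, not code from the paper.

```python
import torch

def ggf(utilities: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
    """Generalized Gini social welfare function.

    utilities: per-agent utilities (e.g., expected returns), shape (n,).
    weights:   positive, non-increasing weights, shape (n,). Decreasing
               weights put more mass on the worse-off agents, trading off
               efficiency (total utility) against equity.
    """
    sorted_utils, _ = torch.sort(utilities)   # ascending: worst-off agent first
    return torch.dot(weights, sorted_utils)   # concave, subdifferentiable welfare

# Toy example with three agents and geometrically decreasing weights.
u = torch.tensor([3.0, 1.0, 2.0], requires_grad=True)
w = torch.tensor([1.0, 0.5, 0.25])

welfare = ggf(u, w)
welfare.backward()       # subgradient w.r.t. each agent's utility
print(welfare.item())    # 2.75
print(u.grad)            # each agent receives the weight of its sorted rank
```

Note how the backward pass assigns the largest weight's gradient to the currently worst-off agent, which is exactly what lets gradient-based policy optimization push welfare up while favoring equity.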