Individuality is essential in human society: it drives the division of labor, which in turn improves efficiency and productivity. Similarly, it should be a key to multi-agent cooperation. Inspired by the fact that individuality means being an individual distinct from others, we propose a simple yet efficient method for the emergence of individuality (EOI) in multi-agent reinforcement learning (MARL). EOI learns a probabilistic classifier that predicts a probability distribution over agents given their observations, and gives each agent an intrinsic reward for being correctly predicted by the classifier. The intrinsic reward encourages each agent to visit its own familiar observations, and training the classifier on such observations makes the intrinsic reward signal stronger and the agents more identifiable. To further enhance the intrinsic reward and promote the emergence of individuality, we propose two regularizers that increase the discriminability of the classifier. We implement EOI on top of popular MARL algorithms. Empirically, we show that EOI significantly outperforms existing methods in a variety of multi-agent cooperative scenarios.
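The core mechanism can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the linear-softmax classifier, the random weights, and all names here are hypothetical stand-ins for the trained neural classifier; the intrinsic reward is simply the probability the classifier assigns to the agent's true identity given its observation.

```python
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS, OBS_DIM = 3, 4

# Hypothetical linear-softmax classifier P(agent | observation);
# in EOI this would be a neural network trained on agents' visited observations.
W = rng.normal(size=(OBS_DIM, N_AGENTS))

def classify(obs):
    """Return a probability distribution over agent identities given an observation."""
    logits = obs @ W
    e = np.exp(logits - logits.max())  # stable softmax
    return e / e.sum()

def intrinsic_reward(obs, agent_id):
    """EOI-style intrinsic reward: the probability the classifier
    assigns to the agent's own identity for this observation."""
    return classify(obs)[agent_id]

obs = rng.normal(size=OBS_DIM)
r = intrinsic_reward(obs, agent_id=1)
# r lies in (0, 1); a higher value means the agent is more identifiable
# from its observation, so maximizing it pushes agents toward observations
# that distinguish them from others.
```

In training, this reward would be added (with a weighting coefficient) to each agent's environment reward, while the classifier is periodically refit on the observations the agents actually visit, creating the mutual reinforcement described above.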