Behavior that is rational for each individual can lead to irrational collective outcomes for the group. A wide range of organisms, from animals to humans, have evolved the social attribute of cooperation to meet this challenge. Cooperation among individuals is therefore of great significance in allowing social organisms to adapt to changes in the natural environment. Building on multi-agent reinforcement learning, we propose a new learning strategy for achieving coordination by incorporating a learning rate that can balance exploration and exploitation. We demonstrate that agents using this simple strategy achieve a relatively improved collective return in a decision task called the intertemporal social dilemma, where the conflict between the individual and the group is particularly sharp. We also explore the effects of learning-rate diversity within a population of reinforcement learning agents and show that agents trained in heterogeneous populations develop more coordinated policies than those trained in homogeneous populations.