Improved Cooperation by Exploiting a Common Signal

Can artificial agents benefit from human conventions? Human societies manage to self-organize and resolve the tragedy of the commons in common-pool resources, in spite of the bleak prediction of non-cooperative game theory. On top of that, real-world problems are inherently large-scale and of low observability. One key concept that facilitates human coordination in such settings is the use of conventions. Inspired by human behavior, we investigate the learning dynamics and emergence of temporal conventions, focusing on common-pool resources. We place extra emphasis on designing a realistic evaluation setting: (a) environment dynamics are modeled on real-world fisheries, (b) we assume decentralized learning, where agents can observe only their own history, and (c) we run large-scale simulations (up to 64 agents). Uncoupled policies and low observability make cooperation hard to achieve; as the number of agents grows, the probability of taking a correct gradient direction decreases exponentially. By introducing an arbitrary common signal (e.g., date, time, or any periodic set of numbers) as a means to couple the learning process, we show that temporal conventions can emerge and agents reach sustainable harvesting strategies. The introduction of the signal consistently improves the social welfare (by 258% on average, up to 3306%), the range of environmental parameters where sustainability can be achieved (by 46% on average, up to 300%), and the convergence speed in low-abundance settings (by 13% on average, up to 53%).
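As a rough illustration of what coupling the learning process through a common signal could look like in practice, the sketch below appends a shared periodic signal to each agent's private observation. The abstract does not specify how the signal is encoded or fed to the agents, so the one-hot encoding, the `period` parameter, and the function names `common_signal` and `augment_observation` are all assumptions made for this example only.

```python
import numpy as np


def common_signal(t: int, period: int) -> np.ndarray:
    """One-hot encoding of a shared periodic signal at time step t.

    Assumption: the signal is simply t modulo `period`; any periodic
    sequence visible to all agents would serve the same purpose.
    """
    one_hot = np.zeros(period)
    one_hot[t % period] = 1.0
    return one_hot


def augment_observation(local_obs: np.ndarray, t: int, period: int) -> np.ndarray:
    """Concatenate an agent's private observation with the common signal.

    Each agent still sees only its own history (local_obs); the signal
    part is identical for every agent at the same time step.
    """
    return np.concatenate([local_obs, common_signal(t, period)])


# Two agents with different private observations at the same time step.
obs_a = augment_observation(np.array([0.5]), t=5, period=4)
obs_b = augment_observation(np.array([0.9]), t=5, period=4)
```

In this sketch the last `period` entries of `obs_a` and `obs_b` coincide, which is the only information the agents share. A policy conditioned on those entries could, for instance, learn to harvest only on certain signal values, which is one way a temporal convention might be expressed.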
