解决基于图形的公共公益运动,同时进行树搜索和模拟学习 (Solving Graph-based Public Good Games with Tree Search and Imitation Learning)

Public goods games represent insightful settings for studying incentives for individual agents to make contributions that, while costly for each of them, benefit the wider society. In this work, we adopt the perspective of a central planner with a global view of a network of self-interested agents and the goal of maximizing some desired property in the context of a best-shot public goods game. Existing algorithms for this known NP-complete problem find solutions that are sub-optimal and cannot optimize for criteria other than social welfare. In order to efficiently solve public goods games, our proposed method directly exploits the correspondence between equilibria and the Maximal Independent Set (mIS) structural property of graphs. In particular, we define a Markov Decision Process, which incrementally generates an mIS, and adopt a planning method to search for equilibria, outperforming existing methods. Furthermore, we devise an imitation learning technique that uses demonstrations of the search to obtain a graph neural network parametrized policy which quickly generalizes to unseen game instances. Our evaluation results show that this policy is able to reach 99.5% of the performance of the planning method while being approximately three orders of magnitude faster to evaluate on the largest graphs tested. The methods presented in this work can be applied to a large class of public goods games of potentially high societal impact.

翻译：公共产品游戏代表着深思熟虑地研究激励个体代理商做出贡献的激励机制,这些激励机制虽然对每个代理商来说代价昂贵,但却有利于更广泛的社会。在这项工作中,我们采纳了一个中心规划师的观点,该规划师的视角是,从全球的角度看待一个自我感兴趣的代理商网络,并着眼于在最佳的公益游戏中最大限度地扩大某些理想财产。这个已知的NP彻底问题的现有算法找到的解决方案不尽人意,无法优化社会福利以外的标准。为了有效解决公益游戏,我们提出的方法直接利用了公平与最大独立游戏结构属性之间的对应。特别是,我们定义了马尔科夫决策程序,该程序逐步生成了一个MSI,并采用了一种规划方法,以寻找某种理想的属性,超越了现有方法。此外,我们设计了一种模拟的学习技术,利用搜索的演示来获得一个图形神经网络的分化政策,迅速概括到看不见的游戏实例。我们的评估结果表明,这一政策能够达到规划方法绩效的99.5%,而这一方法的绩效在公众游戏中应用得分量最高,同时,可以快速地检验这一社会价值。