Optimal control of robust team stochastic games

In stochastic dynamic environments, team stochastic games have emerged as a versatile paradigm for studying sequential decision-making problems in fully cooperative multi-agent systems. However, the optimality of the derived policies is usually sensitive to the model parameters, which are typically unknown and must be estimated from noisy data in practice. To mitigate this sensitivity, we propose a model of "robust" team stochastic games, in which players use a robust optimization approach to make decisions. This model extends team stochastic games to the setting of incomplete information and provides an alternative solution concept, robust team optimality. To find such a solution, we develop a learning algorithm in the form of a Gauss-Seidel modified policy iteration and prove its convergence. Compared with robust dynamic programming, this algorithm not only converges faster but also allows approximate calculations to be used to alleviate the curse of dimensionality. Finally, numerical simulations that generalize the game model of social dilemmas to sequential robust scenarios demonstrate the effectiveness of the algorithm.
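To make the idea of a Gauss-Seidel modified policy iteration concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes a simplified setting that the abstract does not specify: a finite state space, a finite set of joint team actions with a shared reward, known rewards, and transition uncertainty given as a finite list of candidate rows for each state and joint action. The names `robust_backup`, `gs_modified_policy_iteration`, `R_toy`, and `P_toy` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Assumed data layout (illustrative, not from the paper):
#   R[s][a] -> shared team reward for joint action a in state s
#   P[s][a] -> finite list of candidate transition rows (the uncertainty set)


def robust_backup(V: List[float], reward: float,
                  candidates: List[List[float]], gamma: float) -> float:
    """Worst-case one-step value over the candidate transition rows."""
    worst = min(sum(p * v for p, v in zip(row, V)) for row in candidates)
    return reward + gamma * worst


def gs_modified_policy_iteration(
    R: List[List[float]],
    P: List[List[List[List[float]]]],
    gamma: float = 0.9,
    m: int = 5,
    tol: float = 1e-8,
    max_iter: int = 10_000,
) -> Tuple[List[float], List[int]]:
    n_states = len(R)
    V = [0.0] * n_states
    policy = [0] * n_states
    for _ in range(max_iter):
        delta = 0.0
        # Improvement sweep. Gauss-Seidel: V[s] is overwritten in place,
        # so later states in the sweep already see the updated values.
        for s in range(n_states):
            q = [robust_backup(V, R[s][a], P[s][a], gamma)
                 for a in range(len(R[s]))]
            best = max(range(len(q)), key=q.__getitem__)
            policy[s] = best
            delta = max(delta, abs(q[best] - V[s]))
            V[s] = q[best]
        # Partial evaluation: m in-place sweeps with the policy held fixed.
        for _ in range(m):
            for s in range(n_states):
                a = policy[s]
                V[s] = robust_backup(V, R[s][a], P[s][a], gamma)
        if delta < tol:
            break
    return V, policy


# Toy model (made up for illustration): two states, two joint actions.
R_toy = [[1.0, 0.0], [2.0, 0.0]]
P_toy = [
    [[[1.0, 0.0]], [[0.0, 1.0], [0.5, 0.5]]],  # state 0: stay / uncertain move
    [[[0.0, 1.0]], [[1.0, 0.0]]],              # state 1: stay / move back
]

if __name__ == "__main__":
    values, joint_policy = gs_modified_policy_iteration(R_toy, P_toy)
    print(values, joint_policy)
```

Setting `m=0` reduces the sketch to Gauss-Seidel robust value iteration, while larger `m` spends more effort evaluating the current joint policy between improvement steps. Uncertainty in the rewards and approximate evaluation, both mentioned in the abstract, are left out of this sketch.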
