In stochastic dynamic environments, team Markov games have emerged as a versatile paradigm for studying sequential decision-making in fully cooperative multi-agent systems. However, the optimality of the derived policies is usually sensitive to model parameters, which are typically unknown in practice and must be estimated from noisy data. To mitigate the sensitivity of optimal policies to these uncertain parameters, we propose a robust model of team Markov games in which agents update their strategies via robust optimization. This model extends team Markov games to the setting of incomplete information and, at the same time, provides an alternative solution concept of robust team optimality. To compute such a solution, we develop a robust iterative learning algorithm for team policies and prove its convergence. Compared with robust dynamic programming, this algorithm not only converges faster but also admits approximate computations that alleviate the curse of dimensionality. Moreover, numerical simulations demonstrate the effectiveness of the algorithm by generalizing the game model of sequential social dilemmas to uncertain scenarios.
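The max-min principle underlying robust policy updates can be illustrated with a minimal sketch (not the paper's algorithm): the team maximizes its worst-case discounted return over a finite uncertainty set of candidate transition models. All names, the toy model, and the uncertainty set below are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy setup: 2 states, 2 joint actions, and an uncertainty
# set given by 2 candidate transition models (illustrative values).
# P_models[k, s, a, s'] = transition probability under candidate model k.
P_models = np.array([
    [[[0.9, 0.1], [0.2, 0.8]],
     [[0.7, 0.3], [0.4, 0.6]]],
    [[[0.8, 0.2], [0.3, 0.7]],
     [[0.6, 0.4], [0.5, 0.5]]],
])
R = np.array([[1.0, 0.0],
              [0.0, 1.0]])          # shared team reward R[s, a]
gamma = 0.9                          # discount factor

def robust_value_iteration(P_models, R, gamma, tol=1e-8):
    """Max-min Bellman updates: the team picks the joint action that
    maximizes the return under the worst transition model in the set."""
    n_states = R.shape[0]
    V = np.zeros(n_states)
    while True:
        # Q[k, s, a]: value of joint action a in state s under model k
        Q = R[None, :, :] + gamma * np.einsum('ksat,t->ksa', P_models, V)
        V_new = Q.min(axis=0).max(axis=1)  # worst model, best joint action
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.min(axis=0).argmax(axis=1)
        V = V_new

V_star, policy = robust_value_iteration(P_models, R, gamma)
```

Because the max-min Bellman operator is a gamma-contraction, this iteration converges to a unique robust value; the paper's learning algorithm addresses the harder setting where the models themselves are learned rather than enumerated.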