We consider a multi-agent Markov strategic interaction over an infinite horizon in which the agents can be of multiple types. We model the interaction as a mean-field game in the asymptotic limit where the number of agents of each type grows to infinity. Each agent has a private state; the state evolves depending on the agent's action and on the distribution of the states of the agents of the different types. Each agent seeks to maximize the discounted sum of rewards over the infinite horizon, where the reward depends on the agent's own state and on the distributions of the states of the agents of the different types. We seek to characterize and compute a stationary multi-type mean-field equilibrium (MMFE) in this game, and we characterize the conditions under which a stationary MMFE exists. Finally, we propose a reinforcement-learning (RL) algorithm based on a policy-gradient approach to find the stationary MMFE when the agents are unaware of the dynamics. We numerically evaluate how this kind of interaction can model cyber attacks between defenders and adversaries, and show that the RL-based algorithm converges to an equilibrium.
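As a minimal sketch of the objective just described (the notation below is our illustrative choice, not taken from the paper): with $K$ types, let $s_t$ and $a_t$ denote the private state and action of a representative agent of type $k$, and let $\boldsymbol{\mu}_t = (\mu_t^1, \dots, \mu_t^K)$ collect the state distributions of the $K$ types. The agent maximizes
\[
V^k(\pi, \boldsymbol{\mu}) \;=\; \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^t\, r^k\big(s_t, a_t, \boldsymbol{\mu}_t\big)\right], \qquad 0 < \gamma < 1.
\]
In this notation, a stationary MMFE can be read as a policy–distribution pair $(\pi^*, \boldsymbol{\mu}^*)$ such that $\pi^*$ is a best response against the stationary distribution $\boldsymbol{\mu}^*$, and $\boldsymbol{\mu}^*$ is the distribution induced when every agent follows $\pi^*$.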
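To make the policy-gradient computation concrete, the following is a minimal, self-contained sketch of the kind of fixed-point scheme the abstract describes: a representative agent runs a REINFORCE-style update against a frozen mean field, and the mean field is then re-estimated from the learned policy. The dynamics, reward, and all names (step, n_states, mu, ...) are toy assumptions for illustration, not the paper's model or algorithm.

```python
import numpy as np

# Hypothetical toy instance: tabular softmax policy, placeholder dynamics.
rng = np.random.default_rng(0)
n_states, n_actions, gamma, lr = 4, 2, 0.95, 0.05
theta = np.zeros((n_states, n_actions))        # softmax policy parameters
mu = np.full(n_states, 1.0 / n_states)         # current mean-field guess

def policy(s):
    p = np.exp(theta[s] - theta[s].max())
    return p / p.sum()

def step(s, a, mu):
    # Placeholder dynamics/reward: both depend on the population
    # distribution mu, as in the abstract's model.
    s_next = rng.integers(n_states)
    r = -abs(s - a) + mu[s]
    return s_next, r

for outer in range(50):                        # fixed-point iteration on mu
    for episode in range(20):                  # REINFORCE against frozen mu
        s, traj = rng.integers(n_states), []
        for t in range(30):
            a = rng.choice(n_actions, p=policy(s))
            s_next, r = step(s, a, mu)
            traj.append((s, a, r))
            s = s_next
        G = 0.0
        for s, a, r in reversed(traj):         # discounted returns, backward
            G = r + gamma * G
            grad = -policy(s)
            grad[a] += 1.0                     # grad of log softmax at (s, a)
            theta[s] += lr * G * grad
    # Re-estimate the stationary state distribution under the new policy.
    visits = np.zeros(n_states)
    s = rng.integers(n_states)
    for t in range(500):
        a = rng.choice(n_actions, p=policy(s))
        s, _ = step(s, a, mu)
        visits[s] += 1
    mu = visits / visits.sum()
```

The inner loop treats mu as fixed, so the agent faces an ordinary MDP and standard policy-gradient machinery applies; the outer loop is the usual mean-field consistency step, updating mu to the distribution the learned policy induces.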