On the Equilibrium Elicitation of Markov Games Through Information Design

This work considers a novel information design problem and studies how crafting payoff-relevant environmental signals alone can influence the behavior of intelligent agents. The agents' strategic interactions are captured by an incomplete-information Markov game in which each agent first selects one environmental signal from multiple signal sources as additional payoff-relevant information and then takes an action. A rational information designer (the designer) possesses one signal source and aims to control the agents' equilibrium behavior by designing the information structure of the signals she sends to them. We establish an obedient principle, which states that it is without loss of generality to focus on direct information design when the design incentivizes each agent to select the signal sent by the designer, so that the design process need not predict the agents' strategic selection behavior. We then introduce a design protocol for a given designer goal, referred to as obedient implementability (OIL), and characterize OIL in a class of obedient perfect Bayesian Markov Nash equilibria (O-PBME). We propose a new framework for information design based on maximizing the optimal slack variables. Finally, we formulate the designer's goal selection problem and characterize it in terms of information design by establishing a relationship between O-PBME and Bayesian Markov correlated equilibria, building upon the revelation principle in classic information design in economics. The proposed approach can be applied to elicit desired behaviors of multi-agent systems in competitive as well as cooperative settings, and can be extended to heterogeneous stochastic games in complete- and incomplete-information environments.
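To make the idea of obedience concrete, the following is a minimal sketch of an obedience check in a heavily simplified setting: a single agent, a single period, and a finite state space. It is not the paper's Markov-game formulation or its O-PBME characterization; it only illustrates the standard one-shot condition that a recommended action must be at least as good as any deviation, together with the smallest margin ("slack") by which that condition holds. All names (`obedience_slack`, `is_obedient`, `threshold_rule`) and the numerical values are hypothetical and chosen for illustration.

```python
from itertools import product


def obedience_slack(prior, rule, utility):
    """Smallest obedience margin over all (recommended, deviation) pairs.

    prior[s]      : probability of state s
    rule[s][a]    : probability that action a is recommended in state s
    utility[s][a] : agent's payoff from taking action a in state s

    Illustrative one-shot, single-agent setting (an assumption, not the
    paper's model). A non-negative result means no deviation is profitable.
    """
    n_states = len(prior)
    n_actions = len(utility[0])
    slack = float("inf")
    for rec, dev in product(range(n_actions), repeat=2):
        if rec == dev:
            continue
        # Unnormalised expected gain from obeying `rec` instead of playing `dev`,
        # weighted by the joint probability of each state and the recommendation.
        gain = sum(
            prior[s] * rule[s][rec] * (utility[s][rec] - utility[s][dev])
            for s in range(n_states)
        )
        slack = min(slack, gain)
    return slack


def is_obedient(prior, rule, utility, tol=1e-12):
    """True if following every recommendation is weakly optimal."""
    return obedience_slack(prior, rule, utility) >= -tol


def threshold_rule(q):
    """Hypothetical rule: always recommend action 1 in state 1;
    recommend action 1 with probability q in state 0."""
    return [[1.0 - q, q], [0.0, 1.0]]


# Hypothetical numbers: state 0 is "bad", state 1 is "good";
# action 1 pays +1 in the good state and -1 in the bad state; action 0 pays 0.
prior = [0.7, 0.3]
utility = [[0.0, -1.0], [0.0, 1.0]]

if __name__ == "__main__":
    for q in (0.2, 0.6):
        rule = threshold_rule(q)
        print(q, obedience_slack(prior, rule, utility), is_obedient(prior, rule, utility))
```

In this toy instance the recommendation to take action 1 is obeyed only when `q` is at most 3/7, since the agent's expected gain from obeying is 0.3 - 0.7q. A design approach that maximizes slack would, in this simplified picture, prefer rules whose worst-case margin is as large as possible.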
