We study Bayesian automated mechanism design in unstructured dynamic environments, where a principal repeatedly interacts with an agent, and takes actions based on the strategic agent's report of the current state of the world. Both the principal and the agent can have arbitrary and potentially different valuations for the actions taken, possibly also depending on the actual state of the world. Moreover, at any time, the state of the world may evolve arbitrarily depending on the action taken by the principal. The goal is to compute an optimal mechanism which maximizes the principal's utility in the face of the self-interested strategic agent. We give an efficient algorithm for computing optimal mechanisms, with or without payments, under different individual-rationality constraints, when the time horizon is constant. Our algorithm is based on a sophisticated linear program formulation, which can be customized in various ways to accommodate richer constraints. For environments with large time horizons, we show that the principal's optimal utility is hard to approximate within a certain constant factor, complementing our algorithmic result. We further consider a special case of the problem where the agent is myopic, and give a refined efficient algorithm whose time complexity scales linearly in the time horizon. Moreover, we show that memoryless mechanisms do not provide a good solution for our problem, in terms of both optimality and computational tractability. These results paint a relatively complete picture for automated dynamic mechanism design in unstructured environments. Finally, we present experimental results where our algorithms are applied to synthetic dynamic environments with different characteristics, which not only serve as a proof of concept for our algorithms, but also exhibit intriguing phenomena in dynamic mechanism design.