We introduce the use of reinforcement learning for indirect mechanisms, working with the existing class of sequential price mechanisms, which generalizes both serial dictatorship and posted price mechanisms and essentially characterizes all strongly obviously strategyproof mechanisms. Learning an optimal mechanism within this class forms a partially observable Markov decision process. We provide rigorous conditions for when this class of mechanisms is more powerful than simpler static mechanisms, for the sufficiency or insufficiency of observation statistics for learning, and for the necessity of complex (deep) policies. We show that our approach can learn optimal or near-optimal mechanisms in several experimental settings.
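To make the class concrete, the following is a minimal sketch (not the paper's implementation) of a single-item sequential price mechanism as an episodic environment. The valuation dictionary, the `policy` interface, and the function name are all hypothetical illustration choices. The mechanism approaches agents one at a time with a take-it-or-leave-it price and observes only accept/reject responses, which is why the learner faces a partially observable decision process; a price of zero in a fixed order recovers serial dictatorship, and a constant price recovers a posted-price mechanism.

```python
def run_sequential_price_mechanism(values, policy):
    """Simulate one episode of a (hypothetical) single-item sequential
    price mechanism. `values` maps agent id -> private value;
    `policy(remaining, history)` chooses the next agent to approach and
    the price to offer. The mechanism never sees `values` directly --
    only the accept/reject history, hence partial observability."""
    remaining = list(values)
    history = []  # observed (agent, price, accepted) tuples
    revenue = 0.0
    while remaining:
        agent, price = policy(remaining, history)
        # Accepting iff value >= price is a dominant strategy here,
        # reflecting the obvious strategyproofness of the class.
        accepted = values[agent] >= price
        history.append((agent, price, accepted))
        remaining.remove(agent)
        if accepted:
            revenue = price
            break
    return revenue, history

# Posted-price special case: offer every agent the same fixed price.
posted_price = lambda remaining, history: (remaining[0], 5.0)
rev, hist = run_sequential_price_mechanism({0: 3.0, 1: 7.0}, posted_price)
# Agent 0 (value 3.0) rejects the price of 5.0; agent 1 (value 7.0)
# accepts, so the mechanism collects revenue 5.0.
```

A learned policy would replace `posted_price` with a function of the observable history, which is where reinforcement learning enters.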