Mechanism Design with Bandit Feedback

We study a multi-round welfare-maximising mechanism design problem in settings where agents do not know their own values. On each round, a mechanism assigns an allocation to each agent in a set of agents and charges each of them a price; the agents then provide (stochastic) feedback to the mechanism about the allocation they received. This is motivated by applications in cloud markets and online advertising, where an agent may learn her value for an allocation only after experiencing it. The mechanism therefore needs to explore different allocations for each agent while simultaneously attempting to find the socially optimal set of allocations. We focus on truthful and individually rational mechanisms that imitate the classical VCG mechanism in the long run. To that end, we define three notions of regret: for the social welfare, for the utility of each individual agent, and for the utility of the mechanism. We show that these three quantities are interdependent via an $\Omega(T^{\frac{2}{3}})$ lower bound on the maximum of the three after $T$ rounds of allocations, and describe a family of anytime algorithms that achieve this rate. Our framework provides flexibility in the pricing scheme to trade off between the agent and seller regrets, and additionally to control the degree of truthfulness and individual rationality.
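To make the interplay between exploration and VCG pricing concrete, here is a minimal Python sketch under simplifying assumptions of our own, not the paper's algorithm: a finite set of joint outcomes, Gaussian feedback noise, a known horizon $T$ (so, unlike the anytime family above, it is not anytime), and a plain explore-then-commit schedule. It explores for about $T^{2/3}$ rounds and then runs VCG on the empirical means. The helper names `vcg` and `explore_then_commit` and all parameters are hypothetical. The $T^{2/3}$ budget follows the usual balancing heuristic: $N$ exploration rounds cost $O(N)$ regret, while estimates with error $O(N^{-1/2})$ cost $O(T N^{-1/2})$ over the remaining rounds, so $N = T^{2/3}$ equates the two at $O(T^{2/3})$.

```python
import math
import random
from typing import List, Tuple

# Hypothetical representation: values[i][w] is agent i's value for joint
# outcome (allocation) w; the welfare of w is the sum over agents.
Payments = List[float]


def vcg(values: List[List[float]]) -> Tuple[int, Payments]:
    """Welfare-maximising outcome plus VCG (Clarke pivot) payments."""
    n_agents, n_outcomes = len(values), len(values[0])

    def welfare(w: int, exclude: int = -1) -> float:
        # Sum of values at outcome w, optionally leaving one agent out.
        return sum(values[i][w] for i in range(n_agents) if i != exclude)

    best = max(range(n_outcomes), key=welfare)
    payments = []
    for i in range(n_agents):
        # Externality of agent i: best welfare of the others without i,
        # minus the others' welfare at the chosen outcome.
        others_best = max(welfare(w, exclude=i) for w in range(n_outcomes))
        payments.append(others_best - welfare(best, exclude=i))
    return best, payments


def explore_then_commit(true_values: List[List[float]], T: int,
                        noise_sd: float = 0.1, seed: int = 0):
    """Simulate T rounds; returns a list of (outcome, payments) per round."""
    rng = random.Random(seed)
    n_agents, n_outcomes = len(true_values), len(true_values[0])
    n_explore = min(T, math.ceil(T ** (2 / 3)))  # exploration budget ~ T^(2/3)
    sums = [[0.0] * n_outcomes for _ in range(n_agents)]
    counts = [0] * n_outcomes
    history = []
    for t in range(n_explore):
        w = t % n_outcomes  # round-robin over outcomes
        counts[w] += 1
        for i in range(n_agents):
            # Stochastic feedback: a noisy sample of agent i's value for w.
            sums[i][w] += rng.gauss(true_values[i][w], noise_sd)
        history.append((w, [0.0] * n_agents))  # assumption: free while exploring
    # Empirical means; outcomes never tried (T very small) default to 0.
    est = [[sums[i][w] / counts[w] if counts[w] else 0.0
            for w in range(n_outcomes)] for i in range(n_agents)]
    best, payments = vcg(est)
    history.extend([(best, payments)] * (T - n_explore))  # commit phase
    return history
```

With known values, `vcg` reproduces the classical mechanism: `vcg([[3.0, 0.0], [0.0, 2.0]])` picks outcome 0, charges agent 0 a price of 2.0 and charges agent 1 nothing. Charging nothing during exploration is one crude way to keep those rounds individually rational at the seller's expense, loosely illustrating the agent/seller trade-off the pricing scheme controls.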

翻译：在代理商不知道其价值的情况下,我们研究一个多方面福利最大化机制的设计问题。在每一回合中,一个机制对一组代理商各分配一笔款项,并收取一个价格;然后代理商为收到的分配机制提供(随机的)反馈。这是在云市场和在线广告中应用的动机,代理商只有在经历了云市场和在线广告之后才能知道其价值,因此,该机制需要探索对每个代理商的不同分配,同时试图找到社会最佳的分配组合。我们的重点是长期仿照典型的VCG机制的诚实和个别合理机制。为此,我们界定了三种对福利、每个代理商的个别公用事业和机制的效用表示遗憾的概念。我们表明,这三个术语通过美元(Täfrac{2 ⁇ 3 ⁇ %%%%%%%%%%%%%%%%%%%%%%%%%%%的汇率,在分配回合后才能知道其价值最高值。因此,该机制需要探索对每个代理商的不同分配范围,同时试图找到社会最佳的分配比例。我们的重点是从长远地控制价格计划的灵活性。我们的框架提供了灵活性,以便控制代理人与卖方之间的交易和真实程度。