We study the problem of allocating $T$ sequentially arriving items among $n$ homogeneous agents under the constraint that each agent must receive a pre-specified fraction of all items, with the objective of maximizing the agents' total valuation of items allocated to them. The agents' valuations for the item in each round are assumed to be i.i.d., but their distribution is a priori unknown to the central planner. Therefore, the central planner needs to implicitly learn these distributions from the observed values in order to pick a good allocation policy. However, an added challenge here is that the agents are strategic, with incentives to misreport their valuations in order to receive better allocations. This sets our work apart both from online auction design settings, which typically assume known valuation distributions and/or involve payments, and from online learning settings that do not consider strategic agents. To that end, our main contribution is an online-learning-based allocation mechanism that is approximately Bayesian incentive compatible and, when all agents are truthful, guarantees sublinear regret for each agent's utility compared to that under the optimal offline allocation policy.
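To make the setting concrete, the following is a minimal sketch of quota-constrained sequential allocation, assuming truthful reports and a simple greedy rule (allocate each arriving item to the highest-reporting agent with remaining quota). This is only an illustration of the problem structure; it is not the paper's mechanism, which additionally learns the valuation distribution and handles strategic misreporting. The function name `online_greedy_allocation` and the example inputs are hypothetical.

```python
def online_greedy_allocation(reports, quotas):
    """Allocate T sequentially arriving items among n agents.

    reports: list of T rounds, each a list of n reported valuations.
    quotas:  list of n integer item counts (pre-specified fractions
             of T), summing to T so every item can be placed.
    Returns the list of winning agent indices, one per round.
    """
    remaining = list(quotas)
    allocation = []
    for round_values in reports:
        # Only agents that have not yet exhausted their quota are eligible.
        eligible = [i for i, q in enumerate(remaining) if q > 0]
        # Greedy rule: give the item to the eligible agent with the
        # highest reported value this round.
        winner = max(eligible, key=lambda i: round_values[i])
        remaining[winner] -= 1
        allocation.append(winner)
    return allocation


# Hypothetical example: T = 4 items, n = 2 agents, equal quotas of 2 each.
reports = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.5, 0.6]]
print(online_greedy_allocation(reports, [2, 2]))  # -> [0, 0, 1, 1]
```

Note how the quota constraint forces the planner away from pure value maximization: after agent 0 wins the first two items, the remaining items must go to agent 1 regardless of reports, which is precisely the coupling across rounds that a strategic agent could try to exploit by misreporting.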