商店第一价格竞标速速率学习 (Fast Rate Learning in Stochastic First Price Bidding)

First-price auctions have largely replaced traditional bidding approaches based on Vickrey auctions in programmatic advertising. As far as learning is concerned, first-price auctions are more challenging because the optimal bidding strategy does not only depend on the value of the item but also requires some knowledge of the other bids. They have already given rise to several works in sequential learning, many of which consider models for which the value of the buyer or the opponents' maximal bid is chosen in an adversarial manner. Even in the simplest settings, this gives rise to algorithms whose regret grows as $\sqrt{T}$ with respect to the time horizon $T$. Focusing on the case where the buyer plays against a stationary stochastic environment, we show how to achieve significantly lower regret: when the opponents' maximal bid distribution is known we provide an algorithm whose regret can be as low as $\log^2(T)$; in the case where the distribution must be learnt sequentially, a generalization of this algorithm can achieve $T^{1/3+ \epsilon}$ regret, for any $\epsilon>0$. To obtain these results, we introduce two novel ideas that can be of interest in their own right. First, by transposing results obtained in the posted price setting, we provide conditions under which the first-price biding utility is locally quadratic around its optimum. Second, we leverage the observation that, on small sub-intervals, the concentration of the variations of the empirical distribution function may be controlled more accurately than by using the classical Dvoretzky-Kiefer-Wolfowitz inequality. Numerical simulations confirm that our algorithms converge much faster than alternatives proposed in the literature for various bid distributions, including for bids collected on an actual programmatic advertising platform.

翻译：第一价拍卖在很大程度上取代了基于Vickrey拍卖的传统招标方法。就学习而言,第一价拍卖更具有挑战性,因为最佳投标战略不仅取决于项目的价值,而且需要对其他投标的某些知识。它们已经在连续学习中产生了若干作品,其中许多考虑到买方或对手最大出价的价值是以对立方式选择的模型。即使在最简单的情况下,这也产生了一些算法,这些算法在时间范围上以美元计价,而对于时间范围值而言则以美元计价。侧重于买方与固定的随机环境打交道的案例中,我们展示了如何大幅降低遗憾:当反对者最大出价分布已知我们提供了一种算法,其遗憾可能低到美元(2美元) (T) 或对手最高出价。在必须按次顺序来学习的情况下,这种算法的概括化可以达到美元(1/3+\ epsilon美元) 的变价。对于任何美元的内部观察值的变价值的变价值值值值, 可能比我们最初的变价值值值要高出。我们先用美元计算出一个新的价值。