We consider a dynamic pricing problem for repeated contextual second-price auctions with multiple strategic buyers who aim to maximize their long-term time-discounted utility. The seller has limited information about buyers' overall demand curves, which depend on a non-parametric market-noise distribution, and buyers may submit corrupted bids (relative to their true valuations) to manipulate the seller's pricing policy into setting more favorable reserve prices in the future. We focus on designing the seller's learning policy for setting contextual reserve prices, with the goal of minimizing regret relative to the revenue of a clairvoyant benchmark policy that has full information about buyers' demand. We propose a policy with a phased structure that incorporates randomized "isolation" periods, during which a single randomly chosen buyer is the sole participant in the auction. We show that this design allows the seller to control the number of periods in which buyers significantly corrupt their bids. We then prove that our policy achieves a $T$-period regret of $\widetilde{\mathcal{O}}(\sqrt{T})$ against strategic buyers. Finally, we conduct numerical simulations to compare our proposed algorithm with standard pricing policies. Our numerical results show that our algorithm outperforms these policies across a range of buyer bidding behaviors.
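To make the auction mechanics concrete, the following is a minimal sketch of a single period: a second-price auction with a reserve price, plus an optional isolation step in which one randomly chosen buyer bids alone. It illustrates the setting only and is not the proposed policy. The function names `second_price_revenue` and `run_period` are hypothetical, and the rule for when a period is an isolation period (the `isolate` flag) is left to the caller as an assumption, since the abstract does not specify the schedule.

```python
import random


def second_price_revenue(bids, reserve):
    """Seller revenue in a second-price auction with a reserve price.

    The highest bidder wins only if their bid clears the reserve, and pays
    the larger of the reserve and the second-highest bid.
    """
    # Keep only bids that clear the reserve, highest first.
    eligible = sorted((b for b in bids if b >= reserve), reverse=True)
    if not eligible:
        return 0.0  # no bid clears the reserve, so the item goes unsold
    if len(eligible) == 1:
        return float(reserve)  # a lone eligible bidder pays the reserve
    return float(eligible[1])  # second-highest eligible bid (>= reserve)


def run_period(bids, reserve, isolate, rng):
    """Run one period; returns (revenue, isolated_buyer_index_or_None).

    Assumption: in an isolation period, one buyer is drawn uniformly at
    random and is the only participant, so they win iff they clear the
    reserve and then pay exactly the reserve.
    """
    if isolate:
        i = rng.randrange(len(bids))
        return second_price_revenue([bids[i]], reserve), i
    return second_price_revenue(bids, reserve), None


if __name__ == "__main__":
    rng = random.Random(0)
    bids = [0.9, 0.6, 0.3]  # hypothetical bids from three buyers
    print(run_period(bids, reserve=0.5, isolate=False, rng=rng))
    print(run_period(bids, reserve=0.5, isolate=True, rng=rng))
```

In an isolation period the chosen buyer's payment depends only on the reserve price, not on competing bids, which is why such periods give the seller a direct signal about a single buyer's response to the reserve.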