We study reserve price optimization in multi-phase second-price auctions, where the seller's prior actions affect the bidders' later valuations through a Markov Decision Process (MDP). Compared to the bandit setting in existing works, our setting involves three challenges. First, from the seller's perspective, we need to efficiently explore the environment in the presence of potentially untruthful bidders who aim to manipulate the seller's policy. Second, we want to minimize the seller's revenue regret when the market noise distribution is unknown. Third, the seller's per-step revenue is unknown, nonlinear, and cannot even be directly observed from the environment. We propose a mechanism addressing all three challenges. To address the first challenge, we combine a new technique named "buffer periods" with ideas from Reinforcement Learning (RL) with low switching cost to limit bidders' surplus from untruthful bidding, thereby incentivizing approximately truthful bidding. The second challenge is tackled by a novel algorithm that removes the need for pure exploration when the market noise distribution is unknown. The third challenge is resolved by an extension of LSVI-UCB, where we use the auction's underlying structure to control the uncertainty of the revenue function. The three techniques culminate in the $\underline{\rm C}$ontextual-$\underline{\rm L}$SVI-$\underline{\rm U}$CB-$\underline{\rm B}$uffer (CLUB) algorithm, which achieves $\tilde{\mathcal{O}}(H^{5/2}\sqrt{K})$ revenue regret when the market noise is known and $\tilde{\mathcal{O}}(H^{3}\sqrt{K})$ revenue regret when the noise is unknown, with no assumptions on bidders' truthfulness.