Major Internet advertising platforms offer budget pacing tools as a standard service for advertisers to manage their ad campaigns. Given the inherent non-stationarity over time in an advertiser's value and in competing advertisers' values, a commonly used approach is to learn a target expenditure plan that specifies a target spend as a function of time, and then run a controller that tracks this plan. This raises the question: how many historical samples are required to learn a good expenditure plan? We study this question by considering an advertiser repeatedly participating in $T$ second-price auctions, where the tuple of her value and the highest competing bid is drawn from an unknown time-varying distribution. The advertiser seeks to maximize her total utility subject to her budget constraint. Prior work has shown that $T\log T$ samples per distribution suffice to achieve the optimal $O(\sqrt{T})$-regret. We dramatically improve on this state of the art and show that just one sample per distribution is enough to achieve the near-optimal $\tilde O(\sqrt{T})$-regret, while still being robust to noise in the sampling distributions.
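To make the "target expenditure plan plus controller" setup concrete, the following is a minimal sketch, not the algorithm analyzed in this work. It assumes quasi-linear utility (value minus payment), a shaded bid of the form $v/(1+\mu)$, and a simple additive update of the pacing multiplier $\mu$ toward the per-round target spend. The function name `run_paced_bidding`, its parameters, and the step size are all hypothetical choices made for illustration.

```python
import random


def run_paced_bidding(values, competing_bids, target_spend, step_size=0.05):
    """Simulate a budget-paced bidder in repeated second-price auctions.

    Illustrative assumptions (not taken from the source):
      * utility per won auction is value minus payment,
      * the bid is value / (1 + mu) for a pacing multiplier mu >= 0,
      * mu moves up when a round overspends its target and down otherwise,
      * the total budget equals the sum of the per-round targets.
    """
    budget = sum(target_spend)
    remaining = budget
    mu = 0.0
    total_utility = 0.0
    spend_per_round = []

    for value, competing_bid, target in zip(values, competing_bids, target_spend):
        bid = value / (1.0 + mu)
        # Second-price rule: the winner pays the highest competing bid.
        # The bidder only wins if she can still afford that payment.
        wins = bid >= competing_bid and competing_bid <= remaining
        payment = competing_bid if wins else 0.0
        if wins:
            total_utility += value - payment
            remaining -= payment
        spend_per_round.append(payment)
        # Controller step: track the target spend for this round.
        mu = max(0.0, mu + step_size * (payment - target))

    return total_utility, budget - remaining, spend_per_round


if __name__ == "__main__":
    rng = random.Random(0)
    horizon = 1000
    # Synthetic, time-varying (value, competing bid) pairs for illustration only.
    values = [rng.random() * (1.0 + t / horizon) for t in range(horizon)]
    competing = [rng.random() for _ in range(horizon)]
    # A flat plan; a learned plan would allocate more spend to high-value periods.
    plan = [0.1] * horizon
    utility, spent, _ = run_paced_bidding(values, competing, plan)
    print(f"utility={utility:.2f} spent={spent:.2f} budget={sum(plan):.2f}")
```

In this sketch the expenditure plan is simply given as input. The sample-complexity question posed above concerns how much historical data is needed to learn such a plan well in the first place.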