To increase brand awareness, many advertisers sign contracts with advertising platforms to purchase traffic and deliver advertisements to target audiences. Over a whole delivery period, advertisers usually want a certain impression count for their ads, and they also expect the delivery performance to be as good as possible (e.g., a high click-through rate). Advertising platforms employ pacing algorithms to meet these demands by adjusting the selection probabilities for traffic requests in real time. However, the delivery procedure is also affected by publishers' strategies, which advertising platforms cannot control. Preloading is a widely used strategy for many types of ads (e.g., video ads) to keep the response time between a traffic request and the display of the ad acceptable, and it results in a delayed-impression phenomenon. Traditional pacing algorithms cannot handle preloading well because they rely on immediate feedback signals, and they may therefore fail to meet advertisers' demands. In this paper, we focus on a new research problem, impression pacing for preloaded ads, and propose RLTP, a Reinforcement Learning To Pace framework. It learns a pacing agent that sequentially produces selection probabilities over the whole delivery period. To jointly optimize the two objectives of impression count and delivery performance, RLTP employs a tailored reward estimator to satisfy the guaranteed impression count, penalize over-delivery, and maximize the traffic value. Experiments on large-scale industrial datasets verify that RLTP outperforms baseline pacing algorithms by a large margin. We have deployed the RLTP framework online on our advertising platform, and results show that it achieves a significant uplift in core metrics, including delivery completion rate and click-through rate.
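
As a rough illustration of how the three reward goals named above (reaching the guaranteed impression count, penalizing over-delivery, and maximizing traffic value) might be combined into a single scalar, consider the minimal sketch below. It is not the RLTP reward estimator: the function name `pacing_reward`, its parameters, the additive form, and the penalty weight are all assumptions made only for this example.

```python
def pacing_reward(delivered, target, traffic_value, over_penalty=2.0):
    """Toy end-of-period reward (hypothetical; not the paper's estimator).

    delivered:     impressions actually shown in the delivery period
    target:        guaranteed impression count requested by the advertiser
    traffic_value: any scalar proxy for delivery performance, e.g. mean CTR
    over_penalty:  assumed weight on impressions beyond the target
    """
    if target <= 0:
        raise ValueError("target must be positive")
    # Completion term: rises linearly to 1.0 as delivery reaches the target.
    completion = min(delivered, target) / target
    # Over-delivery term: fraction of impressions shown beyond the target.
    over = max(delivered - target, 0) / target
    # Additive combination; the weighting is an assumption of this sketch.
    return completion - over_penalty * over + traffic_value
```

With these assumed weights, under-delivery and over-delivery both score lower than hitting the target exactly, and higher traffic value breaks ties between periods that reach the same impression count.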