Real-Time Bidding (RTB) is an important mechanism in modern online advertising systems. Advertisers employ bidding strategies in RTB to optimize their advertising performance subject to various financial requirements, especially the return-on-investment (ROI) constraint. ROI changes non-monotonically during the sequential bidding process, often inducing a see-saw effect between constraint satisfaction and objective optimization. While some existing approaches show promising results in static or mildly changing ad markets, they fail to generalize to highly dynamic ad markets with ROI constraints, because they cannot adaptively balance constraints and objectives under non-stationarity and partial observability. In this work, we focus on ROI-constrained bidding in non-stationary markets. We model the problem as a Partially Observable Constrained Markov Decision Process, exploit an indicator-augmented reward function that requires no extra trade-off parameters, and develop a Curriculum-Guided Bayesian Reinforcement Learning (CBRL) framework to adaptively control the constraint-objective trade-off in non-stationary ad markets. Extensive experiments on a large-scale industrial dataset with two problem settings show that CBRL generalizes well in both in-distribution and out-of-distribution data regimes, and enjoys superior learning efficiency and stability.
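As a hedged illustration of the indicator-augmented reward named above (the precise definition appears in the method section; the symbols $v_t$, $c_t$, $\mathrm{ROI}_T$, and $\mathrm{ROI}_{\min}$ below are assumed for exposition only), one instantiation consistent with the abstract gates the per-step value by terminal constraint satisfaction instead of weighting the constraint with a trade-off multiplier:
\[
r_t \;=\; \mathbb{1}\!\left[\mathrm{ROI}_T \ge \mathrm{ROI}_{\min}\right] \, v_t,
\qquad
\mathrm{ROI}_T \;=\; \frac{\sum_{t=1}^{T} v_t}{\sum_{t=1}^{T} c_t},
\]
where $v_t$ and $c_t$ denote the value and cost obtained at step $t$ of a $T$-step episode and $\mathrm{ROI}_{\min}$ is the advertiser-specified ROI threshold; because the indicator is parameter-free, no extra trade-off coefficient is introduced.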