Real-Time Bidding (RTB) is an important mechanism in modern online advertising systems. Advertisers employ bidding strategies in RTB to optimize their advertising effectiveness subject to various financial requirements, chief among them the return-on-investment (ROI) constraint. ROI changes non-monotonically during the sequential bidding process, often inducing a see-saw effect between constraint satisfaction and objective optimization. While some existing approaches show promising results in static or mildly changing ad markets, they fail to generalize to highly non-stationary ad markets with ROI constraints, because they cannot adaptively balance constraints and objectives under non-stationarity and partial observability. In this work, we focus on ROI-constrained bidding in non-stationary markets. Building on a Partially Observable Constrained Markov Decision Process formulation, we propose an indicator-augmented reward function that requires no extra trade-off parameters and develop a Curriculum-Guided Bayesian Reinforcement Learning (CBRL) framework to adaptively control the constraint-objective trade-off in non-stationary ad markets. Extensive experiments on a large-scale industrial dataset under two problem settings show that CBRL generalizes well in both in-distribution and out-of-distribution data regimes and achieves superior learning efficiency and stability.