Real-Time Bidding (RTB) is an important mechanism in modern online advertising systems. Advertisers employ bidding strategies in RTB to optimize advertising performance subject to various financial requirements, among which a widely adopted one is the return-on-investment (ROI) constraint. ROIs change non-monotonically during the sequential bidding process, typically producing a see-saw effect between constraint satisfaction and objective optimization. Existing solutions to this constraint-objective trade-off are typically designed for static or mildly changing markets. However, these methods fail significantly in non-stationary advertising markets because they cannot adapt to varying dynamics and partial observability. In this work, we focus on ROI-Constrained Bidding in non-stationary markets. Based on a Partially Observable Constrained Markov Decision Process, we propose the first hard barrier solution to accommodate non-monotonic constraints. Our method exploits a parameter-free indicator-augmented reward function and develops a Curriculum-Guided Bayesian Reinforcement Learning (CBRL) framework to adaptively control the constraint-objective trade-off in non-stationary advertising markets. Extensive experiments on a large-scale industrial dataset with two problem settings show that CBRL generalizes well in both in-distribution and out-of-distribution data regimes and exhibits outstanding stability.
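To make the non-monotonic behavior of the ROI constraint concrete, the following minimal sketch tracks a running ROI over a sequence of won impressions. It is only an illustration, not the paper's method: the function name `running_roi`, the toy numbers, the target of 1.5, and the definition of ROI as cumulative value divided by cumulative cost are all assumptions made here.

```python
def running_roi(values, costs):
    """Return the cumulative ROI after each won impression.

    Assumption (not from the paper): ROI is defined as
    cumulative value / cumulative cost.
    """
    rois = []
    total_value = 0.0
    total_cost = 0.0
    for value, cost in zip(values, costs):
        total_value += value
        total_cost += cost
        # Guard against division by zero before any spend has occurred.
        rois.append(total_value / total_cost if total_cost > 0 else float("inf"))
    return rois


# Hypothetical toy data: value obtained and cost paid per won impression.
values = [4.0, 1.0, 6.0, 1.0]
costs = [2.0, 2.0, 2.0, 2.0]

rois = running_roi(values, costs)

# Hypothetical ROI target; the constraint flips between satisfied and
# violated as bidding proceeds, so an early check says little about the end.
target_roi = 1.5
satisfied = [roi >= target_roi for roi in rois]
```

In this toy sequence the running ROI rises and falls from step to step, so the constraint is met, then violated, then met again. This is the kind of see-saw behavior that makes the constraint hard to enforce one step at a time.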