Online Real-Time Bidding (RTB) is a complex auction game in which advertisers compete to bid for ad impressions whenever a user request occurs. Large ad platforms must weigh display cost, Return on Investment (ROI), and other influential Key Performance Indicators (KPIs), so they try to balance the trade-offs among these goals in a dynamic environment. To address this challenge, we propose MoTiAC, a Multi-ObjecTive Actor-Critics algorithm based on reinforcement learning (RL), for bidding optimization with multiple goals. In MoTiAC, objective-specific agents update the global network asynchronously, each from the perspective of its own goal, which yields a robust bidding policy. Unlike previous RL models, MoTiAC can fulfill multiple objectives simultaneously in complex bidding environments. In addition, we prove mathematically that our model converges to Pareto optimality. Finally, experiments on a large-scale, real-world commercial dataset from Tencent verify the effectiveness of MoTiAC compared with a set of recent approaches.
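A solution is Pareto-optimal when no other solution is at least as good on every objective and strictly better on at least one. The following is a minimal Python sketch of that dominance test. It is a generic illustration and not part of MoTiAC itself. The function names `dominates` and `pareto_front` are hypothetical. So are the two objectives (ROI and negated display cost, so that both are maximized) and the policy scores.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if score vector `a` Pareto-dominates `b`.

    All objectives are assumed to be maximized: `a` must be at least as good
    on every objective and strictly better on at least one.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Return the names of candidates that no other candidate dominates.

    A simple O(n^2) scan; a point never dominates itself because
    dominance requires a strict improvement on some objective.
    """
    return [
        name
        for name, s in scores.items()
        if not any(dominates(other, s) for other in scores.values())
    ]


if __name__ == "__main__":
    # Hypothetical bidding-policy scores: (ROI, -display cost), higher is better.
    scores = {
        "A": (1.8, -120.0),
        "B": (1.5, -90.0),
        "C": (1.4, -110.0),  # dominated by B: lower ROI and higher cost
        "D": (2.0, -150.0),
    }
    print(pareto_front(scores))  # ['A', 'B', 'D']
```

Policies A, B, and D each trade ROI against cost differently, so none is strictly better than another, and all three lie on the Pareto front. Policy C is discarded because B beats it on both objectives.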