Real-time bidding (RTB) is the new paradigm of programmatic advertising. An advertiser seeking to improve the performance of their ad campaigns makes the intelligent choice of using a \textbf{Demand-Side Platform}. Existing approaches struggle to provide a satisfactory solution for bidding optimization due to stochastic bidding behavior. In this paper, we propose a multi-agent reinforcement learning architecture for RTB with functional optimization. We design a four-agent bidding environment: three Lagrange-multiplier-based functional optimization agents and one baseline agent (without any attribute of functional optimization). First, numerous attributes are assigned to each agent, including biased or unbiased win probability, the Lagrange multiplier, and click-through rate. To evaluate the proposed RTB strategy's performance, we report results on ten sequential simulated auction campaigns. The results show that agents with functional actions and rewards achieved the highest average winning rate and winning surplus, given biased and unbiased winning information respectively. The experimental evaluations show that our approach significantly improves the campaign's efficacy and profitability.