In online advertising, auto-bidding has become an essential tool that lets advertisers optimize their preferred ad performance metrics simply by expressing high-level campaign objectives and constraints. Previous work designed auto-bidding tools from a single-agent perspective, without modeling the mutual influence between agents. In this paper, we instead consider the problem from a distributed multi-agent perspective and propose a general $\underline{M}$ulti-$\underline{A}$gent reinforcement learning framework for $\underline{A}$uto-$\underline{B}$idding, namely MAAB, to learn auto-bidding strategies. First, we investigate the competitive and cooperative relations among auto-bidding agents and propose a temperature-regularized credit assignment to establish a mixed cooperative-competitive paradigm. By carefully trading off competition and cooperation among agents, we can reach an equilibrium state that guarantees not only each individual advertiser's utility but also the system performance (i.e., social welfare). Second, because cooperation may lead agents to collude by bidding low prices, we further propose bar agents that set a personalized bidding bar for each agent, which alleviates the revenue degradation caused by cooperation. Third, to deploy MAAB in a large-scale advertising system with millions of advertisers, we propose a mean-field approach. Grouping advertisers with the same objective into a single mean auto-bidding agent greatly simplifies the interactions among the many advertisers, making it practical to train MAAB efficiently. Extensive experiments on an offline industrial dataset and the Alibaba advertising platform demonstrate that our approach outperforms several baseline methods in terms of social welfare and revenue.
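The grouping step of the mean-field approach can be illustrated with a minimal sketch. It assumes each advertiser is described by an objective label plus a few numeric features, and it forms one "mean agent" per objective by averaging those features. The field names (`objective`, `budget`, `value_per_click`) and the helper `build_mean_agents` are hypothetical and chosen only for illustration; the abstract does not specify how MAAB actually constructs its mean agents.

```python
from collections import defaultdict
from statistics import fmean


def build_mean_agents(advertisers):
    """Group advertisers by objective and average their numeric features.

    Assumption: plain averaging stands in for the mean-field aggregation;
    the real method may aggregate different quantities.
    """
    groups = defaultdict(list)
    for adv in advertisers:
        groups[adv["objective"]].append(adv)

    mean_agents = {}
    for objective, members in groups.items():
        mean_agents[objective] = {
            "size": len(members),  # how many advertisers this agent represents
            "mean_budget": fmean(m["budget"] for m in members),
            "mean_value": fmean(m["value_per_click"] for m in members),
        }
    return mean_agents


# Hypothetical toy data: three advertisers, two distinct objectives.
advertisers = [
    {"objective": "clicks", "budget": 100.0, "value_per_click": 0.5},
    {"objective": "clicks", "budget": 300.0, "value_per_click": 1.5},
    {"objective": "conversions", "budget": 200.0, "value_per_click": 2.0},
]
mean_agents = build_mean_agents(advertisers)
```

In this toy setup the number of learning agents equals the number of distinct objectives (two) rather than the number of advertisers (three), which is the kind of reduction that would make training tractable at the scale of millions of advertisers.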