The rapid advancement of large language models (LLMs) necessitates novel monetization strategies, among which LLM-native advertising has emerged as a promising paradigm by naturally integrating advertisements into LLM-generated responses. However, this paradigm fundamentally shifts the auction object from discrete ad slots to the distribution over LLM outputs, posing new challenges for auction mechanism design. Existing mechanisms for LLM-native advertising adopt frameworks that decouple auction and generation, which either ignore externalities or require multiple LLM inferences for ad allocation, rendering them impractical for industrial scenarios. To address these challenges, we propose LLM-Auction, which, to the best of our knowledge, is the first learning-based generative auction mechanism that integrates auction and LLM generation for LLM-native advertising. By formulating allocation optimization as a preference alignment problem between LLM outputs and the mechanism's objective, which reflects both advertisers' expected value and user experience, we introduce the Iterative Reward-Preference Optimization (IRPO) algorithm, which alternately optimizes the reward model and the LLM. This approach enables the LLM to inherently model allocation externalities without any extra inference cost. We further establish the allocation monotonicity and continuity of LLM-Auction, which allow us to prove that a simple first-price payment rule exhibits favorable incentive properties. Additionally, we design an LLM-as-a-judge simulation environment to facilitate large-scale data construction and enable comprehensive quantitative evaluation of the mechanism's performance. Extensive quantitative and qualitative experiments demonstrate that LLM-Auction significantly outperforms existing baselines in allocation efficiency while achieving the desired mechanism properties.
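To make the alternating structure of IRPO concrete, the following is a minimal sketch of a reward-model/policy alternation loop under assumptions not specified in the abstract: the toy `reward_model` and `policy` modules, the Bradley-Terry preference loss, and the reward-maximizing policy surrogate are all illustrative placeholders, not the paper's actual training objectives.

```python
import torch
import torch.nn as nn

# Toy stand-ins for the reward model and the LLM policy (assumed shapes; real
# models would be transformer-based and operate on token sequences).
reward_model = nn.Linear(16, 1)   # scores a response embedding
policy = nn.Linear(16, 16)        # maps a query embedding to a response embedding

rm_opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)
pi_opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def rm_loss(chosen, rejected):
    # Bradley-Terry style preference loss (assumed): the response ranked higher
    # by the mechanism objective should receive a higher reward score.
    return -torch.nn.functional.logsigmoid(
        reward_model(chosen) - reward_model(rejected)
    ).mean()

def policy_loss(query):
    # Preference-alignment step (crude surrogate, for illustration only):
    # push the policy's output toward responses the current reward model
    # scores highly. Gradients flow through the reward model, but only the
    # policy parameters are updated below.
    return -reward_model(policy(query)).mean()

for it in range(3):  # outer IRPO-style iterations (illustrative count)
    # 1) Update the reward model on preference pairs ranked by the mechanism
    #    objective (advertisers' expected value and user experience).
    chosen, rejected = torch.randn(32, 16), torch.randn(32, 16)
    rm_opt.zero_grad()
    rm_loss(chosen, rejected).backward()
    rm_opt.step()

    # 2) Align the LLM policy with the refreshed reward model.
    query = torch.randn(32, 16)
    pi_opt.zero_grad()
    policy_loss(query).backward()
    pi_opt.step()
```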