Online advertising has typically been more personalized than offline advertising, through the use of machine learning models and real-time auctions for ad targeting. One specific task, predicting the likelihood of conversion (i.e., the probability that a user will purchase the advertised product), is crucial to the advertising ecosystem for both targeting and pricing ads. Currently, these models are often trained by observing individual user behavior, but regulatory and technical constraints increasingly require privacy-preserving approaches. For example, major platforms are moving to restrict the tracking of individual user events across multiple applications, and governments around the world have shown growing interest in regulating the use of personal data. Instead of receiving data about individual user behavior, advertisers may receive privacy-preserving feedback, such as the number of installs of an advertised app that resulted from a group of users. In this paper, we outline the recent privacy-related changes in the online advertising ecosystem from a machine learning perspective. We provide an overview of the challenges and constraints when learning conversion models in this setting. We introduce a novel approach for training these models that makes use of post-ranking signals. Using offline experiments on real-world data, we show that it outperforms a model relying on opt-in data alone and significantly reduces model degradation when no individual labels are available. Finally, we discuss future directions for research in this evolving area.