Designing data sharing mechanisms that provide both performance and strong privacy guarantees is a hot topic in the online advertising industry. In particular, a prominent proposal discussed in the Improving Web Advertising Business Group at W3C only allows sharing advertising signals through aggregated, differentially private reports of past displays. To study this proposal extensively, an open Privacy-Preserving Machine Learning Challenge took place at AdKDD'21, a premier workshop on advertising science, with data provided by the advertising company Criteo. In this paper, we describe the challenge tasks and the structure of the available datasets, report the challenge results, and enable full reproducibility of the challenge. A key finding is that learning models on large, aggregated data in the presence of a small set of unaggregated data points can be surprisingly efficient and cheap. We also run additional experiments to observe the sensitivity of the winning methods to parameters such as the privacy budget or the quantity of available privileged side information. We conclude that, to keep ad relevance at a reasonable level, the industry needs either alternate designs for private data sharing or a breakthrough in learning from aggregated data alone.