TwERC: High Performance Ensembled Candidate Generation for Ads Recommendation at Twitter

Vanessa Cai, Pradeep Prabakar, Manuel Serrano Rebuelta, Lucas Rosen, Federico Monti, Katarzyna Janocha, Tomo Lazovich, Jeetu Raj, Yedendra Shrinivasan, Hao Li, Thomas Markovich

From arXiv, 10 pages, 3 figures

Recommendation systems are a core feature of social media companies, with uses that include recommending organic and promoted content. Many modern recommendation systems are split into multiple stages, candidate generation and heavy ranking, to balance computational cost against recommendation quality. In this paper we focus on the candidate generation phase of a large-scale ads recommendation problem and present a machine-learning-first, heterogeneous re-architecture of this stage, which we term TwERC. We show that a system that combines a real-time light ranker with sourcing strategies capable of capturing additional information provides validated gains. We present two strategies. The first uses a notion of similarity in the interaction graph, while the second caches previous scores from the ranking stage. The graph-based strategy achieves a 4.08% revenue gain and the rankscore-based strategy achieves a 1.38% gain. These two strategies have biases that complement both the light ranker and one another. Finally, we describe a set of metrics that we believe are valuable for understanding the complex product trade-offs inherent in industrial candidate generation systems.
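To make the idea of an ensembled candidate generation stage more concrete, the following is a minimal sketch of how several candidate sources might be merged before heavy ranking. It is an illustration only, not the method described in the paper: the function `ensemble_candidates`, the per-source quota scheme, the source names, and the toy ad ids are all assumptions made for this example.

```python
from typing import Callable, Dict, List

# Hypothetical type: a candidate source maps a user id to an ordered
# list of ad ids, best candidate first.
CandidateSource = Callable[[str], List[str]]


def ensemble_candidates(
    user_id: str,
    sources: Dict[str, CandidateSource],
    quotas: Dict[str, int],
) -> List[str]:
    """Take up to quotas[name] new ads from each source, skipping duplicates.

    The quota-based merge is an assumption for illustration; sources are
    visited in dictionary insertion order.
    """
    selected: List[str] = []
    seen = set()
    for name, source in sources.items():
        taken = 0
        for ad_id in source(user_id):
            if taken >= quotas.get(name, 0):
                break
            if ad_id not in seen:
                seen.add(ad_id)
                selected.append(ad_id)
                taken += 1
    return selected


# Toy stand-ins for the three kinds of sources named in the abstract.
def light_ranker_source(user_id: str) -> List[str]:
    # Would be a real-time light ranker in a production system.
    return ["ad1", "ad2", "ad3", "ad4"]


def graph_source(user_id: str) -> List[str]:
    # Would use similarity in the interaction graph.
    return ["ad2", "ad5", "ad6"]


def rankscore_cache_source(user_id: str) -> List[str]:
    # Would look up cached scores from the ranking stage.
    return ["ad1", "ad7"]


sources = {
    "light_ranker": light_ranker_source,
    "graph": graph_source,
    "rankscore_cache": rankscore_cache_source,
}
quotas = {"light_ranker": 2, "graph": 2, "rankscore_cache": 1}

result = ensemble_candidates("user42", sources, quotas)
print(result)  # ['ad1', 'ad2', 'ad5', 'ad6', 'ad7']
```

In this sketch each source contributes candidates the others missed, which is one simple way to picture sources with complementary biases feeding a single heavy ranker. How the actual system allocates its candidate budget across sources is not stated in the abstract.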
