Recommendation systems are a core feature of social media companies, which use them to recommend both organic and promoted content. Many modern recommendation systems are split into multiple stages (candidate generation and heavy ranking) to balance computational cost against recommendation quality. In this paper, we focus on the candidate generation stage of a large-scale ads recommendation problem and present a machine-learning-first, heterogeneous re-architecture of this stage, which we term TwERC. We show that a system combining a real-time light ranker with sourcing strategies that capture additional information yields validated gains. We present two such strategies. The first uses a notion of similarity in the interaction graph, while the second caches previous scores from the ranking stage. The graph-based strategy achieves a 4.08% revenue gain and the rankscore-based strategy achieves a 1.38% gain. These two strategies have biases that complement both the light ranker and each other. Finally, we describe a set of metrics that we believe are valuable for understanding the complex product trade-offs inherent in industrial candidate generation systems.
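To make the heterogeneous design more concrete, here is a minimal Python sketch of how candidates from a light ranker, a graph-similarity source, and a rankscore cache might be unioned. It is not the TwERC implementation. Everything in it is an illustrative assumption: the toy data, the use of Jaccard similarity over users' interaction sets as the graph's "notion of similarity", a fixed dictionary standing in for the real-time light ranker, and the hypothetical function names (`light_ranker_source`, `graph_source`, `rankscore_source`, `generate_candidates`).

```python
from collections import defaultdict

# All data below is hypothetical toy data, not from the paper.
# Interaction graph: user -> set of ads that user engaged with.
INTERACTIONS = {
    "u1": {"ad_a", "ad_b"},
    "u2": {"ad_a", "ad_b", "ad_c"},
    "u3": {"ad_d"},
}
# Stand-in for a real-time light ranker's scores (a real system would run a model).
LIGHT_SCORES = {"ad_a": 0.5, "ad_b": 0.1, "ad_c": 0.3, "ad_d": 0.05, "ad_e": 0.2, "ad_f": 0.7}
# Cache of scores previously produced by the heavy ranking stage: user -> {ad: score}.
RANKSCORE_CACHE = {"u1": {"ad_d": 0.9, "ad_e": 0.4}}


def top_k(scores: dict, k: int) -> list:
    """Return the k highest-scoring keys, breaking ties by key for determinism."""
    return sorted(scores, key=lambda ad: (-scores[ad], ad))[:k]


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two interaction sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def light_ranker_source(user: str, k: int) -> list:
    # Hypothetical: this toy ranker ignores `user`; a real one scores (user, ad) pairs.
    return top_k(LIGHT_SCORES, k)


def graph_source(user: str, k: int) -> list:
    """Score ads the user has not engaged with by summing the similarity
    of the other users who engaged with them (assumed Jaccard similarity)."""
    seen = INTERACTIONS.get(user, set())
    scores = defaultdict(float)
    for other, ads in INTERACTIONS.items():
        if other == user:
            continue
        sim = jaccard(seen, ads)
        if sim == 0.0:
            continue  # dissimilar users contribute nothing
        for ad in ads - seen:
            scores[ad] += sim
    return top_k(scores, k)


def rankscore_source(user: str, k: int) -> list:
    """Return the ads with the highest cached heavy-ranker scores for this user."""
    return top_k(RANKSCORE_CACHE.get(user, {}), k)


def generate_candidates(user: str, k: int) -> dict:
    """Union the candidates of all sources, recording which source(s) produced each ad."""
    merged = {}
    for name, source in [("light", light_ranker_source),
                         ("graph", graph_source),
                         ("rankscore", rankscore_source)]:
        for ad in source(user, k):
            merged.setdefault(ad, []).append(name)
    return merged


print(generate_candidates("u1", k=2))
```

With this toy data the three sources return disjoint ads for user `u1`, a miniature version of the abstract's point that the sourcing strategies' biases complement the light ranker and each other. Recording per-candidate provenance is one simple way to support source-level metrics; it is a design choice of this sketch, not something the abstract specifies.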