We introduce Probabilistic Rank and Reward (PRR), a scalable probabilistic model for personalized slate recommendation. Our approach allows state-of-the-art estimation of user interests in the ubiquitous scenario where the user interacts with at most one item from a slate of K items. We show that the probability of a slate being successful can be learned efficiently by combining two signals: the reward (whether the user successfully interacted with the slate) and the rank (which item within the slate was selected). PRR outperforms competing approaches that use only one of these signals, and it scales far better to large action spaces. Moreover, PRR allows fast delivery of recommendations powered by maximum inner product search (MIPS), making it suitable for low-latency domains such as computational advertising.
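To make the MIPS step concrete, here is a minimal sketch of how a slate of K items could be retrieved by maximum inner product search. It is not PRR itself: the embeddings are random placeholders, the names `top_k_mips`, `user_vec`, and `item_matrix` are hypothetical, and an exact brute-force search stands in for the approximate MIPS index a low-latency system would more likely use.

```python
import numpy as np


def top_k_mips(user_vec, item_matrix, k):
    """Return the indices of the k items with the largest inner product
    against user_vec, ordered from best to worst (exact, brute force)."""
    scores = item_matrix @ user_vec          # one inner product per item
    k = min(k, scores.shape[0])
    # argpartition isolates the k best items in linear time;
    # only those k are then fully sorted.
    top = np.argpartition(-scores, k - 1)[:k]
    return top[np.argsort(-scores[top])]


# Illustrative data only: 1000 items and one user in a 16-dim space.
rng = np.random.default_rng(0)
items = rng.normal(size=(1000, 16))
user = rng.normal(size=16)

slate = top_k_mips(user, items, k=5)         # indices of a 5-item slate
```

The cost of this exact search grows linearly with the number of items, which is why large catalogs usually rely on an approximate nearest-neighbor index for the same operation.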