We introduce the Probabilistic Rank and Reward model (PRR), a scalable probabilistic model for personalized slate recommendation. Our model allows state-of-the-art estimation of user interests in the following ubiquitous recommender system scenario: a user is shown a slate of K recommendations and chooses at most one of these K items. The goal of the recommender system is to find the K items of most interest to a user, in order to maximize the probability that the user interacts with the slate. Our contribution is to show that we can learn the probability of a recommendation being successful more effectively by combining the reward - whether the slate was clicked - and the rank - which item on the slate was selected. Our method learns more efficiently than bandit methods that use only the reward, and than user preference methods that use only the rank. It also provides estimation performance similar to or better than that of independent inverse-propensity-score methods, while being far more scalable. Our method is state of the art in terms of both speed and accuracy on massive datasets with up to 1 million items. Finally, our method allows fast delivery of recommendations powered by maximum inner product search (MIPS), making it suitable for extremely low-latency domains such as computational advertising.