When building recommendation systems, we seek to output a helpful set of items to the user. Under the hood, a ranking model predicts which of two candidate items is better, and we must distill these pairwise comparisons into the user-facing output. However, a learned ranking model is never perfect, so taking its predictions at face value gives no guarantee that the user-facing output is reliable. Building from a pre-trained ranking model, we show how to return a set of items that is rigorously guaranteed to contain mostly good items. Our procedure endows any ranking model with rigorous finite-sample control of the false discovery rate (FDR), regardless of the (unknown) data distribution. Moreover, our calibration algorithm enables the easy and principled integration of multiple objectives in recommender systems. As an example, we show how to optimize for recommendation diversity subject to a user-specified level of FDR control, circumventing the need to specify ad hoc weights of a diversity loss against an accuracy loss. Throughout, we focus on the problem of learning to rank a set of possible recommendations, evaluating our methods on the Yahoo! Learning to Rank and MSMarco datasets.
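To make the calibration idea concrete, here is a minimal, hypothetical sketch of threshold calibration for false-discovery control in the style described above. It assumes a held-out calibration set of scored items with binary relevance labels, scans score thresholds from most to least conservative, and stops at the first threshold whose empirical false discovery proportion exceeds the target level (a fixed-sequence-style search). The function name and interface are illustrative; the paper's actual procedure provides a rigorous finite-sample FDR guarantee, which this empirical sketch does not.

```python
import numpy as np

def calibrate_threshold(scores, labels, alpha=0.1):
    """Illustrative sketch (not the paper's procedure): pick a score
    threshold lam so that recommending items with score >= lam keeps
    the empirical false discovery proportion (FDP) on a calibration
    set at or below alpha.

    scores: array of model scores for calibration items.
    labels: array of binary relevance labels (1 = good item).
    Returns the smallest threshold found, or None if even the most
    conservative nonempty set violates the constraint.
    """
    # Candidate thresholds, from most conservative (largest) down.
    lambdas = np.sort(np.unique(scores))[::-1]
    best = None
    for lam in lambdas:
        selected = scores >= lam
        n_sel = selected.sum()
        if n_sel == 0:
            continue
        # Fraction of selected items that are not actually good.
        fdp = (selected & (labels == 0)).sum() / n_sel
        if fdp <= alpha:
            best = lam  # largest set so far meeting the constraint
        else:
            break  # fixed-sequence style: stop at first violation
    return best
```

For example, with scores `[0.9, 0.8, 0.7, 0.6]`, labels `[1, 1, 0, 1]`, and `alpha=0.25`, the scan accepts thresholds 0.9 and 0.8 (FDP 0), then halts at 0.7 (FDP 1/3), returning 0.8. A secondary objective such as diversity could then be optimized over the sets that survive this constraint, rather than traded off via ad hoc loss weights.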