Learning to Rank (LETOR) algorithms are usually trained on annotated corpora where a single relevance label is assigned to each available document-topic pair. Within the Cranfield framework, relevance labels are obtained by merging multiple human assessments, either expertly curated or crowdsourced. In this paper, we explore how to train LETOR models with relevance judgment distributions, either real or synthetically generated, assigned to document-topic pairs instead of single-valued relevance labels. We propose five new probabilistic loss functions that exploit the higher expressive power of relevance judgment distributions and show how they can be applied to both neural and Gradient Boosting Machine (GBM) architectures. Moreover, we show that training a LETOR model on relevance labels sampled from such probability distributions can improve its performance under both traditional and probabilistic loss functions. Finally, we validate our hypothesis on real-world crowdsourced relevance judgment distributions. Overall, we observe that training LETOR models on relevance judgment distributions can boost their performance and even outperform strong baselines such as LambdaMART on several test collections.
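To make the core idea concrete, here is a minimal sketch of one plausible probabilistic loss of the kind described above: the cross-entropy between a model's predicted distribution over discrete relevance grades and the empirical distribution of human judgments for a document-topic pair. This is not necessarily one of the five losses proposed in the paper; the function and tensor names (`judgment_distribution_loss`, `grade_logits`, `judgment_dist`) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def judgment_distribution_loss(grade_logits: torch.Tensor,
                               judgment_dist: torch.Tensor) -> torch.Tensor:
    """Cross-entropy against a soft target distribution.

    grade_logits:  (batch, n_grades) raw model scores, one per relevance grade.
    judgment_dist: (batch, n_grades) empirical judgment distribution, e.g.
                   the normalized counts of crowd labels per grade.
    """
    log_probs = F.log_softmax(grade_logits, dim=-1)
    # Soft-target cross-entropy; equals KL(judgment_dist || model) up to the
    # (constant) entropy of the judgment distribution, so it has the same
    # gradients with respect to the model parameters.
    return -(judgment_dist * log_probs).sum(dim=-1).mean()

# Example: three crowd workers gave grades [0, 1, 1] on a 3-grade scale,
# so the target distribution for this pair is [1/3, 2/3, 0].
logits = torch.randn(1, 3, requires_grad=True)
target = torch.tensor([[1 / 3, 2 / 3, 0.0]])
loss = judgment_distribution_loss(logits, target)
loss.backward()
```

Unlike a loss computed on a single merged label, this objective penalizes the model for being overconfident on pairs where assessors disagree, which is precisely the extra information that judgment distributions carry.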
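The label-sampling idea can likewise be sketched in a few lines: instead of training on a single merged label, draw a fresh single-valued label for each pair from its judgment distribution (here, once per epoch), so that a conventional pointwise or pairwise loss sees the full spread of assessor opinions over the course of training. The sampling schedule, array names, and grade scale below are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# judgment_dists[i] is the empirical distribution over relevance grades
# (here 0..3) for document-topic pair i, estimated from crowd judgments.
judgment_dists = np.array([
    [0.0, 1 / 3, 2 / 3, 0.0],   # most workers said grade 2
    [0.5, 0.5, 0.0, 0.0],       # workers split between grades 0 and 1
])
grades = np.arange(judgment_dists.shape[1])

def sample_labels(dists: np.ndarray) -> np.ndarray:
    """Draw one grade per pair according to its judgment distribution."""
    return np.array([rng.choice(grades, p=d) for d in dists])

for epoch in range(3):
    labels = sample_labels(judgment_dists)
    # ... train one epoch of any standard LETOR model (e.g. LambdaMART)
    # on `labels` as if they were ordinary single-valued judgments ...
    print(f"epoch {epoch}: sampled labels {labels}")
```

Because the sampled labels change across epochs, ambiguous pairs contribute gradient signal proportional to their assessor disagreement, which is one way to understand why this scheme can help even with traditional, non-probabilistic loss functions.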