In any ranking system, the retrieval model outputs a single score for a document based on its belief in how relevant that document is to a given search query. While retrieval models have continued to improve with the introduction of increasingly complex architectures, few works have investigated a retrieval model's belief in its score beyond a single point estimate. We argue that capturing the model's uncertainty with respect to its own scoring of a document is a critical aspect of retrieval: it allows current models to be applied more broadly across new document distributions and collections, and can even improve effectiveness on downstream tasks. In this paper, we address this problem via an efficient Bayesian framework for retrieval models that captures the model's belief in the relevance score through a stochastic process while adding only negligible computational overhead. We evaluate this belief via a ranking-based calibration metric, showing that our approximate Bayesian framework significantly improves a retrieval model's ranking effectiveness through risk-aware reranking, as well as its confidence calibration. Lastly, we demonstrate that this additional uncertainty information is actionable and reliable on a downstream task, represented here by cutoff prediction.
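To make the idea of a score distribution concrete, the sketch below shows one common way to obtain an approximate Bayesian belief over a relevance score: Monte Carlo dropout over a scoring head, followed by a simple risk-aware adjustment. This is an illustrative assumption, not the paper's exact method; the toy architecture, the risk weight `alpha`, and the number of stochastic samples are all hypothetical.

```python
# Minimal sketch (assumed, not the paper's method): Monte Carlo dropout over a
# relevance scoring head yields a distribution of scores per document, from
# which a mean, an uncertainty estimate, and a risk-aware score can be derived.
import torch
import torch.nn as nn


class StochasticScorer(nn.Module):
    """Toy scoring head; dropout is the source of stochasticity."""

    def __init__(self, dim: int, p: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Dropout(p), nn.Linear(dim, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)


def mc_scores(model: nn.Module, feats: torch.Tensor, n_samples: int = 32):
    """Run several stochastic forward passes; return per-document mean and std."""
    model.train()  # keep dropout active at inference time
    with torch.no_grad():
        draws = torch.stack([model(feats) for _ in range(n_samples)])
    return draws.mean(dim=0), draws.std(dim=0)


if __name__ == "__main__":
    torch.manual_seed(0)
    feats = torch.randn(5, 64)      # 5 candidate documents, 64-dim features (toy data)
    scorer = StochasticScorer(dim=64)
    mean, std = mc_scores(scorer, feats)
    alpha = 1.0                     # risk-aversion weight (assumed value)
    risk_aware = mean - alpha * std  # penalize uncertain scores when reranking
    print(risk_aware.argsort(descending=True))  # risk-aware ranking order
```

Because the extra forward passes only involve the lightweight scoring head rather than the full encoder, the added cost of such an approach stays small, which is the kind of negligible overhead the abstract refers to.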