According to the Probability Ranking Principle (PRP), ranking documents in decreasing order of their probability of relevance leads to an optimal document ranking for ad-hoc retrieval. The PRP holds when two conditions are met: [C1] the models are well calibrated, and [C2] the probabilities of relevance are reported with certainty. However, deep neural networks (DNNs) are often not well calibrated and exhibit several sources of uncertainty, so [C1] and [C2] might not be satisfied by neural rankers. Given the success of neural Learning to Rank (L2R) approaches, and of BERT-based approaches in particular, we first analyze under which circumstances deterministic neural rankers, i.e. rankers that output point estimates, are calibrated. Then, motivated by our findings, we use two techniques to model the uncertainty of neural rankers, leading to the proposed stochastic rankers, which output a predictive distribution of relevance as opposed to point estimates. Our experimental results on the ad-hoc retrieval task of conversation response ranking reveal that (i) BERT-based rankers are not robustly calibrated and stochastic BERT-based rankers yield better calibration; and (ii) uncertainty estimation is beneficial both for risk-aware neural ranking, i.e. taking the uncertainty into account when ranking documents, and for predicting unanswerable conversational contexts.
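As a minimal sketch of the idea (not necessarily the two techniques used in this work), Monte Carlo Dropout is one common way to turn a deterministic BERT-based ranker into a stochastic one that outputs a predictive distribution of relevance, which can then feed a simple risk-aware ranking rule such as mean score minus a scaled standard deviation. The checkpoint name, the example query and candidates, the number of samples, and the risk weight alpha below are illustrative assumptions.

```python
# Illustrative sketch: MC Dropout over a BERT-based relevance scorer,
# followed by a simple risk-aware ranking rule (mean - alpha * std).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # assumed checkpoint; in practice a ranker fine-tuned on relevance labels
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=1)

def mc_dropout_scores(query: str, doc: str, n_samples: int = 10) -> torch.Tensor:
    """Return n_samples relevance scores with dropout kept active (MC Dropout)."""
    inputs = tokenizer(query, doc, return_tensors="pt", truncation=True, max_length=512)
    model.train()  # keep dropout layers active at inference time
    with torch.no_grad():
        samples = [model(**inputs).logits.squeeze() for _ in range(n_samples)]
    return torch.stack(samples)  # predictive distribution over relevance, shape (n_samples,)

def risk_aware_score(scores: torch.Tensor, alpha: float = 1.0) -> float:
    """Penalize uncertain documents: mean relevance minus alpha times its standard deviation."""
    return (scores.mean() - alpha * scores.std()).item()

# Toy example: rank candidate responses for a conversational context.
context = "How do I reset my password?"
candidates = [
    "Click 'Forgot password' on the login page.",
    "The weather is nice today.",
]
ranked = sorted(
    candidates,
    key=lambda doc: risk_aware_score(mc_dropout_scores(context, doc)),
    reverse=True,
)
print(ranked)
```

A deterministic ranker corresponds to a single forward pass with dropout disabled; the stochastic variant above differs only in sampling multiple forward passes and summarizing them, which also provides the per-document uncertainty used for detecting unanswerable contexts.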