Deep neural retrieval models have amply demonstrated their power, but estimating the reliability of their predictions remains challenging. Most dialog response retrieval models output a single score for a response indicating how relevant it is to a given question. However, the poor calibration of deep neural networks introduces uncertainty into this single score, so unreliable predictions can misinform user decisions. To address these issues, we present PG-DRR, an efficient calibration and uncertainty estimation framework for dialog response retrieval models, which adds a Gaussian Process layer to a deterministic deep neural network and recovers conjugacy for tractable posterior inference through P\'{o}lya-Gamma augmentation. PG-DRR achieves the lowest empirical calibration error (ECE) on the in-domain datasets and the distributional shift task while maintaining $R_{10}@1$ and MAP performance.
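As a rough sketch of the augmentation step (following the standard P\'{o}lya-Gamma identity of Polson, Scott, and Windle (2013); the notation $f$, $K$, $\omega$ and the updates below are assumptions about the construction, not necessarily PG-DRR's exact parameterization), consider a binary relevance label $y \in \{0,1\}$ and a latent GP score $f$. The logistic likelihood can be rewritten as

$$
\frac{(e^{f})^{y}}{1+e^{f}}
= \frac{1}{2}\, e^{(y-\frac{1}{2})f} \int_{0}^{\infty} e^{-\omega f^{2}/2}\, p_{\mathrm{PG}}(\omega \mid 1, 0)\, d\omega ,
$$

so that, conditional on the auxiliary variable $\omega \sim \mathrm{PG}(1,0)$, the likelihood is Gaussian in $f$. With a GP prior $f \sim \mathcal{N}(0, K)$ over the latent relevance scores, the conditional posteriors become conjugate:

$$
\omega_i \mid f_i \sim \mathrm{PG}(1, f_i), \qquad
f \mid y, \omega \sim \mathcal{N}\big(\Sigma\,(y - \tfrac{1}{2}\mathbf{1}),\; \Sigma\big), \qquad
\Sigma = \big(K^{-1} + \mathrm{diag}(\omega)\big)^{-1}.
$$

Alternating these two updates yields tractable posterior inference over the retrieval scores without further approximation of the logistic link, which is the sense in which the augmentation "recovers conjugacy" above.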