Transformers, currently the state of the art in natural language understanding (NLU) tasks, are prone to generating uncalibrated predictions or extreme probabilities, which makes decision-making based on their output relatively difficult. In this paper we propose to build several inductive Venn--ABERS predictors (IVAPs), which are guaranteed to be well calibrated under minimal assumptions, on top of a selection of pre-trained transformers. We test their performance over a set of diverse NLU tasks and show that they are capable of producing well-calibrated probabilistic predictions that are uniformly spread over the [0,1] interval, all while retaining the original model's predictive accuracy.
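To make the calibration step concrete, the sketch below shows the naive formulation of an inductive Venn--ABERS predictor for one test example; this is an illustration of the general IVAP recipe, not the paper's implementation. The helper name `ivap_predict` is hypothetical, the calibration scores and labels are assumed to come from a held-out split of the training data, and the merged probability uses the standard minimax combination p = p1 / (1 - p0 + p1). In practice the per-example refitting shown here is usually replaced by a more efficient precomputed variant.

```python
# A minimal IVAP sketch (naive formulation, hypothetical helper).
# Assumes: `cal_scores`/`cal_labels` come from a held-out calibration split,
# and `test_score` is the underlying model's raw score for one test example.
import numpy as np
from sklearn.isotonic import IsotonicRegression

def ivap_predict(cal_scores, cal_labels, test_score):
    """Return (p0, p1, p): the Venn--ABERS multiprobability and its merge."""
    probs = []
    for hypothetical_label in (0, 1):
        # Append the test object with each hypothetical label in turn,
        # then fit isotonic regression on the augmented calibration set.
        scores = np.append(cal_scores, test_score)
        labels = np.append(cal_labels, hypothetical_label)
        iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
        iso.fit(scores, labels)
        probs.append(float(iso.predict([test_score])[0]))
    p0, p1 = probs
    # Standard minimax merge of the multiprobability into one probability.
    return p0, p1, p1 / (1.0 - p0 + p1)

# Usage on synthetic scores: labels correlate with scores, so the merged
# probability should increase with `test_score`.
rng = np.random.default_rng(0)
cal_scores = rng.normal(size=200)
cal_labels = (cal_scores + rng.normal(scale=0.5, size=200) > 0).astype(int)
print(ivap_predict(cal_scores, cal_labels, test_score=0.3))
```

The interval [p0, p1] is the Venn--ABERS multiprobability, whose width reflects how much evidence the calibration set provides near the test score; merging it into a single value is a convenience for downstream decision-making.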