Large Language Models (LLMs) are increasingly deployed in high-stakes domains such as science, law, and healthcare, where accurate expressions of uncertainty are essential for reliability and trust. However, current LLMs often generate incorrect answers with high confidence, a phenomenon known as "overconfidence". Recent efforts have focused on calibrating LLMs' verbalized confidence, i.e., their textual expressions of confidence such as "I am 80% confident that...". Existing approaches either rely on prompt engineering or on fine-tuning with heuristically generated uncertainty estimates, both of which have limited effectiveness and generalizability. Motivated by proper scoring rules for calibration in classical machine learning, we introduce ConfTuner, a simple and efficient fine-tuning method that adds minimal overhead and requires neither ground-truth confidence scores nor proxy confidence estimates. ConfTuner relies on a new loss function, the tokenized Brier score, which we theoretically prove to be a proper scoring rule, intuitively meaning that it "correctly incentivizes the model to report its true probability of being correct". ConfTuner improves calibration across diverse reasoning tasks and generalizes to black-box models such as GPT-4o. Our results further show that better-calibrated confidence enables downstream gains in self-correction and model cascade, advancing the development of trustworthy LLM systems. The code is available at https://github.com/liushiliushi/ConfTuner.
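To make the proper-scoring-rule intuition concrete, the sketch below illustrates the classical (untokenized) Brier score and why truthful confidence reporting minimizes it in expectation. This is only an illustration of the underlying principle, not the paper's tokenized variant; the function names are hypothetical.

```python
# Illustrative sketch of the classical Brier score, NOT ConfTuner's
# tokenized loss. It penalizes the squared gap between a verbalized
# confidence in [0, 1] and the 0/1 correctness of the answer.
def brier_score(confidence: float, correct: bool) -> float:
    """Squared error between stated confidence and the correctness
    outcome; lower is better."""
    return (confidence - float(correct)) ** 2

# Proper-scoring-rule property: if the answer is correct with true
# probability p, the expected Brier score
#   E[B(c)] = p * (c - 1)^2 + (1 - p) * c^2
# is a quadratic in c minimized exactly at c = p, so reporting the
# true probability of being correct is the optimal strategy.
def expected_brier(conf: float, p: float) -> float:
    return p * (conf - 1.0) ** 2 + (1.0 - p) * conf ** 2

if __name__ == "__main__":
    # Saying "80% confident" on a correct answer costs ~0.04,
    # but the same confidence on a wrong answer costs ~0.64.
    print(brier_score(0.8, True))   # ~0.04
    print(brier_score(0.8, False))  # ~0.64

    # Grid search confirms the minimizer of the expected score
    # coincides with the true correctness probability p.
    p = 0.7
    grid = [i / 100 for i in range(101)]
    best = min(grid, key=lambda c: expected_brier(c, p))
    print(best)  # 0.7
```

An overconfident model (e.g., reporting 0.95 when p = 0.7) thus incurs a strictly higher expected loss than an honest one, which is the incentive structure ConfTuner's tokenized Brier score carries over to verbalized confidence tokens.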