We investigate the relationship between commonly considered notions of multiclass calibration and the calibration algorithms used to achieve these notions, leading to two broad contributions. First, we propose a new and arguably natural notion of top-label calibration, which requires the reported probability of the most likely label to be calibrated. Along the way, we highlight certain philosophical issues with the closely related and popular notion of confidence calibration. Second, we outline general 'wrapper' multiclass-to-binary (M2B) algorithms that can be used to achieve confidence, top-label, and class-wise calibration, using underlying binary calibration routines. Our wrappers can also be generalized to other notions of calibration, if required for certain practical applications. We instantiate these wrappers with the binary histogram binning (HB) algorithm, and show that the overall procedure has distribution-free calibration guarantees. In an empirical evaluation, we find that with the right M2B wrapper, HB performs significantly better than other calibration approaches. Code for this work has been made publicly available at https://github.com/aigen/df-posthoc-calibration.
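To make the M2B idea concrete, the following is a minimal sketch of a top-label wrapper around binary histogram binning, written from the description above rather than taken from the linked repository. It reads top-label calibration as calibrating the reported confidence separately for each predicted label: for every label, it collects the calibration points predicted as that label and fits a binary HB model on the pairs (confidence, whether the prediction was correct). All function names (`fit_binary_hb`, `apply_binary_hb`, `fit_top_label_hb`, `predict_top_label_hb`) are hypothetical. The sketch also makes three assumptions of its own: bins are uniform-mass (quantile-based), empty bins fall back to the overall label frequency, and labels never predicted on the calibration set keep their original confidence.

```python
import numpy as np


def fit_binary_hb(scores, labels, n_bins=10):
    """Fit histogram binning for a binary problem (assumed uniform-mass bins).

    Returns (edges, bin_means): the interior bin edges and the empirical
    frequency of label 1 inside each bin.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Interior edges at empirical quantiles, so bins hold roughly equal counts.
    edges = np.quantile(scores, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    bin_ids = np.searchsorted(edges, scores, side="right")
    fallback = labels.mean()  # assumption: used for bins that receive no points
    bin_means = np.array([
        labels[bin_ids == b].mean() if np.any(bin_ids == b) else fallback
        for b in range(n_bins)
    ])
    return edges, bin_means


def apply_binary_hb(model, scores):
    """Map each score to the label frequency of the bin it falls into."""
    edges, bin_means = model
    scores = np.asarray(scores, dtype=float)
    return bin_means[np.searchsorted(edges, scores, side="right")]


def fit_top_label_hb(probs, y, n_bins=10):
    """Hypothetical top-label wrapper: one binary HB model per predicted label."""
    probs = np.asarray(probs, dtype=float)
    y = np.asarray(y)
    pred = probs.argmax(axis=1)  # predicted (top) label
    conf = probs.max(axis=1)     # reported confidence for that label
    models = {}
    for label in np.unique(pred):
        mask = pred == label
        # Binary target: was the top-label prediction correct?
        models[int(label)] = fit_binary_hb(conf[mask], y[mask] == label, n_bins)
    return models


def predict_top_label_hb(models, probs):
    """Return the predicted labels and their recalibrated confidences."""
    probs = np.asarray(probs, dtype=float)
    pred = probs.argmax(axis=1)
    conf = probs.max(axis=1).copy()
    for label, model in models.items():
        mask = pred == label
        if mask.any():
            conf[mask] = apply_binary_hb(model, conf[mask])
    # Labels unseen at calibration time keep their original confidence.
    return pred, conf
```

In use, `fit_top_label_hb` would be called on held-out calibration data and `predict_top_label_hb` on test-time probability vectors. A class-wise variant would follow the same pattern, fitting one binary model per class on that class's predicted probability instead of conditioning on the predicted label.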