A much-studied issue is the extent to which the confidence scores produced by machine learning algorithms are calibrated to ground-truth probabilities. Our starting point is the observation that calibration is seemingly incompatible with class weighting, a technique often employed when one class is less common (class imbalance) or in the hope of achieving some external objective (cost-sensitive learning). We provide a model-based explanation for this incompatibility and use our anthropomorphic model to derive a simple method of recovering likelihoods from an algorithm that is miscalibrated due to class weighting. We validate this approach on the binary pneumonia detection task of Rajpurkar, Irvin, Zhu, et al. (2017).
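As a minimal sketch of the kind of correction the abstract alludes to, the snippet below applies the textbook prior-shift adjustment for a binary classifier trained with weight `w` on the positive class: under that argument the weighted model's odds are inflated by a factor of `w`, so dividing them back out recovers the unweighted likelihood. The function name and this particular correction are illustrative assumptions, not necessarily the exact method proposed in the paper.

```python
import numpy as np

def unweight_probability(q, w):
    """Recover an (approximately) calibrated probability from the score q of a
    binary classifier trained with weight w on the positive class.

    Standard prior-shift argument (an assumption here, not the paper's stated
    derivation): the weighted model's odds are inflated by w, i.e.
        q / (1 - q) = w * p / (1 - p),
    so solving for p gives
        p = q / (q + w * (1 - q)).
    """
    q = np.asarray(q, dtype=float)
    return q / (q + w * (1.0 - q))

# Example: a model trained with weight 10 on the positive class outputs 0.9;
# the corrected likelihood is 0.9 / (0.9 + 10 * 0.1) ≈ 0.474.
print(unweight_probability(0.9, 10.0))
```

With `w = 1` the correction is the identity, consistent with an unweighted model already being (approximately) calibrated.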