When facing uncertainty, decision-makers want predictions they can trust. A machine learning provider can convey confidence to decision-makers by guaranteeing that its predictions are distribution calibrated -- among the inputs that receive a predicted class-probability vector $q$, the actual distribution over classes is $q$. For multi-class prediction problems, however, achieving distribution calibration tends to be infeasible, requiring sample complexity exponential in the number of classes $C$. In this work, we introduce a new notion -- \emph{decision calibration} -- that requires the predicted distribution and the true distribution to be ``indistinguishable'' to a set of downstream decision-makers. When all possible decision-makers are under consideration, decision calibration is equivalent to distribution calibration. However, when we only consider decision-makers choosing between a bounded number of actions (e.g., polynomial in $C$), our main result shows that decision calibration becomes feasible -- we design a recalibration algorithm whose sample complexity is polynomial in the number of actions and the number of classes. We validate our recalibration algorithm empirically: compared to existing methods, decision calibration improves decision-making on skin lesion and ImageNet classification with modern neural network predictors.