We introduce Learn then Test (LTT), a framework for calibrating machine learning models so that their predictions satisfy explicit, finite-sample statistical guarantees regardless of the underlying model and (unknown) data-generating distribution. The framework addresses, among other examples, false discovery rate control in multi-label classification, intersection-over-union control in instance segmentation, and the simultaneous control of the type-1 error of outlier detection and confidence set coverage in classification or regression. To accomplish this, we solve a key technical challenge: the control of arbitrary risks that are not necessarily monotonic. Our main insight is to reframe the risk-control problem as multiple hypothesis testing, enabling techniques and mathematical arguments different from those in the previous literature. We use our framework to provide new calibration methods for several core machine learning tasks with detailed worked examples in computer vision.
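To make the multiple-testing reframing concrete, here is a minimal sketch of one way it can be instantiated, assuming a bounded loss in [0, 1], Hoeffding-style p-values, and a Bonferroni correction over a grid of candidate parameters. The function names, the toy data, and the miss-rate loss are illustrative assumptions, not the authors' reference implementation.

```python
# Minimal sketch of the Learn then Test calibration idea (illustrative, not the
# authors' code): test H_j "risk(lambda_j) > alpha" for each candidate lambda_j,
# then keep the lambdas whose nulls are rejected by a FWER-controlling procedure.
import numpy as np

def hoeffding_pvalue(risk_hat, alpha, n):
    """Valid p-value for H_j: true risk(lambda_j) > alpha, via Hoeffding's inequality
    for a loss bounded in [0, 1]."""
    return np.exp(-2.0 * n * max(alpha - risk_hat, 0.0) ** 2)

def learn_then_test(lambda_grid, losses, alpha=0.1, delta=0.05):
    """Return the lambdas selected by Bonferroni at family-wise error level delta.

    losses: array of shape (n_calibration, n_lambdas) holding the bounded loss
            of each calibration point at each candidate lambda.
    """
    n, m = losses.shape
    risk_hat = losses.mean(axis=0)                      # empirical risk per lambda
    pvals = np.array([hoeffding_pvalue(r, alpha, n) for r in risk_hat])
    rejected = pvals <= delta / m                       # Bonferroni FWER control
    return [lam for lam, rej in zip(lambda_grid, rejected) if rej]

# Toy usage (hypothetical data): calibrate a score threshold so that the
# probability of missing a positive example is at most alpha.
rng = np.random.default_rng(0)
lambda_grid = np.linspace(0.0, 1.0, 21)
scores = rng.uniform(size=500)                          # hypothetical model scores
labels = (scores + 0.1 * rng.normal(size=500)) > 0.5    # hypothetical binary labels
# Loss is 1 when a positive example falls below the threshold lam, else 0.
losses = np.stack([(labels & (scores < lam)).astype(float) for lam in lambda_grid], axis=1)
print(learn_then_test(lambda_grid, losses, alpha=0.1, delta=0.05))
```

With probability at least 1 - delta over the calibration data, every lambda returned by this sketch has true risk at most alpha; note that nothing in the argument requires the risk to be monotone in lambda, which is the point of the multiple-testing view.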