Image classification models deployed in the real world may receive inputs outside the intended data distribution. For critical applications such as clinical decision making, it is important that a model can detect such out-of-distribution (OOD) inputs and express its uncertainty. In this work, we assess the capability of various state-of-the-art approaches for confidence-based OOD detection through a comparative study and in-depth analysis. First, we leverage a computer vision benchmark to reproduce and compare multiple OOD detection methods. We then evaluate their capabilities on the challenging task of disease classification using chest X-rays. Our study shows that high performance in a computer vision task does not directly translate to accuracy in a medical imaging task. We analyse the factors that affect the methods' performance across the two tasks. Our results provide useful insights for developing the next generation of OOD detection methods.
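As a minimal illustration of what "confidence-based" OOD detection means, the sketch below scores each input by its maximum softmax probability (MSP), a widely used confidence baseline, and measures how well that score separates in-distribution from OOD inputs using AUROC. This is an assumption-laden sketch, not a reproduction of the specific methods or evaluation protocol used in this study; the function names (`softmax`, `msp_score`, `auroc`) are hypothetical and the logits in the usage example are made up.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vector of class logits."""
    m = max(logits)  # subtract the max to avoid overflow in exp
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp_score(logits: Sequence[float]) -> float:
    """Maximum softmax probability: higher values suggest in-distribution."""
    return max(softmax(logits))


def auroc(in_scores: Sequence[float], ood_scores: Sequence[float]) -> float:
    """AUROC for separating in-distribution (positive) from OOD inputs.

    Computed as the probability that a random in-distribution sample scores
    higher than a random OOD sample, counting ties as one half (the
    Mann-Whitney U formulation). O(n * m), adequate for small examples.
    """
    wins = 0.0
    for s_in in in_scores:
        for s_ood in ood_scores:
            if s_in > s_ood:
                wins += 1.0
            elif s_in == s_ood:
                wins += 0.5
    return wins / (len(in_scores) * len(ood_scores))


if __name__ == "__main__":
    # Hypothetical logits from a 3-class classifier (illustrative numbers only).
    in_dist_logits = [[4.0, 0.5, 0.1], [0.2, 3.5, 0.3]]
    ood_logits = [[1.0, 0.9, 1.1], [0.4, 0.6, 0.5]]
    in_scores = [msp_score(z) for z in in_dist_logits]
    ood_scores = [msp_score(z) for z in ood_logits]
    print(f"AUROC = {auroc(in_scores, ood_scores):.2f}")  # prints "AUROC = 1.00"
```

Confident, peaked predictions receive high MSP scores while flat predictions receive low ones, so a threshold on the score acts as an OOD detector. The abstract's point is that a score that separates the two groups well on a natural-image benchmark need not do so on chest X-rays.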