Artificial intelligence (AI)-assisted methods have received much attention in high-risk fields such as disease diagnosis. Unlike the classification of disease types, classifying medical images as benign or malignant is a fine-grained task. However, most research focuses only on improving diagnostic accuracy and ignores the evaluation of model reliability, which limits clinical application. In clinical practice, calibration is especially challenging in the low-data regime because of over-parameterized models and inherent noise. In particular, we find that modeling data-dependent uncertainty is more conducive to confidence calibration. As an alternative to test-time augmentation (TTA), we propose a modified Bootstrapping loss (BS loss) function combined with the Mixup data augmentation strategy, which better calibrates predictive uncertainty and captures shifts in the data distribution without additional inference time. Our experiments indicate that the BS loss with Mixup (BSM) model can halve the Expected Calibration Error (ECE) compared with standard data augmentation, deep ensembles, and MC dropout. Under the BSM model, the correlation between uncertainty and similarity of in-domain data reaches -0.4428. Additionally, the BSM model is able to perceive the semantic distance of out-of-domain data, demonstrating high potential for real-world clinical practice.
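For readers unfamiliar with the headline metric, the following is a minimal sketch of how the Expected Calibration Error is commonly computed: predictions are grouped into equal-width confidence bins, and the gap between accuracy and mean confidence in each bin is averaged, weighted by the fraction of samples in the bin. The function name `expected_calibration_error`, the choice of 10 bins, and the use of NumPy are assumptions made for illustration; the abstract does not state the binning scheme used in the experiments.

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE (illustrative sketch, not the paper's code).

    confidences: predicted probability of the chosen class, one per sample.
    correct:     1 if the prediction was right, 0 otherwise.
    n_bins:      number of equal-width bins over (0, 1]; 10 is an assumption.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)

    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        # Bins are half-open (lo, hi]; a confidence of exactly 0 is ignored,
        # which cannot occur for a max-softmax confidence.
        in_bin = (confidences > lo) & (confidences <= hi)
        if not in_bin.any():
            continue
        accuracy = correct[in_bin].mean()
        mean_confidence = confidences[in_bin].mean()
        # Weight the gap by the fraction of samples falling in this bin.
        ece += in_bin.mean() * abs(accuracy - mean_confidence)
    return float(ece)


if __name__ == "__main__":
    # An overconfident toy model: 95% confidence but only 75% accuracy.
    print(expected_calibration_error([0.95] * 4, [1, 1, 1, 0]))
```

A lower value means that the reported confidence tracks the observed accuracy more closely, which is the sense in which the abstract describes the ECE being halved.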