Deep learning has the potential to augment many components of the clinical workflow, such as medical image interpretation. However, the translation of these black-box algorithms into clinical practice has been marred by their relative lack of transparency compared to conventional machine learning methods, which hinders clinician trust in these systems for critical medical decision-making. Specifically, common deep learning approaches have no intuitive way of expressing uncertainty about cases that might require further human review. Furthermore, the possibility of algorithmic bias has caused hesitancy about using such algorithms in clinical settings. To these ends, we explore how conformal methods can complement deep learning models, both by providing a clinically intuitive way of expressing model uncertainty (through confidence prediction sets) and by facilitating model transparency in clinical workflows. In this paper, we first conduct a field survey with clinicians to assess clinical use cases of conformal predictions. Next, we conduct experiments on mammographic breast density and dermatology photography datasets to demonstrate the utility of conformal predictions in "rule-in" and "rule-out" disease scenarios. Further, we show that conformal predictors can be used to equalize coverage with respect to patient demographics such as race and skin tone. We find conformal prediction to be a promising framework with the potential to increase clinical usability and transparency, enabling better collaboration between deep learning algorithms and clinicians.
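The abstract does not spell out the conformal procedure. As a rough illustration, the sketch below implements one common recipe, split conformal prediction with a nonconformity score of one minus the softmax probability of the true class, plus a per-group (sometimes called "Mondrian") variant of the kind that can be used to equalize coverage across subgroups such as skin tone. The function names, the score choice, and the toy numbers are assumptions for illustration, not the authors' implementation. A larger prediction set signals more uncertainty and could flag a case for human review.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Set


def nonconformity(probs: Sequence[float], label: int) -> float:
    # Assumed score: 1 minus the softmax probability the model gives the true label.
    return 1.0 - probs[label]


def conformal_quantile(scores: Sequence[float], alpha: float) -> float:
    # Finite-sample-corrected quantile: the k-th smallest calibration score,
    # with k = ceil((n + 1) * (1 - alpha)); infinite if k exceeds n.
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]


def calibrate(cal_probs: Sequence[Sequence[float]],
              cal_labels: Sequence[int], alpha: float) -> float:
    # One threshold shared by all patients (marginal coverage).
    scores = [nonconformity(p, y) for p, y in zip(cal_probs, cal_labels)]
    return conformal_quantile(scores, alpha)


def calibrate_by_group(cal_probs: Sequence[Sequence[float]],
                       cal_labels: Sequence[int],
                       groups: Sequence[Hashable],
                       alpha: float) -> Dict[Hashable, float]:
    # One threshold per subgroup (e.g. a skin-tone bin), so the coverage
    # guarantee holds within each group rather than only on average.
    scores_by_group: Dict[Hashable, List[float]] = defaultdict(list)
    for p, y, g in zip(cal_probs, cal_labels, groups):
        scores_by_group[g].append(nonconformity(p, y))
    return {g: conformal_quantile(s, alpha) for g, s in scores_by_group.items()}


def prediction_set(probs: Sequence[float], qhat: float) -> Set[int]:
    # Keep every class whose score falls within the calibrated threshold.
    return {c for c, p in enumerate(probs) if 1.0 - p <= qhat}


if __name__ == "__main__":
    # Made-up calibration data: true-class probabilities for ten patients.
    true_probs = [0.95, 0.9, 0.85, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
    cal_probs = [[p, 1.0 - p, 0.0] for p in true_probs]
    cal_labels = [0] * len(true_probs)
    qhat = calibrate(cal_probs, cal_labels, alpha=0.2)
    print(prediction_set([0.6, 0.35, 0.05], qhat))  # prints {0, 1}
```

Calibrating per group gives each subgroup its own threshold, which trades a single shared cutoff for coverage that holds within every group, at the cost of needing enough calibration cases in each group.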