Deep learning models for medical image segmentation can fail unexpectedly and spectacularly for pathological cases and for images acquired at centers different from those of the training images, with labeling errors that violate expert knowledge. Such errors undermine the trustworthiness of deep learning models for medical image segmentation. Mechanisms for detecting and correcting such failures are essential for safely translating this technology into clinics and are likely to be a requirement of future regulations on artificial intelligence (AI). In this work, we propose a trustworthy AI theoretical framework and a practical system that can augment any backbone AI system using a fallback method and a fail-safe mechanism based on Dempster-Shafer theory. Our approach relies on an actionable definition of trustworthy AI. Our method automatically discards the voxel-level labelings predicted by the backbone AI that violate expert knowledge and relies on the fallback method for those voxels. We demonstrate the effectiveness of the proposed trustworthy AI approach on the largest reported annotated dataset of fetal brain MRI, consisting of 540 manually annotated fetal brain 3D T2w MRIs from 13 centers. Our trustworthy AI method improves the robustness of a state-of-the-art backbone AI for fetal brain MRIs acquired across various centers and for fetuses with various brain abnormalities.
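As a rough illustration of the voxel-level fail-safe idea described above, the sketch below replaces backbone predictions that fail an expert-knowledge check with the output of a fallback segmenter. The function name, array shapes, and the simple replacement rule are assumptions made for illustration only; they are not the paper's exact Dempster-Shafer fusion rule.

```python
import numpy as np


def failsafe_fusion(backbone_probs, fallback_probs, violation_mask):
    """Voxel-wise fail-safe fusion (illustrative sketch, not the paper's exact method).

    backbone_probs:  (C, X, Y, Z) class probabilities from the backbone AI.
    fallback_probs:  (C, X, Y, Z) class probabilities from the fallback method
                     (e.g., an atlas/registration-based segmenter).
    violation_mask:  (X, Y, Z) boolean array, True where the backbone labeling
                     violates an expert-knowledge constraint.
    Returns the final voxel-level label map of shape (X, Y, Z).
    """
    fused = backbone_probs.copy()
    # Discard the backbone prediction where it violates expert knowledge
    # and rely on the fallback for those voxels.
    fused[:, violation_mask] = fallback_probs[:, violation_mask]
    return fused.argmax(axis=0)


# Tiny usage example with random data (4 classes, 8x8x8 volume).
rng = np.random.default_rng(0)
backbone = rng.dirichlet(np.ones(4), size=(8, 8, 8)).transpose(3, 0, 1, 2)
fallback = rng.dirichlet(np.ones(4), size=(8, 8, 8)).transpose(3, 0, 1, 2)
violations = rng.random((8, 8, 8)) < 0.1  # pretend 10% of voxels fail the check
labels = failsafe_fusion(backbone, fallback, violations)
print(labels.shape)  # (8, 8, 8)
```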