Deep neural networks have achieved impressive performance in a wide variety of medical imaging tasks. However, these models often fail on data that differ from their training distribution, such as data originating from a different medical centre. Recognising models that suffer from this fragility and designing robust models are the main obstacles to clinical adoption. Here, we present general methods to identify the causes of model generalisation failures and to circumvent them. First, we use $\textit{distribution-shifted datasets}$ to show that models trained with current state-of-the-art methods are highly fragile to the variability encountered in clinical practice, and then develop a $\textit{strong augmentation}$ strategy to address this fragility. Distribution-shifted datasets allow us to discover this fragility, which can otherwise remain undetected even after validation against multiple external datasets. Strong augmentation allows us to train robust models that achieve consistent performance under shifts from the training data distribution. Importantly, we demonstrate that strong augmentation yields biomedical imaging models which retain high performance when applied to real-world clinical data. Our results pave the way for the development and evaluation of reliable and robust neural networks in clinical practice.
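For concreteness, the sketch below shows what a strong-augmentation preprocessing pipeline for medical images might look like, built with standard `torchvision` transforms. The particular transforms and their parameter ranges are illustrative assumptions chosen to mimic scanner and acquisition variability; they are not the exact recipe evaluated in the paper.

```python
# A minimal sketch of a "strong augmentation" pipeline (assumed, not the
# authors' exact recipe), using torchvision. Aggressive geometric and
# photometric perturbations are meant to simulate the variability seen
# across medical centres, scanners, and acquisition protocols.
import torchvision.transforms as T

strong_augmentation = T.Compose([
    T.RandomResizedCrop(224, scale=(0.5, 1.0)),  # aggressive spatial cropping
    T.RandomHorizontalFlip(),
    T.RandomApply([T.RandomRotation(degrees=30)], p=0.5),
    T.ColorJitter(brightness=0.5, contrast=0.5,  # photometric shifts, e.g.
                  saturation=0.5, hue=0.1),      # stain/scanner differences
    T.RandomApply([T.GaussianBlur(kernel_size=9, sigma=(0.1, 2.0))],
                  p=0.5),                        # varying image sharpness
    T.ToTensor(),
])

# Usage: applied to PIL images inside a training Dataset, e.g.
#   x = strong_augmentation(image)
# so that every epoch presents the model with a differently perturbed view.
```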