Neural networks have achieved impressive results in many medical imaging tasks but often perform substantially worse on out-of-distribution datasets originating from different medical centres or patient cohorts. Evaluating this lack of generalisation ability and addressing the underlying problem are the two main challenges in developing neural networks intended for clinical practice. In this study, we develop a new method for evaluating neural networks' ability to generalise by generating a large number of distribution-shifted datasets, which can be used to thoroughly investigate their robustness to the variability encountered in clinical practice. Compared to external validation, \textit{shifted evaluation} can explain why neural networks fail on a given dataset, thus offering guidance on how to improve model robustness. With shifted evaluation, we demonstrate that neural networks trained with state-of-the-art methods are highly fragile to even small distribution shifts from the training data, and in some cases lose all discrimination ability. To address this fragility, we develop \texttt{StrongAugment}, an augmentation strategy explicitly designed to increase neural networks' robustness to distribution shifts. \texttt{StrongAugment} is evaluated with large-scale, heterogeneous histopathology data, including five training datasets from two tissue types, 274 distribution-shifted datasets and 20 external datasets from four countries. Neural networks trained with \texttt{StrongAugment} retain similar performance on all datasets, even under distribution shifts where networks trained with current state-of-the-art methods lose all discrimination ability. We recommend using strong augmentation and shifted evaluation to train and evaluate all neural networks intended for clinical practice.
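The abstract does not specify how \texttt{StrongAugment} is implemented, so the following is only a rough, hypothetical sketch of the general idea behind strong augmentation: apply a randomly chosen number of augmentation operations, each with a magnitude drawn from a wide range, so the training distribution covers large appearance shifts. The operation pool and parameter ranges below are illustrative stand-ins, not the paper's actual operations.

```python
import random

# Illustrative per-pixel operations on an image represented as a flat list
# of intensities in [0, 1]. A real pipeline would use proper image ops
# (hue shift, blur, compression artefacts, etc.) with wide magnitude ranges.
def adjust_brightness(pixels, magnitude):
    """Shift all intensities by `magnitude`, clamped to [0, 1]."""
    return [min(1.0, max(0.0, v + magnitude)) for v in pixels]

def adjust_contrast(pixels, magnitude):
    """Scale intensities around their mean by (1 + magnitude), clamped."""
    mean = sum(pixels) / len(pixels)
    return [min(1.0, max(0.0, mean + (v - mean) * (1.0 + magnitude)))
            for v in pixels]

OPERATIONS = [adjust_brightness, adjust_contrast]

def strong_augment(pixels, n_ops=(2, 4), magnitude=(-0.5, 0.5), rng=None):
    """Apply a random number of operations, each with a random magnitude
    drawn from a deliberately wide range."""
    rng = rng or random.Random()
    for op in rng.choices(OPERATIONS, k=rng.randint(*n_ops)):
        pixels = op(pixels, rng.uniform(*magnitude))
    return pixels
```

The key design choice this sketch tries to convey is that both the number of operations and their strengths are sampled per image, so no two training samples see the same, narrow augmentation, which is what plausibly drives robustness to unseen distribution shifts.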