Model brittleness is a key concern when deploying deep learning models in real-world medical settings. A model that performs well at one institution may suffer a significant decline in performance when tested at other institutions. While pooling datasets from multiple institutions and retraining may provide a straightforward solution, it is often infeasible and may compromise patient privacy. An alternative approach is to fine-tune the model on subsequent institutions after training on the original institution. Notably, this approach degrades model performance at the original institution, a phenomenon known as catastrophic forgetting. In this paper, we develop an approach to address catastrophic forgetting based on elastic weight consolidation combined with modulation of batch normalization statistics under two scenarios: first, for expanding the domain from one imaging system's data to another's, and second, for expanding the domain from a large multi-institutional dataset to another single-institution dataset. We show that our approach outperforms several other state-of-the-art approaches and provide theoretical justification for the efficacy of batch normalization modulation. The results of this study are generally applicable to the deployment of any clinical deep learning model that requires domain expansion.
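The abstract does not spell out the exact formulation, but the two ingredients it names are well known: an elastic weight consolidation (EWC) penalty that anchors parameters to their values from the original domain during fine-tuning, and per-domain batch normalization running statistics that can be swapped in at inference time. The sketch below is a minimal, hedged illustration of both ideas in PyTorch; the function names (`ewc_penalty`, `swap_bn_stats`), the hyperparameter `lam`, and the dictionary layouts are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


def ewc_penalty(model: nn.Module, old_params: dict, fisher: dict, lam: float) -> torch.Tensor:
    """Quadratic EWC penalty: penalize deviation of each parameter from its
    value after training on the original institution, weighted by an estimate
    of the diagonal Fisher information (importance) for that parameter."""
    loss = torch.zeros((), device=next(model.parameters()).device)
    for name, p in model.named_parameters():
        if name in fisher:
            loss = loss + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return 0.5 * lam * loss


@torch.no_grad()
def swap_bn_stats(model: nn.Module, stats: dict) -> None:
    """Modulate batch normalization by loading a stored set of running
    statistics (e.g. those collected on the original institution's data)
    before evaluating the model on that domain."""
    for name, m in model.named_modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)) and name in stats:
            m.running_mean.copy_(stats[name]["mean"])
            m.running_var.copy_(stats[name]["var"])
```

In such a setup, fine-tuning on the new institution would add `ewc_penalty(...)` to the task loss, and `swap_bn_stats(...)` would restore the original institution's normalization statistics when evaluating on its data; the details of how the paper combines these two components are left to the main text.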