With the development of deep convolutional neural networks, medical image segmentation has achieved a series of breakthroughs in recent years. However, high-performance convolutional neural networks usually entail numerous parameters and high computational costs, which hinders their application in clinical scenarios. Meanwhile, the scarcity of large-scale annotated medical image datasets further impedes the deployment of high-performance networks. To tackle these problems, we propose Graph Flow, a comprehensive knowledge distillation framework for both network-efficient and annotation-efficient medical image segmentation. Specifically, our core Graph Flow Distillation transfers the essence of cross-layer variations from a well-trained, cumbersome teacher network to an untrained compact student network. In addition, an unsupervised Paraphraser Module is integrated to purify the knowledge of the teacher network, which also stabilizes the training procedure. Furthermore, we build a unified distillation framework by integrating adversarial distillation and vanilla logits distillation, which further refines the final predictions of the compact network. With different teacher networks (a conventional convolutional architecture or a prevalent transformer architecture) and student networks, we conduct extensive experiments on four medical image datasets with different modalities (Gastric Cancer, Synapse, BUSI, and CVC-ClinicDB). We demonstrate the prominent ability of our method, which achieves competitive performance on these datasets. Moreover, we demonstrate the effectiveness of our Graph Flow through a novel semi-supervised paradigm for dual-efficient medical image segmentation. Our code will be available at Graph Flow.
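To make the cross-layer distillation idea concrete, the following is a minimal PyTorch sketch, not the paper's exact formulation: it assumes an FSP-style construction in which a "flow" matrix is the normalized inner product between the feature maps of two layers, and the student is trained to match the teacher's flow matrices. All names (`flow_matrix`, `graph_flow_loss`) and the pairing of consecutive layers are illustrative assumptions.

```python
# Hypothetical sketch of a cross-layer "flow" distillation loss in PyTorch.
# Assumes paired teacher/student feature maps have been resized/projected so
# that matched layers share spatial (and channel) dimensions.
import torch
import torch.nn.functional as F

def flow_matrix(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    """Inner product of two feature maps over spatial positions.

    feat_a: (B, C1, H, W), feat_b: (B, C2, H, W) -> (B, C1, C2)
    """
    b, c1, h, w = feat_a.shape
    c2 = feat_b.shape[1]
    a = feat_a.reshape(b, c1, h * w)
    b_ = feat_b.reshape(b, c2, h * w)
    return torch.bmm(a, b_.transpose(1, 2)) / (h * w)

def graph_flow_loss(t_feats, s_feats):
    """L2 distance between teacher and student flow matrices computed
    between consecutive layers, averaged over all layer pairs."""
    loss = 0.0
    for (t0, t1), (s0, s1) in zip(zip(t_feats, t_feats[1:]),
                                  zip(s_feats, s_feats[1:])):
        loss = loss + F.mse_loss(flow_matrix(t0, t1), flow_matrix(s0, s1))
    return loss / (len(t_feats) - 1)
```

In this reading, matching flow matrices (rather than raw activations) encourages the student to reproduce how representations evolve between layers, which is the "cross-layer variation" the abstract refers to.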