Background and objective: Sharing of medical data is required to enable the cross-agency flow of healthcare information and to construct high-accuracy computer-aided diagnosis systems. However, the large size of medical datasets, the large memory footprint of saved deep convolutional neural network (DCNN) models, and the need to protect patients' privacy can make medical data sharing inefficient. Therefore, this study proposes a novel soft-label dataset distillation method for medical data sharing. Methods: The proposed method distills the valid information of medical image data and generates several compressed images with different data distributions for anonymous medical data sharing. Furthermore, our method can extract the essential weights of DCNN models, reducing the memory required to save trained models for efficient medical data sharing. Results: The proposed method can compress tens of thousands of images into several soft-label images and reduce the size of a trained model to a few hundredths of its original size. The compressed images obtained after distillation are visually anonymized and therefore contain no private patient information. Furthermore, high detection performance can be achieved with a small number of compressed images. Conclusions: The experimental results show that the proposed method can improve the efficiency and security of medical data sharing.
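To make the distillation idea concrete, the following is a minimal PyTorch sketch of generic soft-label dataset distillation, not the paper's actual implementation: a handful of synthetic images and their soft labels are treated as learnable parameters, a small classifier is trained on them for one unrolled step, and the resulting loss on real data is backpropagated into the synthetic data. The tiny linear classifier, image size, class count, learning rates, and the stand-in "real" data are all illustrative assumptions.

```python
# Minimal sketch of soft-label dataset distillation (assumed setup, not the paper's code).
import torch
import torch.nn.functional as F

IMG_SHAPE = (1, 28, 28)   # assumed image size
N_DISTILLED = 10          # a handful of compressed (distilled) images
N_CLASSES = 2             # e.g. abnormal vs. normal
INNER_LR = 0.02           # learning rate of the unrolled inner training step
OUTER_LR = 0.1            # learning rate for the distilled data itself

# Stand-in for the real (private) training data; purely synthetic here.
real_x = torch.randn(256, *IMG_SHAPE)
real_y = torch.randint(0, N_CLASSES, (256,))
real_loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(real_x, real_y), batch_size=64, shuffle=True)

# Learnable synthetic images and learnable soft labels (logits over classes).
syn_images = torch.randn(N_DISTILLED, *IMG_SHAPE, requires_grad=True)
syn_labels = torch.randn(N_DISTILLED, N_CLASSES, requires_grad=True)
outer_opt = torch.optim.SGD([syn_images, syn_labels], lr=OUTER_LR)

def forward(x, w, b):
    """Tiny linear classifier written functionally so we can differentiate
    through its weight update."""
    return x.flatten(1) @ w + b

for batch_x, batch_y in real_loader:
    # Fresh randomly initialised classifier weights for each outer step.
    w = torch.randn(IMG_SHAPE[0] * IMG_SHAPE[1] * IMG_SHAPE[2], N_CLASSES,
                    requires_grad=True)
    b = torch.zeros(N_CLASSES, requires_grad=True)

    # Inner step: train the classifier on the distilled data only, using the
    # learnable soft labels as targets (cross-entropy with soft targets).
    syn_logits = forward(syn_images, w, b)
    inner_loss = torch.sum(
        -F.log_softmax(syn_logits, dim=1) * F.softmax(syn_labels, dim=1)
    ) / N_DISTILLED
    gw, gb = torch.autograd.grad(inner_loss, (w, b), create_graph=True)
    w_new, b_new = w - INNER_LR * gw, b - INNER_LR * gb

    # Outer step: the updated classifier should perform well on real data;
    # backpropagate this loss into the distilled images and soft labels.
    outer_loss = F.cross_entropy(forward(batch_x, w_new, b_new), batch_y)
    outer_opt.zero_grad()
    outer_loss.backward()
    outer_opt.step()
```

After optimization, only the few distilled images and their soft labels would need to be shared; because they are synthesized rather than selected from the dataset, they are visually dissimilar to any individual patient's image, which is the intuition behind the anonymization claim above.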