In spite of the recent success of deep learning in the medical domain, the problem of data scarcity is aggravated by privacy and data-ownership issues. Distributed learning approaches, including federated learning, have been studied to alleviate these problems, but they suffer from cumbersome communication overhead and weak privacy protection. To address this, here we propose a self-supervised masked sampling distillation method for the vision transformer that can be performed without continuous communication and that further enhances privacy through a vision-transformer-specific encryption method. The effectiveness of our method is demonstrated by extensive experiments on two medical-domain datasets and two different downstream tasks, showing superior performance compared to the existing distributed learning strategy as well as the fine-tuning-only baseline. As the self-supervised model built with the proposed method acquires a general semantic understanding of the modality, we demonstrate its potential as a task-agnostic foundation model for various medical tasks, widening its applicability in the medical domain.
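To make the core idea of masked sampling distillation concrete, the following is a minimal NumPy sketch, not the paper's implementation: all names (`encode`, `W_teacher`, `W_student`, `keep`) are hypothetical, the linear projection stands in for a vision transformer encoder, and the loss is a simple mean-squared-error distillation objective between teacher outputs on the full patch sequence and student outputs on a randomly sampled subset of patches.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a ViT encoder: a fixed linear projection of patch tokens.
# (Hypothetical; the paper's actual encoder is a vision transformer.)
def encode(tokens, W):
    return tokens @ W

num_patches, dim = 16, 8
W_teacher = rng.normal(size=(dim, dim))
# Student weights closely track the teacher (e.g. via EMA-style updates).
W_student = W_teacher + 0.01 * rng.normal(size=(dim, dim))

image_tokens = rng.normal(size=(num_patches, dim))

# Masked sampling: the student sees only a random subset of patch tokens,
# while the teacher encodes the full sequence. The student is trained to
# match the teacher's representations at the sampled positions.
keep = rng.choice(num_patches, size=num_patches // 4, replace=False)
teacher_out = encode(image_tokens, W_teacher)[keep]
student_out = encode(image_tokens[keep], W_student)

# Distillation objective: mean squared error between matched positions.
loss = float(np.mean((student_out - teacher_out) ** 2))
```

Because only a sampled subset of patches (and no raw images or labels) needs to leave the encoder, this kind of objective can be trained locally without the continuous parameter exchange that federated learning requires.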