A New Perspective to Boost Vision Transformer for Medical Image Classification

Transformers have achieved impressive success on various computer vision tasks. However, most existing studies require pretraining the Transformer backbone on a large-scale labeled dataset (e.g., ImageNet) to achieve satisfactory performance, and such datasets are usually unavailable for medical images. Additionally, due to the gap between medical and natural images, the improvement yielded by ImageNet-pretrained weights degrades significantly when the weights are transferred to medical image processing tasks. In this paper, we propose Bootstrap Own Latent of Transformer (BOLT), a self-supervised learning approach designed specifically for medical image classification with a Transformer backbone. BOLT consists of two networks, namely the online and target branches, for self-supervised representation learning. Concretely, the online network is trained to predict the target network's representation of the same patch embedding tokens under a different perturbation. To make the most of the Transformer given limited medical data, we propose an auxiliary difficulty ranking task: the Transformer is required to identify which branch (i.e., online or target) is processing the more difficult perturbed tokens. Overall, the Transformer is driven to distill transformation-invariant features from the perturbed tokens so that it can simultaneously measure difficulty and maintain the consistency of the self-supervised representations. BOLT is evaluated on three medical image processing tasks, i.e., skin lesion classification, knee fatigue fracture grading, and diabetic retinopathy grading. The experimental results validate the superiority of BOLT for medical image classification over ImageNet-pretrained weights and state-of-the-art self-supervised learning approaches.
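
The two training signals described above (representation consistency between the online and target branches, and difficulty ranking) can be illustrated with a minimal pure-Python sketch. Everything specific here is an assumption rather than the paper's stated method: the cosine-based consistency loss and the exponential-moving-average target update are borrowed from BYOL-style training, which the name "Bootstrap Own Latent" suggests but the abstract does not spell out; the difficulty label is assumed to come from a scalar perturbation strength; and all function names and the weight `lam` are hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def consistency_loss(online_pred, target_repr):
    """Assumed BYOL-style loss: 2 - 2 * cosine similarity.
    Equals 0 when the online prediction aligns with the target representation."""
    p = l2_normalize(online_pred)
    z = l2_normalize(target_repr)
    return 2.0 - 2.0 * sum(a * b for a, b in zip(p, z))

def ema_update(target_params, online_params, tau=0.99):
    """Assumed target-branch update: exponential moving average of online weights."""
    return [tau * t + (1.0 - tau) * o for t, o in zip(target_params, online_params)]

def difficulty_label(online_strength, target_strength):
    """Hypothetical label: 1 if the online branch sees the harder perturbation."""
    return 1 if online_strength > target_strength else 0

def ranking_loss(logit, label):
    """Numerically stable binary cross-entropy on a single logit."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def total_loss(online_pred, target_repr, rank_logit, label, lam=1.0):
    """Consistency term plus the auxiliary ranking term, weighted by lam (hypothetical)."""
    return consistency_loss(online_pred, target_repr) + lam * ranking_loss(rank_logit, label)
```

In this sketch the two terms are simply summed, so a branch-identification head that guesses at chance (logit 0) contributes log 2 to the total regardless of how well the representations agree; how the paper actually balances the two objectives is not stated in the abstract.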
