Self-supervised learning methods based on image patch reconstruction have been highly successful in training auto-encoders whose pre-trained weights can be transferred and fine-tuned for downstream image understanding tasks. However, when applied to 3D medical images, existing methods seldom consider the varying importance of the reconstructed patches or the symmetry of anatomical structures. In this paper, we propose a novel Attentive Symmetric Auto-encoder (ASA) based on the Vision Transformer (ViT) for 3D brain MRI segmentation tasks. We conjecture that forcing the auto-encoder to recover informative image regions yields more discriminative representations than recovering smooth image patches. We therefore adopt a gradient-based metric to estimate the importance of each image patch. In the pre-training stage, the proposed auto-encoder pays more attention to reconstructing the informative patches according to this gradient metric. Moreover, we exploit the structural prior of the brain and develop a Symmetric Position Encoding (SPE) method that better captures the correlations between long-range but spatially symmetric regions to obtain effective features. Experimental results show that the proposed attentive symmetric auto-encoder outperforms state-of-the-art self-supervised learning methods and medical image segmentation models on three brain MRI segmentation benchmarks.
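To make the idea of gradient-guided attentive reconstruction concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the exact metric, so the sketch assumes that patch importance is the mean intensity-gradient magnitude inside each non-overlapping cubic patch, and that the reconstruction loss is a per-patch mean squared error weighted by that importance. The function names `patch_importance`, `attentive_weights`, and `weighted_reconstruction_loss`, as well as the `patch` and `eps` parameters, are hypothetical.

```python
import numpy as np


def _to_patches(volume, patch):
    """Crop a 3D array to a multiple of `patch` and split it into cubic blocks."""
    d, h, w = (s // patch for s in volume.shape)
    cropped = volume[: d * patch, : h * patch, : w * patch]
    # Resulting axes: (d, patch, h, patch, w, patch)
    return cropped.reshape(d, patch, h, patch, w, patch)


def patch_importance(volume, patch=4):
    """Assumed metric: mean gradient magnitude inside each cubic patch."""
    gz, gy, gx = np.gradient(volume.astype(np.float64))
    magnitude = np.sqrt(gx ** 2 + gy ** 2 + gz ** 2)
    return _to_patches(magnitude, patch).mean(axis=(1, 3, 5))


def attentive_weights(importance, eps=1e-8):
    """Rescale importance scores so that their mean is approximately 1."""
    return importance / (importance.mean() + eps)


def weighted_reconstruction_loss(pred, target, weights, patch=4):
    """Per-patch MSE, weighted so that informative patches count more."""
    squared_error = (pred.astype(np.float64) - target.astype(np.float64)) ** 2
    per_patch_mse = _to_patches(squared_error, patch).mean(axis=(1, 3, 5))
    return float((weights * per_patch_mse).mean())
```

Under these assumptions, smooth patches receive weights close to zero and contribute little to the loss, while patches that contain edges or fine structure dominate it. A real pre-training pipeline would compute the same quantities on batched tensors inside the training loop.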