Transformers have achieved remarkable success in medical image analysis owing to the flexibility of their self-attention mechanism. However, because they lack the intrinsic inductive bias needed to model visual structural information, they generally require large-scale pre-training, which limits their clinical applicability on expensive, small-scale medical data. To this end, we propose a parameter-efficient transformer that exploits intrinsic inductive bias via position information for medical image segmentation. Specifically, we empirically investigate how different position encoding strategies affect the prediction quality of regions of interest (ROIs), and observe that ROIs are sensitive to the choice of position encoding. Motivated by this, we present a novel Hybrid Axial-Attention (HAA), a form of positional self-attention equipped with spatial pixel-wise information and relative position information as inductive bias. Moreover, we introduce a gating mechanism to lighten the training schedule, enabling efficient feature selection on small-scale datasets. Experiments on the BraTS and Covid19 datasets demonstrate the superiority of our method over the baseline and previous works. We further visualize the internal workflow for interpretability to validate our approach.
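To make the idea of axial position self-attention with gated positional bias concrete, here is a minimal NumPy sketch of one 1-D (single-axis) attention pass in the style of gated axial attention. All names (`gated_axial_attention`, the gate scalars `g_q`, `g_k`, `g_v`) are hypothetical illustrations, not the paper's actual implementation; the gates scale how strongly the relative position embeddings contribute, so they can be learned to suppress unreliable positional bias on small datasets.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_axial_attention(x, Wq, Wk, Wv, r_q, r_k, r_v, g_q, g_k, g_v):
    """One axial self-attention pass along a single spatial axis (a sketch).

    x:            (L, d) features along one axis (row or column).
    Wq, Wk, Wv:   (d, d) query/key/value projections.
    r_q, r_k, r_v:(L, L, d) relative position embeddings for the
                  query, key, and value terms.
    g_q, g_k, g_v: learned gate scalars (hypothetical names) that control
                  how much positional inductive bias enters the attention.
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    # content-content logits ...
    logits = q @ k.T
    # ... plus gated content-position logits (query and key sides)
    logits = logits + g_q * np.einsum('ld,lmd->lm', q, r_q)
    logits = logits + g_k * np.einsum('md,lmd->lm', k, r_k)
    attn = softmax(logits / np.sqrt(x.shape[1]), axis=-1)
    # aggregate values, plus a gated positional value term
    return attn @ v + g_v * np.einsum('lm,lmd->ld', attn, r_v)
```

Applying this pass first along image rows and then along columns factorizes full 2-D self-attention into two 1-D passes, which is what makes axial attention parameter- and compute-efficient.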