Transformers Improve Breast Cancer Diagnosis from Unregistered Multi-View Mammograms

Deep convolutional neural networks (CNNs) have been widely used in medical imaging tasks. However, because the convolution operation is intrinsically local, CNNs generally cannot model long-range dependencies well. Such dependencies are important for accurately identifying or mapping corresponding breast lesion features computed from multiple unregistered mammograms. This motivates us to leverage the architecture of Multi-view Vision Transformers to capture long-range relationships among multiple mammograms acquired from the same patient in one examination. For this purpose, we employ local Transformer blocks to separately learn patch relationships within each of the four mammograms, which are acquired from two views (craniocaudal, CC, and mediolateral oblique, MLO) of the two breasts (right and left). The outputs from the different views and sides are concatenated and fed into global Transformer blocks, which jointly learn patch relationships across the four images. To evaluate the proposed model, we retrospectively assembled a dataset of 949 sets of mammograms, comprising 470 malignant cases and 479 normal or benign cases. We trained and evaluated the model using five-fold cross-validation. Without any arduous preprocessing steps (e.g., optimal window cropping, chest wall or pectoral muscle removal, or two-view image registration), our four-image (two-view-two-side) Transformer-based model achieves a case classification performance with an area under the ROC curve (AUC) of 0.818, which significantly outperforms the AUC of 0.784 achieved by the state-of-the-art multi-view CNNs (p = 0.009). It also outperforms two one-view-two-side models, which achieve AUCs of 0.724 (CC view) and 0.769 (MLO view), respectively. The study demonstrates the potential of using Transformers to develop high-performing computer-aided diagnosis schemes that combine four mammograms.
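
The local-then-global data flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library, a single attention head with identity projections (no learned weights, residual connections, or MLP layers), and mean pooling in place of a classification head. The function names `self_attention` and `local_then_global`, the view labels, the number of blocks, and the token counts and dimensions are all hypothetical choices made for illustration.

```python
import math
import random


def self_attention(tokens):
    """Single-head scaled dot-product self-attention with Q = K = V = tokens.

    Simplifying assumption: learned projections are omitted, so each output
    token is a softmax-weighted average of the input tokens.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)  # subtract the maximum for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v[j] for w, v in zip(weights, tokens)) / total
            for j in range(d)
        ])
    return out


def local_then_global(views, n_local=2, n_global=2):
    """Apply local attention within each image, then global attention across all.

    `views` maps a view label to that image's list of patch tokens.
    Returns the jointly attended tokens and a mean-pooled case-level vector.
    """
    local_out = []
    for tokens in views.values():
        for _ in range(n_local):       # local blocks: patches of one image only
            tokens = self_attention(tokens)
        local_out.append(tokens)

    # Concatenate the token sequences of the four images into one sequence.
    joint = [tok for tokens in local_out for tok in tokens]
    for _ in range(n_global):          # global blocks: patches of all images
        joint = self_attention(joint)

    d = len(joint[0])
    pooled = [sum(tok[j] for tok in joint) / len(joint) for j in range(d)]
    return joint, pooled


# Hypothetical input: 4 images, 6 patch tokens each, 8 features per token.
random.seed(0)
views = {
    name: [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(6)]
    for name in ("L-CC", "L-MLO", "R-CC", "R-MLO")
}
joint, pooled = local_then_global(views)
print(len(joint), len(pooled))  # 24 8
```

In this sketch the local stage cannot mix information between images, while the global stage attends over all 24 tokens at once. This is how corresponding regions in unregistered views can be related without an explicit registration step.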
