Magnification Invariant Medical Image Analysis: A Comparison of Convolutional Networks, Vision Transformers, and Token Mixers

Convolutional Neural Networks (CNNs) are widely used in medical image analysis, but their performance degrades when the magnification of the testing images differs from that of the training images. This inability of CNNs to generalize across magnification scales can result in sub-optimal performance on external datasets. This study evaluates the robustness of various deep learning architectures in the analysis of breast cancer histopathological images when the magnification scale varies between the training and testing stages. We explore and compare the performance of multiple deep learning architectures, including the CNN-based ResNet and MobileNet, the self-attention-based Vision Transformers and Swin Transformers, and token-mixing models such as FNet, ConvMixer, MLP-Mixer, and WaveMix. The experiments are conducted on the BreakHis dataset, which contains breast cancer histopathological images at varying magnification levels. We show that the performance of WaveMix is invariant to the magnification of the training and testing data, and that it provides stable and good classification accuracy. Such evaluations are critical for identifying deep learning architectures that can robustly handle changes in magnification scale, ensuring that scale changes across anatomical structures do not disturb the inference results.
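
The evaluation described above, training at one magnification and testing at another, can be sketched as a train/test grid. This is only a minimal illustration under stated assumptions, not the authors' pipeline: the names `cross_magnification_grid`, `invariance_summary`, `train_fn`, `eval_fn`, and the `data_by_mag` layout are all hypothetical, and the four magnification labels are the factors at which BreakHis images are distributed.

```python
from statistics import mean, pstdev

# Magnification factors of the BreakHis images (labels used here as dict keys).
MAGNIFICATIONS = ("40X", "100X", "200X", "400X")


def cross_magnification_grid(train_fn, eval_fn, data_by_mag, mags=MAGNIFICATIONS):
    """Train one model per magnification and score it on every magnification.

    train_fn(train_split) -> model       (hypothetical, supplied by the caller)
    eval_fn(model, test_split) -> float  (hypothetical, e.g. accuracy in [0, 1])
    data_by_mag[mag] -> {"train": ..., "test": ...}  (assumed layout)
    """
    grid = {}
    for train_mag in mags:
        model = train_fn(data_by_mag[train_mag]["train"])
        for test_mag in mags:
            score = eval_fn(model, data_by_mag[test_mag]["test"])
            grid[(train_mag, test_mag)] = score
    return grid


def invariance_summary(grid):
    """Summarize how stable the scores are across all train/test pairs."""
    scores = list(grid.values())
    return {
        "mean": mean(scores),     # average score over the whole grid
        "std": pstdev(scores),    # spread; smaller means more stable
        "worst": min(scores),     # worst train/test combination
    }
```

With one grid per architecture, a small spread and a high worst-case score would correspond to the kind of magnification invariance the abstract attributes to WaveMix; the diagonal entries (same train and test magnification) serve as the matched-scale baseline.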
