Recently, vision transformers and MLP-based models have been developed to address some of the prevalent weaknesses of convolutional neural networks. Because transformers and the self-attention mechanism are novel in this domain, it remains unclear to what degree these architectures are robust to corruptions. Although some works argue that data augmentation is essential for corruption robustness, we instead explore the impact of the architecture itself. We find that vision transformer architectures are inherently more robust to corruptions than ResNet-50 and MLP-Mixers. We also find that vision transformers with 5 times fewer parameters than a ResNet-50 exhibit stronger shape bias. Our code is available to reproduce our results.