We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification. It is a simple residual network that alternates (i) a linear layer in which image patches interact, independently and identically across channels, and (ii) a two-layer feed-forward network in which channels interact independently per patch. When trained with a modern training strategy using heavy data-augmentation and, optionally, distillation, it attains surprisingly good accuracy/complexity trade-offs on ImageNet. We also train ResMLP models in a self-supervised setup, further removing the priors that come from using a labelled dataset. Finally, by adapting our model to machine translation, we obtain surprisingly good results there as well. We share pre-trained models and our code based on the Timm library.
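To make the block structure concrete, below is a minimal PyTorch sketch of one residual block alternating the two layers described above: a linear layer mixing patches (shared across channels) and a two-layer feed-forward network mixing channels (applied per patch). The class names `ResMLPBlock` and `Affine`, and the expansion factor, are illustrative assumptions for exposition, not the released implementation.

```python
# Minimal sketch of one ResMLP residual block (assumes PyTorch).
# Names and hyper-parameters are illustrative, not the authors' exact code.
import torch
import torch.nn as nn


class Affine(nn.Module):
    """Simple per-channel affine transform, standing in for normalization."""
    def __init__(self, dim):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        return self.alpha * x + self.beta


class ResMLPBlock(nn.Module):
    def __init__(self, num_patches, dim, expansion=4):
        super().__init__()
        self.affine1 = Affine(dim)
        # (i) linear layer in which patches interact, shared across channels
        self.patch_mix = nn.Linear(num_patches, num_patches)
        self.affine2 = Affine(dim)
        # (ii) two-layer feed-forward network acting on channels, per patch
        self.channel_mlp = nn.Sequential(
            nn.Linear(dim, expansion * dim),
            nn.GELU(),
            nn.Linear(expansion * dim, dim),
        )

    def forward(self, x):  # x: (batch, num_patches, dim)
        # Transpose so the linear layer acts across the patch dimension,
        # identically for every channel, then transpose back.
        y = self.patch_mix(self.affine1(x).transpose(1, 2)).transpose(1, 2)
        x = x + y                                  # residual connection
        x = x + self.channel_mlp(self.affine2(x))  # residual connection
        return x
```

A full model would stack such blocks on top of a patch-embedding layer and finish with average pooling and a linear classifier; those surrounding pieces are omitted here for brevity.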