We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification. It is a simple residual network that alternates (i) a linear layer in which image patches interact, independently and identically across channels, and (ii) a two-layer feed-forward network in which channels interact independently per patch. When trained with a modern training strategy using heavy data-augmentation and optionally distillation, it attains surprisingly good accuracy/complexity trade-offs on ImageNet. We will share our code based on the Timm library and pre-trained models.
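The two alternating layers described above can be sketched as a single residual block. This is a minimal illustrative sketch in NumPy, not the authors' Timm-based implementation: it omits the paper's normalization and per-layer scaling, and uses ReLU in place of the feed-forward activation for brevity; all names and shapes here are assumptions for illustration.

```python
import numpy as np

def resmlp_block(x, w_patch, w1, w2):
    """One ResMLP-style residual block (illustrative sketch).

    x: (num_patches, channels) features for one image.
    w_patch: (num_patches, num_patches) cross-patch linear weights,
             applied identically to every channel.
    w1, w2: two-layer feed-forward weights, applied independently
            to every patch (shared across patches).
    """
    # (i) linear layer in which patches interact, per channel
    x = x + w_patch @ x
    # (ii) two-layer feed-forward network in which channels interact, per patch
    h = np.maximum(x @ w1, 0.0)  # ReLU stands in for the paper's activation
    x = x + h @ w2
    return x

rng = np.random.default_rng(0)
P, C, H = 16, 8, 32  # patches, channels, hidden width (illustrative sizes)
x = rng.standard_normal((P, C))
y = resmlp_block(x,
                 0.01 * rng.standard_normal((P, P)),
                 0.1 * rng.standard_normal((C, H)),
                 0.1 * rng.standard_normal((H, C)))
print(y.shape)  # → (16, 8)
```

Stacking such blocks, preceded by a patch-embedding layer and followed by average pooling and a classifier head, yields the full architecture.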