We propose RepMLP, a multi-layer-perceptron-style neural network building block for image recognition, which is composed of a series of fully-connected (FC) layers. Compared to convolutional layers, FC layers are more efficient and better at modeling long-range dependencies and positional patterns, but worse at capturing local structures, hence usually less favored for image recognition. We propose a structural re-parameterization technique that adds a local prior into an FC layer to make it powerful for image recognition. Specifically, we construct convolutional layers inside a RepMLP during training and merge them into the FC layer for inference. On CIFAR, a simple pure-MLP model performs very close to CNNs. By inserting RepMLP into traditional CNNs, we improve ResNets by 1.8% accuracy on ImageNet, 2.9% accuracy for face recognition, and 2.3% mIoU on Cityscapes, with lower FLOPs. Our intriguing findings highlight that combining the global representational capacity and positional perception of FC layers with the local prior of convolution can improve the performance of neural networks at higher speed, both on tasks with translation invariance (e.g., semantic segmentation) and on those with aligned images and positional patterns (e.g., face recognition). The code and models are available at https://github.com/DingXiaoH/RepMLP.
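The merging step described above rests on the fact that a convolution is a linear map, so it can be rewritten as an equivalent FC weight matrix and folded into a parallel FC branch after training. The sketch below illustrates this idea only; `conv_to_fc` is a hypothetical helper name (not the authors' released code), and it assumes a same-padded convolution applied to a fixed H×W input.

```python
import torch
import torch.nn.functional as F

def conv_to_fc(conv_weight, conv_bias, in_channels, h, w, padding):
    """Convert a conv kernel into an equivalent FC weight matrix by
    running the conv over an 'identity batch': each one-hot input
    yields one column of the equivalent matrix (illustrative sketch)."""
    n = in_channels * h * w
    identity = torch.eye(n).reshape(n, in_channels, h, w)
    out = F.conv2d(identity, conv_weight, bias=None, padding=padding)
    fc_weight = out.reshape(n, -1).t()  # shape: (out_c * h * w, n)
    fc_bias = None
    if conv_bias is not None:
        # each output-channel bias repeats over the h*w spatial positions
        fc_bias = conv_bias.repeat_interleave(h * w)
    return fc_weight, fc_bias

# sanity check: the converted FC reproduces the conv output, so at
# inference time its weights can simply be added onto an FC branch
torch.manual_seed(0)
C, H, W, O, k = 2, 5, 5, 3, 3
conv = torch.nn.Conv2d(C, O, k, padding=k // 2)
x = torch.randn(1, C, H, W)
with torch.no_grad():
    fc_w, fc_b = conv_to_fc(conv.weight, conv.bias, C, H, W, k // 2)
    y_conv = conv(x).flatten()
    y_fc = F.linear(x.flatten(), fc_w, fc_b)
print(torch.allclose(y_conv, y_fc, atol=1e-5))
```

After this conversion, training-time conv branches incur no inference cost: their equivalent FC weights are summed into the single FC kernel, which is the essence of the structural re-parameterization described above.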