This paper studies the problem of designing compact binary architectures for vision multi-layer perceptrons (MLPs). We provide an extensive analysis of the difficulty of binarizing vision MLPs and find that previous binarization methods perform poorly due to the limited capacity of binary MLPs. In contrast to traditional CNNs, which utilize convolutional operations with large kernel sizes, the fully-connected (FC) layers in MLPs can be treated as convolutional layers with kernel size $1\times1$. Thus, the representation ability of the FC layers is severely limited when they are binarized, which restricts the capability of spatial mixing and channel mixing on the intermediate features. To this end, we propose to improve the performance of the binary MLP (BiMLP) model by enriching the representation ability of binary FC layers. We design a novel binary block that contains multiple branches to merge a series of outputs from the same stage, as well as a universal shortcut connection that encourages information flow from the previous stage. The downsampling layers are also carefully designed to reduce the computational complexity while maintaining the classification performance. Experimental results on the benchmark dataset ImageNet-1k demonstrate the effectiveness of the proposed BiMLP models, which achieve state-of-the-art accuracy compared with prior binary CNNs. The MindSpore code is available at \url{https://gitee.com/mindspore/models/tree/master/research/cv/BiMLP}.
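To make the core idea concrete, below is a minimal PyTorch-style sketch of a multi-branch binary block with a shortcut connection, in the spirit of the design described above. This is not the authors' implementation (which is released in MindSpore at the URL above): the branch count, merging by summation, clipped straight-through-estimator binarization, and per-channel scaling are all illustrative assumptions.

```python
import torch
import torch.nn as nn


class BinaryActivation(torch.autograd.Function):
    """Sign binarization with a clipped straight-through estimator (STE)."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Pass gradients only where |x| <= 1 (clipped STE), a common choice.
        return grad_output * (x.abs() <= 1).float()


class BinaryLinear(nn.Module):
    """FC layer with binarized inputs and weights; equivalent to a binary
    1x1 convolution, whose representation ability the paper aims to enrich."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)

    def forward(self, x):
        bx = BinaryActivation.apply(x)
        bw = BinaryActivation.apply(self.weight)
        # Per-output-channel scaling recovers some magnitude information
        # lost by binarization (assumption, following common binary-net practice).
        alpha = self.weight.abs().mean(dim=1)
        return nn.functional.linear(bx, bw) * alpha


class MultiBranchBinaryBlock(nn.Module):
    """Hypothetical block: several binary FC branches whose outputs from the
    same stage are merged, plus a shortcut carrying features forward."""

    def __init__(self, dim, num_branches=2):
        super().__init__()
        self.branches = nn.ModuleList(
            BinaryLinear(dim, dim) for _ in range(num_branches)
        )
        self.norm = nn.BatchNorm1d(dim)

    def forward(self, x):
        # x: (batch, tokens, dim); merge branch outputs by summation.
        y = sum(branch(x) for branch in self.branches)
        y = self.norm(y.transpose(1, 2)).transpose(1, 2)
        return y + x  # shortcut connection from the previous stage
```

The merged multi-branch outputs give the binarized FC layer a richer set of representations than a single binary branch, while the full-precision shortcut preserves information flow across stages.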