We present SplitMixer, a simple and lightweight isotropic MLP-like architecture for visual recognition. It interleaves two types of convolutional operations to mix information across spatial locations (spatial mixing) and across channels (channel mixing). The first applies two depthwise 1D kernels in sequence, instead of a single 2D kernel, to mix spatial information. The second splits the channels into overlapping or non-overlapping segments, with or without shared parameters, and applies our proposed channel mixing approaches or 3D convolution to mix channel information. Depending on these design choices, a number of SplitMixer variants can be constructed to balance accuracy, parameter count, and speed. We show, both theoretically and experimentally, that SplitMixer performs on par with state-of-the-art MLP-like models while using significantly fewer parameters and FLOPs. For example, without strong data augmentation and optimization, SplitMixer achieves around 94% accuracy on CIFAR-10 with only 0.28M parameters, while ConvMixer achieves the same accuracy with about 0.6M parameters. By comparison, the well-known MLP-Mixer achieves 85.45% with 17.1M parameters. On the CIFAR-100 dataset, SplitMixer achieves around 73% accuracy, on par with ConvMixer, but with about 52% fewer parameters and FLOPs. We hope that our results spark further research into more efficient vision architectures and facilitate the development of MLP-like models. Code is available at https://github.com/aliborji/splitmixer.
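
To make the source of the parameter savings concrete, the following is a minimal counting sketch, not the released implementation. It assumes bias-free convolutions, and it models channel splitting as equal, non-overlapping segments that are each mixed by their own 1x1 convolution, which is only one simple reading of the description above. The function names, the hidden size of 256, and the kernel size of 5 are hypothetical values chosen for illustration.

```python
def depthwise_2d_params(channels: int, k: int) -> int:
    """Weights in one depthwise k x k convolution (bias ignored)."""
    return channels * k * k


def depthwise_1d_pair_params(channels: int, k: int) -> int:
    """Weights in a depthwise 1 x k convolution followed by a depthwise k x 1 one."""
    return 2 * channels * k


def pointwise_params(channels: int) -> int:
    """Weights in a full 1x1 convolution that mixes all channels (bias ignored)."""
    return channels * channels


def split_pointwise_params(channels: int, segments: int, shared: bool = False) -> int:
    """Weights when channels are cut into equal, non-overlapping segments.

    Assumption: each segment is mixed by a 1x1 convolution over its own
    channels only; with shared=True a single such convolution is reused
    for every segment.
    """
    if channels % segments != 0:
        raise ValueError("channels must be divisible by segments")
    per_segment = (channels // segments) ** 2
    return per_segment if shared else segments * per_segment


if __name__ == "__main__":
    c, k = 256, 5  # hypothetical hidden size and kernel size
    print("spatial, 2D kernel:      ", depthwise_2d_params(c, k))
    print("spatial, two 1D kernels: ", depthwise_1d_pair_params(c, k))
    print("channel, full 1x1:       ", pointwise_params(c))
    print("channel, 2 segments:     ", split_pointwise_params(c, 2))
    print("channel, 2 shared:       ", split_pointwise_params(c, 2, shared=True))
```

Under these assumptions, the 1D pair needs 2k weights per channel instead of k², and splitting into s unshared segments reduces the channel mixing weights by a factor of s. The reductions reported above depend on the specific SplitMixer variant and are not reproduced by this toy count.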