Recently, transformer and multi-layer perceptron (MLP) architectures have achieved impressive results on various vision tasks. However, how to effectively combine these operators to form high-performance hybrid visual architectures remains a challenge. In this work, we study the learnable combination of convolution, transformer, and MLP by proposing a novel unified architecture search approach. Our approach contains two key designs for searching high-performance networks. First, we model the very different searchable operators in a unified form, which enables the operators to be characterized with the same set of configuration parameters. In this way, the overall search space size is significantly reduced, and the total search cost becomes affordable. Second, we propose context-aware downsampling modules (DSMs) to mitigate the gap between the different types of operators. The proposed DSMs better adapt features across the different types of operators, which is important for identifying high-performance hybrid architectures. Finally, we integrate the configurable operators and DSMs into a unified search space and search with a reinforcement-learning-based algorithm to fully explore the optimal combination of the operators. We search for a baseline network and scale it up to obtain a family of models, named UniNets, which achieve much better accuracy and efficiency than previous ConvNets and Transformers. In particular, our UniNet-B5 achieves 84.9% top-1 accuracy on ImageNet, outperforming EfficientNet-B7 and BoTNet-T7 with 44% and 55% fewer FLOPs, respectively. By pretraining on ImageNet-21K, our UniNet-B6 achieves 87.4% top-1 accuracy, outperforming Swin-L with 51% fewer FLOPs and 41% fewer parameters. Code is available at https://github.com/Sense-X/UniNet.
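To make the idea of a shared parameterization concrete, below is a minimal sketch (not the authors' implementation) of how convolution, transformer, and MLP operators could be described with one common set of configuration parameters, so that a search algorithm samples them uniformly from a single space. All names here (OpConfig, sample_architecture, the specific fields and value choices) are hypothetical and only illustrate the concept.

```python
# Hypothetical sketch: a unified operator configuration for hybrid-architecture search.
# A real RL-based controller would replace the random sampler below and be rewarded
# with validation accuracy under a compute (FLOPs) constraint.
from dataclasses import dataclass
import random

OP_TYPES = ["conv", "transformer", "mlp"]  # the three searchable operator families

@dataclass
class OpConfig:
    """Shared parameterization used for every searchable operator."""
    op_type: str      # "conv", "transformer", or "mlp"
    channels: int     # output width of the stage
    expansion: float  # expansion ratio of the inner bottleneck / FFN
    repeats: int      # how many times the block is stacked in the stage

def sample_architecture(num_stages: int = 4, seed: int = 0) -> list:
    """Randomly sample one candidate hybrid architecture (illustrative only)."""
    rng = random.Random(seed)
    stages, channels = [], 64
    for _ in range(num_stages):
        stages.append(OpConfig(
            op_type=rng.choice(OP_TYPES),
            channels=channels,
            expansion=rng.choice([2.0, 4.0, 6.0]),
            repeats=rng.choice([2, 3, 4]),
        ))
        channels *= 2  # a context-aware DSM would sit between stages here
    return stages

if __name__ == "__main__":
    for i, cfg in enumerate(sample_architecture()):
        print(f"stage {i}: {cfg}")
```

Because every operator type exposes the same small set of fields, the search space grows additively rather than multiplicatively across operator families, which is what keeps the overall search cost affordable.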