FocusFormer: Focusing on What We Need via Architecture Sampler

Vision Transformers (ViTs) have underpinned recent breakthroughs in computer vision. However, designing ViT architectures is laborious and relies heavily on expert knowledge. To automate the design process and allow flexible deployment, one-shot neural architecture search decouples supernet training from architecture specialization for diverse deployment scenarios. To cope with the enormous number of sub-networks in the supernet, existing methods treat all architectures as equally important and randomly sample some of them at each update step during training. During architecture search, however, these methods focus on finding architectures on the Pareto frontier of performance and resource consumption, which creates a gap between training and deployment. In this paper, we devise a simple yet effective method, called FocusFormer, to bridge this gap. To this end, we propose to learn an architecture sampler that assigns higher sampling probabilities to architectures on the Pareto frontier under different resource constraints during supernet training, so that they are sufficiently optimized and their performance improves. During specialization, we can directly use the well-trained architecture sampler to obtain accurate architectures that satisfy a given resource constraint, which significantly improves search efficiency. Extensive experiments on CIFAR-100 and ImageNet show that FocusFormer improves the performance of the searched architectures while significantly reducing the search cost. For example, on ImageNet, our FocusFormer-Ti with 1.4G FLOPs outperforms AutoFormer-Ti by 0.5% in Top-1 accuracy.
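To make the notion of a Pareto frontier over performance and resource consumption concrete, the following is a minimal Python sketch. It is not the learned architecture sampler from the paper: the fixed `focus` weight is a stand-in assumption for a learned sampling probability, the function names are hypothetical, and the (cost, accuracy) pairs are invented toy values, not reported results.

```python
def pareto_frontier(candidates):
    """Return the (cost, accuracy) pairs that no other candidate dominates.

    A candidate is dominated when another one costs no more and is at
    least as accurate, with a strict improvement in one of the two.
    Exact duplicates are kept once.
    """
    frontier = []
    best_acc = float("-inf")
    # Walk from cheapest to most expensive; on equal cost, best accuracy first.
    for cost, acc in sorted(candidates, key=lambda c: (c[0], -c[1])):
        if acc > best_acc:
            frontier.append((cost, acc))
            best_acc = acc
    return frontier


def sampling_weights(candidates, focus=4.0):
    """Toy stand-in for a sampler: frontier candidates get `focus` times
    the probability mass of the rest (assumption, not the paper's method)."""
    front = set(pareto_frontier(candidates))
    raw = [focus if c in front else 1.0 for c in candidates]
    total = sum(raw)
    return [w / total for w in raw]


def best_under_budget(candidates, budget):
    """Pick the most accurate candidate whose cost fits the budget."""
    feasible = [c for c in candidates if c[0] <= budget]
    return max(feasible, key=lambda c: c[1]) if feasible else None


# Invented toy data: (cost, accuracy) pairs for five hypothetical sub-networks.
candidates = [(1.0, 0.60), (1.5, 0.70), (1.5, 0.65), (2.0, 0.68), (3.0, 0.75)]

print(pareto_frontier(candidates))      # three of the five are non-dominated
print(sampling_weights(candidates))     # frontier entries get more mass
print(best_under_budget(candidates, 2.0))
```

Uniform sampling corresponds to `focus=1.0`, where every sub-network receives the same probability. Raising `focus` shifts training updates toward the candidates that the search stage would actually select, which is the training/deployment gap the abstract describes.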
