Neural Architecture Search (NAS) has been widely adopted to design accurate and efficient image classification models. However, applying NAS to a new computer vision task still requires a huge amount of effort. This is because 1) previous NAS research has concentrated disproportionately on image classification while largely ignoring other tasks; 2) many NAS works focus on optimizing task-specific components that do not transfer well to other tasks; and 3) existing NAS methods are typically designed to be "proxyless" and require significant effort to integrate with each new task's training pipeline. To tackle these challenges, we propose FBNetV5, a NAS framework that can search for neural architectures for a variety of vision tasks with much lower computational cost and human effort. Specifically, we design 1) a search space that is simple yet inclusive and transferable; 2) a multitask search process that is disentangled from the target tasks' training pipelines; and 3) an algorithm that simultaneously searches for architectures for multiple tasks at a computational cost agnostic to the number of tasks. We evaluate FBNetV5 on three fundamental vision tasks: image classification, object detection, and semantic segmentation. Models found by FBNetV5 in a single search run outperform the previous state-of-the-art on all three tasks: image classification (e.g., +1.3% ImageNet top-1 accuracy under the same FLOPs as compared to FBNetV3), semantic segmentation (e.g., +1.8% ADE20K val. mIoU over SegFormer with 3.6x fewer FLOPs), and object detection (e.g., +1.1% COCO val. mAP with 1.2x fewer FLOPs as compared to YOLOX).