Neural Architecture Search (NAS) often trains and evaluates a large number of architectures. Recent predictor-based NAS approaches attempt to reduce this heavy computational cost with two key steps: sampling some architecture-performance pairs and fitting a proxy accuracy predictor. Given limited samples, however, these predictors are far from accurate enough to locate top architectures, owing to the difficulty of fitting the huge search space. This paper reflects on a simple yet crucial question: if our final goal is to find the best architecture, do we really need to model the whole space well? We propose a paradigm shift from fitting the whole architecture space with one strong predictor to progressively fitting a search path towards the high-performance sub-space through a set of weaker predictors. A key property of the proposed weak predictors is that their probability of sampling better architectures keeps increasing. Hence we only sample a few well-performing architectures guided by the previously learned predictor and fit a new, better weak predictor. This embarrassingly simple framework yields a coarse-to-fine iteration that gradually refines the ranking of the sampling space. Extensive experiments demonstrate that our method requires fewer samples to find top-performance architectures on NAS-Bench-101 and NAS-Bench-201, and achieves state-of-the-art ImageNet performance on the NASNet search space. In particular, compared to state-of-the-art (SOTA) predictor-based NAS methods, WeakNAS outperforms all of them by notable margins, e.g., requiring at least 7.5x fewer samples to find the global optimum on NAS-Bench-101; WeakNAS can also absorb them for a further performance boost. We further set a new SOTA result of 81.3% in the ImageNet MobileNet search space. The code is available at https://github.com/VITA-Group/WeakNAS.
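The iterative loop described above (fit a weak proxy predictor on the pairs collected so far, rank a candidate pool, evaluate only the top-ranked unseen architectures, then refit) can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the random-forest predictor, the encoding of architectures as fixed-length feature tuples, and the names `search_space` and `evaluate` are all assumptions made for this sketch.

```python
import random
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def weak_predictor_search(search_space, evaluate,
                          n_iters=10, n_init=10, n_per_iter=10, pool_size=1000):
    """Sketch of weak-predictor-guided search.

    search_space: list of architectures encoded as fixed-length feature tuples
                  (hypothetical encoding; any vector encoding would do).
    evaluate:     callable returning the measured accuracy of an architecture.
    """
    # Bootstrap with a few randomly sampled architecture-performance pairs.
    archs = random.sample(search_space, n_init)
    accs = [evaluate(a) for a in archs]

    for _ in range(n_iters):
        # Fit a weak proxy predictor on all pairs collected so far. It only
        # needs to rank the promising region well, not model the whole space.
        predictor = RandomForestRegressor(n_estimators=100)
        predictor.fit(np.array(archs), np.array(accs))

        # Rank a random candidate pool by predicted accuracy (descending).
        pool = random.sample(search_space, min(pool_size, len(search_space)))
        preds = predictor.predict(np.array(pool))
        ranked = [pool[i] for i in np.argsort(preds)[::-1]]

        # Evaluate only the top-ranked unseen architectures, so each round's
        # samples concentrate further into the high-performance sub-space.
        seen = set(archs)
        new = [a for a in ranked if a not in seen][:n_per_iter]
        archs += new
        accs += [evaluate(a) for a in new]

    # Return the best architecture found across all evaluations.
    best = int(np.argmax(accs))
    return archs[best], accs[best]
```

In this sketch each refit sees a training set increasingly biased toward good architectures, which is what lets a sequence of weak predictors trace a path into the high-performance sub-space without ever fitting the full search space.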