Architecture performance predictors have been widely used in neural architecture search (NAS). Although they have been shown to be simple and effective, the optimization objectives in prior works (e.g., precise accuracy estimation or perfect ranking of all architectures in the space) do not capture the ranking nature of NAS. In addition, a large number of ground-truth architecture-accuracy pairs are usually required to build a reliable predictor, making the process prohibitively expensive. To overcome these limitations, in this paper we look at NAS from a novel point of view and introduce Learning to Rank (LTR) methods to select the best (ace) architectures from a space. Specifically, we propose to use Normalized Discounted Cumulative Gain (NDCG) as the target metric and LambdaRank as the training algorithm. We also propose to leverage weak supervision from weight sharing by pretraining architecture representations on weak labels obtained from the super-net, and then fine-tuning the ranking model using a small number of architectures trained from scratch. Extensive experiments on NAS benchmarks and large-scale search spaces demonstrate that our approach outperforms the state of the art with a significantly reduced search cost.
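To make the ranking objective concrete, below is a minimal, self-contained sketch (not the paper's implementation) of NDCG@k and of pairwise LambdaRank weights, where each RankNet-style gradient term is scaled by the |ΔNDCG| incurred when two architectures swap ranks. The relevance grades, function names, and parameters are illustrative assumptions.

```python
import numpy as np

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k positions of a ranked list."""
    rel = np.asarray(relevances, dtype=float)[:k]
    discounts = np.log2(np.arange(2, rel.size + 2))   # positions 1..k -> log2(2)..log2(k+1)
    return np.sum((2.0 ** rel - 1.0) / discounts)

def ndcg_at_k(relevances, scores, k):
    """NDCG@k: rank items by predicted score, normalize DCG by the ideal (sorted) DCG."""
    relevances = np.asarray(relevances, dtype=float)
    order = np.argsort(scores)[::-1]                  # ranking induced by the predictor
    idcg = dcg_at_k(np.sort(relevances)[::-1], k)     # best achievable DCG
    return dcg_at_k(relevances[order], k) / idcg if idcg > 0 else 0.0

def lambdarank_lambdas(relevances, scores, k, sigma=1.0):
    """Per-item LambdaRank gradients: RankNet terms weighted by |ΔNDCG| of swapping a pair."""
    relevances = np.asarray(relevances, dtype=float)
    scores = np.asarray(scores, dtype=float)
    lambdas = np.zeros(len(scores))
    base = ndcg_at_k(relevances, scores, k)
    for i in range(len(scores)):
        for j in range(len(scores)):
            if relevances[i] <= relevances[j]:
                continue                              # only pairs where i should outrank j
            swapped = scores.copy()
            swapped[i], swapped[j] = swapped[j], swapped[i]
            delta = abs(ndcg_at_k(relevances, swapped, k) - base)
            rho = 1.0 / (1.0 + np.exp(sigma * (scores[i] - scores[j])))
            lambdas[i] += sigma * rho * delta         # push the better architecture's score up
            lambdas[j] -= sigma * rho * delta         # push the worse one's score down
    return lambdas

# Toy usage: five candidate architectures with graded relevance (e.g., binned accuracy).
relevance = [3, 1, 0, 2, 0]
predicted = [0.9, 0.2, 0.1, 0.8, 0.3]
print(ndcg_at_k(relevance, predicted, k=3))
print(lambdarank_lambdas(relevance, predicted, k=3))
```

In the setting described above, the scores would come from the ranking model, while the relevance grades could be derived from weak super-net accuracies during pretraining and from the small set of architectures trained from scratch during fine-tuning.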