ZARTS: On Zero-order Optimization for Neural Architecture Search

Differentiable architecture search (DARTS) has become a popular one-shot paradigm for neural architecture search (NAS) because of its high efficiency. It introduces trainable architecture parameters that represent the importance of candidate operations and proposes first- and second-order approximations to estimate their gradients, which makes it possible to solve NAS with gradient descent. However, our in-depth empirical results show that these approximations often distort the loss landscape, which biases the optimization objective and, in turn, yields inaccurate gradient estimates for the architecture parameters. This work turns to zero-order optimization and proposes a novel NAS scheme, called ZARTS, that searches without enforcing these approximations. Specifically, we introduce three representative zero-order optimization methods, RS, MGS, and GLD, of which MGS performs best by balancing accuracy and speed. Moreover, we explore the connections between RS/MGS and gradient descent and show that ZARTS can be seen as a robust gradient-free counterpart to DARTS. Extensive experiments on multiple datasets and search spaces show the strong performance of our method. In particular, results on 12 benchmarks on which DARTS collapses due to its known instability verify the robustness of ZARTS. We also search in the DARTS search space to compare with peer methods; the discovered architecture achieves 97.54% accuracy on CIFAR-10 and 75.7% top-1 accuracy on ImageNet, which is state-of-the-art performance.
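
The core idea, updating architecture parameters from loss evaluations alone instead of from approximated gradients, can be illustrated with a minimal Python sketch. It is not the ZARTS implementation. `toy_val_loss` is a hypothetical stand-in for the validation loss (a real search would need trained network weights for every evaluation), `random_search_step` is a simple greedy random search in the spirit of RS, and `zero_order_gradient` is the classic Gaussian-smoothing estimator, included only as one standard way random perturbations relate to gradient descent, not necessarily the connection derived in the paper. The softmax over architecture parameters follows the DARTS-style continuous relaxation.

```python
import math
import random


def softmax(alpha):
    """Map architecture parameters to operation weights (DARTS-style relaxation)."""
    m = max(alpha)
    exps = [math.exp(a - m) for a in alpha]
    total = sum(exps)
    return [e / total for e in exps]


def toy_val_loss(alpha):
    """Hypothetical stand-in for the validation loss of a supernet.

    Assumes candidate operation 2 is the "good" one, so the loss falls as its
    softmax weight rises. A real NAS loss would require trained network weights.
    """
    return (1.0 - softmax(alpha)[2]) ** 2


def random_search_step(loss_fn, alpha, sigma=0.5, num_samples=8, rng=random):
    """One greedy random-search update using loss evaluations only.

    Samples Gaussian perturbations of alpha and keeps the best one if it lowers
    the loss; otherwise alpha is returned unchanged. No gradient is computed.
    """
    best_alpha, best_loss = alpha, loss_fn(alpha)
    for _ in range(num_samples):
        candidate = [a + sigma * rng.gauss(0.0, 1.0) for a in alpha]
        cand_loss = loss_fn(candidate)
        if cand_loss < best_loss:
            best_alpha, best_loss = candidate, cand_loss
    return best_alpha, best_loss


def zero_order_gradient(loss_fn, alpha, sigma=0.1, num_samples=64, rng=random):
    """Gaussian-smoothing gradient estimate from loss evaluations only.

    Averages (f(alpha + sigma*u) - f(alpha)) / sigma * u over u ~ N(0, I).
    Its expectation is the gradient of the Gaussian-smoothed loss, so stepping
    along its negative behaves like (noisy) gradient descent.
    """
    base = loss_fn(alpha)
    grad = [0.0] * len(alpha)
    for _ in range(num_samples):
        u = [rng.gauss(0.0, 1.0) for _ in alpha]
        shifted = [a + sigma * ui for a, ui in zip(alpha, u)]
        scale = (loss_fn(shifted) - base) / sigma
        for i, ui in enumerate(u):
            grad[i] += scale * ui / num_samples
    return grad


if __name__ == "__main__":
    rng = random.Random(0)
    alpha = [0.0, 0.0, 0.0, 0.0]  # four candidate operations, equal weights
    start_loss = toy_val_loss(alpha)
    loss = start_loss
    for _ in range(50):
        alpha, loss = random_search_step(toy_val_loss, alpha, rng=rng)
    weights = softmax(alpha)
    print(f"loss {start_loss:.4f} -> {loss:.4f}")
    print("selected operation:", weights.index(max(weights)))
```

Because the greedy step only accepts improvements, its loss never increases. The estimator instead spends more loss evaluations (`num_samples`) to obtain a less noisy descent direction, a simple instance of the accuracy-versus-speed trade-off that the abstract attributes to the different zero-order methods.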
