With the success of Neural Architecture Search (NAS), weight sharing, as an approach to speed up architecture performance estimation, has received wide attention. Instead of training each architecture separately, weight sharing builds a supernet that assembles all the architectures as its submodels. However, there has been debate over whether the NAS process actually benefits from weight sharing, due to the gap between supernet optimization and the objective of NAS. To further understand the effect of weight sharing on NAS, we conduct a comprehensive analysis on five search spaces, including NAS-Bench-101, NAS-Bench-201, DARTS-CIFAR10, DARTS-PTB, and ProxylessNAS. Moreover, we take a step forward to explore pruning-based NAS algorithms. Some of our key findings are summarized as follows: (i) A well-trained supernet is not necessarily a good architecture-ranking model. (ii) The supernet is good at finding relatively good (top-10%) architectures but struggles to find the best ones (top-1% or less). (iii) The effectiveness of the supernet largely depends on the design of the search space itself. (iv) Compared to selecting the best architectures, the supernet is more confident in pruning the worst ones. (v) It is easier to find better architectures from an effectively pruned search space with supernet training. We expect that the observations and insights obtained in this work will inspire and aid the design of better NAS algorithms.
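To make the weight-sharing setup concrete, the following is a minimal, illustrative PyTorch-style sketch (not the paper's actual implementation): a single supernet holds all candidate operations per layer, and each sampled architecture is evaluated as a submodel that reuses the shared weights. All class and function names here (MixedLayer, Supernet, sample_arch) are hypothetical and chosen only for illustration.

```python
# Illustrative sketch of weight sharing: one supernet, many submodels.
import random
import torch
import torch.nn as nn

class MixedLayer(nn.Module):
    """One searchable layer: all candidate ops live in the supernet and share training."""
    def __init__(self, channels):
        super().__init__()
        self.candidates = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),  # candidate op 0: 3x3 conv
            nn.Conv2d(channels, channels, 5, padding=2),  # candidate op 1: 5x5 conv
            nn.Identity(),                                # candidate op 2: skip connection
        ])

    def forward(self, x, choice):
        # Only the chosen candidate runs; its weights are shared by every
        # architecture that selects it.
        return self.candidates[choice](x)

class Supernet(nn.Module):
    def __init__(self, channels=16, num_layers=4, num_classes=10):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 3, padding=1)
        self.layers = nn.ModuleList(MixedLayer(channels) for _ in range(num_layers))
        self.head = nn.Linear(channels, num_classes)

    def forward(self, x, arch):
        # `arch` is a list of per-layer choices, i.e. one submodel of the supernet.
        x = self.stem(x)
        for layer, choice in zip(self.layers, arch):
            x = layer(x, choice)
        x = x.mean(dim=(2, 3))  # global average pooling
        return self.head(x)

def sample_arch(num_layers=4, num_choices=3):
    return [random.randrange(num_choices) for _ in range(num_layers)]

# One supernet training step: sample a random submodel, update the shared weights.
supernet = Supernet()
optimizer = torch.optim.SGD(supernet.parameters(), lr=0.01)
images, labels = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
arch = sample_arch()
loss = nn.functional.cross_entropy(supernet(images, arch), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

After such training, ranking architectures by their validation accuracy under the shared weights is what replaces training each architecture from scratch; the findings above concern how faithful that proxy ranking is.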