Differentiable Neural Architecture Search is one of the most popular Neural Architecture Search (NAS) methods for its search efficiency and simplicity, accomplished by jointly optimizing the model weights and architecture parameters in a weight-sharing supernet via gradient-based algorithms. At the end of the search phase, the operations with the largest architecture parameters are selected to form the final architecture, under the implicit assumption that the values of the architecture parameters reflect operation strength. While much has been discussed about the supernet's optimization, the architecture selection process has received little attention. We provide empirical and theoretical analysis showing that the magnitude of an architecture parameter does not necessarily indicate how much the corresponding operation contributes to the supernet's performance. We propose an alternative, perturbation-based architecture selection that directly measures each operation's influence on the supernet. We re-evaluate several differentiable NAS methods with the proposed architecture selection and find that it consistently extracts significantly improved architectures from the underlying supernets. Furthermore, we find that several failure modes of DARTS can be greatly alleviated with the proposed selection method, indicating that much of the poor generalization observed in DARTS can be attributed to the failure of magnitude-based architecture selection rather than entirely to the optimization of its supernet.
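To make the contrast with magnitude-based selection concrete, the sketch below illustrates the perturbation-based idea: for each edge, each candidate operation is temporarily masked out and the resulting drop in supernet validation accuracy is measured, and the operation whose removal hurts the most is kept. This is a minimal illustration, not the reference implementation; the supernet interface (mask_op/unmask_op) and the validate function are hypothetical, and the fine-tuning of the supernet between edge discretizations is omitted.

```python
def select_architecture(supernet, edges, candidate_ops, val_loader, validate):
    """Perturbation-based selection: on each edge, keep the operation whose
    removal degrades supernet validation accuracy the most."""
    chosen = {}
    for edge in edges:
        base_acc = validate(supernet, val_loader)
        drops = {}
        for op in candidate_ops:
            supernet.mask_op(edge, op)            # temporarily remove this operation
            drops[op] = base_acc - validate(supernet, val_loader)
            supernet.unmask_op(edge, op)          # restore the full supernet
        chosen[edge] = max(drops, key=drops.get)  # largest drop = most relied-upon op
        # (In practice the supernet would be briefly re-tuned here before
        #  moving on to the next edge.)
    return chosen
```

Magnitude-based selection, by contrast, would simply take argmax over the architecture parameters on each edge, which is what the analysis above argues can be misleading.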