Neural architecture search (NAS) has shown encouraging results in automating architecture design. Recently, DARTS relaxed the search process with a differentiable formulation that leverages weight-sharing and SGD, in which all candidate operations are trained simultaneously. Our empirical results show that such a procedure leads to a co-adaptation problem and the Matthew Effect: operations with fewer parameters are trained to maturity earlier. This causes two problems: first, operations with more parameters may never get the chance to express the desired function, since those with fewer parameters have already done the job; second, the system punishes these underperforming operations by lowering their architecture parameters, so they receive smaller loss gradients, which causes the Matthew Effect. In this paper, we systematically study these problems and propose a novel grouped operation dropout algorithm named DropNAS to fix the problems with DARTS. Extensive experiments demonstrate that DropNAS resolves the above issues and achieves promising performance. Specifically, DropNAS achieves 2.26% test error on CIFAR-10, 16.39% on CIFAR-100 and 23.4% on ImageNet (with the same training hyperparameters as DARTS for a fair comparison). It is also observed that DropNAS is robust across variants of the DARTS search space. Code is available at https://github.com/wiljohnhong/DropNAS.
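To make the idea of grouped operation dropout concrete, the following is a minimal, illustrative sketch of one DARTS-style mixed edge in PyTorch. The split into "parameter-free" and "parameterized" groups, the drop probability `p_drop`, and the rescaling scheme are assumptions for illustration only, not the paper's exact implementation.

```python
# Hypothetical sketch of grouped operation dropout on a single DARTS mixed edge.
# Not the authors' implementation; group assignment, p_drop and rescaling are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GroupedDropoutMixedOp(nn.Module):
    def __init__(self, ops, param_free_mask, p_drop=0.5):
        super().__init__()
        self.ops = nn.ModuleList(ops)                      # candidate operations on this edge
        self.alpha = nn.Parameter(torch.zeros(len(ops)))   # architecture parameters
        self.param_free_mask = param_free_mask             # True for ops like skip/pooling
        self.p_drop = p_drop

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        if self.training:
            # Sample a keep-mask per group, re-drawing until each group keeps at least one op,
            # so parameter-free and parameterized operations are dropped in a balanced way.
            keep = torch.ones(len(self.ops), dtype=torch.bool)
            for group in (True, False):
                idx = [i for i, g in enumerate(self.param_free_mask) if g == group]
                if not idx:
                    continue
                while True:
                    mask = torch.rand(len(idx)) > self.p_drop
                    if mask.any():
                        break
                keep[idx] = mask
            out = sum(weights[i] * op(x) for i, op in enumerate(self.ops) if keep[i])
            # Rescale so the expected output magnitude matches the full mixture.
            return out / (1.0 - self.p_drop)
        # At evaluation time, use the full weighted mixture as in DARTS.
        return sum(w * op(x) for w, op in zip(weights, self.ops))


# Example usage with two candidate operations (one parameter-free, one parameterized).
ops = [nn.Identity(), nn.Conv2d(16, 16, 3, padding=1)]
edge = GroupedDropoutMixedOp(ops, param_free_mask=[True, False])
y = edge(torch.randn(2, 16, 8, 8))
```

The intent of dropping operations group-wise during the search phase is that parameterized operations occasionally train without the parameter-free ones "doing the job" for them, which is one plausible way to counteract the co-adaptation and Matthew Effect described above.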