Neural architecture search (NAS) aims to produce the optimal sparse solution from a high-dimensional space spanned by all candidate connections. Current gradient-based NAS methods commonly ignore the sparsity constraint in the search phase and instead project the optimized solution onto a sparse one by post-processing. As a result, the dense super-net used for search is inefficient to train and exhibits a gap with the projected architecture used for evaluation. In this paper, we formulate neural architecture search as a sparse coding problem. We perform the differentiable search on a compressed, lower-dimensional space that has the same validation loss as the original sparse solution space, and recover an architecture by solving the sparse coding problem. The differentiable search and the architecture recovery are optimized in an alternating manner. By doing so, our network for search satisfies the sparsity constraint at each update and is efficient to train. To further eliminate the depth and width gap between the network used in search and the target-net used in evaluation, we propose a method to search and evaluate in one stage under the target-net settings. When training finishes, the architecture variables are absorbed into the network weights, so we obtain the searched architecture and the optimized parameters in a single run. In experiments, our two-stage method on CIFAR-10 requires only 0.05 GPU-days for search. Our one-stage method produces state-of-the-art performance on both CIFAR-10 and ImageNet at the cost of evaluation time only.
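To make the sparse-coding view concrete, the following minimal sketch (not the paper's implementation; the measurement matrix A, the regularization weight lam, and the toy problem sizes are illustrative assumptions) recovers a sparse architecture vector z from compressed variables b = A z by solving the LASSO form of sparse coding with iterative shrinkage-thresholding (ISTA), a standard solver for this problem class.

```python
import numpy as np

def soft_threshold(x, tau):
    """Element-wise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def ista_recover(A, b, lam=1e-3, n_iters=500):
    """Recover a sparse vector z from compressed measurements b = A z by solving
    min_z 0.5 * ||A z - b||^2 + lam * ||z||_1 with ISTA."""
    m, n = A.shape
    z = np.zeros(n)
    # Step size 1/L, where L is the Lipschitz constant of the gradient of the
    # quadratic data term, i.e. the largest eigenvalue of A^T A.
    L = np.linalg.norm(A, 2) ** 2
    for _ in range(n_iters):
        grad = A.T @ (A @ z - b)                 # gradient of the smooth term
        z = soft_threshold(z - grad / L, lam / L)
    return z

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m, k = 14, 7, 2                           # e.g. 14 candidate edges, 2 kept active (toy sizes)
    z_true = np.zeros(n)
    z_true[rng.choice(n, k, replace=False)] = rng.uniform(0.5, 1.0, k)
    A = rng.standard_normal((m, n)) / np.sqrt(m) # random measurement matrix (assumption)
    b = A @ z_true                               # compressed search variables
    z_hat = ista_recover(A, b)
    print(np.flatnonzero(np.abs(z_hat) > 1e-2))  # indices of recovered active connections
```

In this toy setting, the search would operate on the low-dimensional variables b, while the recovery step above maps them back to a sparse architecture vector; the two steps alternate during optimization.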