The time and effort involved in hand-designing deep neural networks is immense. This has prompted the development of Neural Architecture Search (NAS) techniques to automate this design. However, NAS algorithms tend to be slow and expensive; they need to train vast numbers of candidate networks to inform the search process. This could be alleviated if we could partially predict a network's trained accuracy from its initial state. In this work, we examine the overlap of activations between datapoints in untrained networks and motivate how this can give a measure which is usefully indicative of a network's trained performance. We incorporate this measure into a simple algorithm that allows us to search for powerful networks without any training in a matter of seconds on a single GPU, and verify its effectiveness on NAS-Bench-101, NAS-Bench-201, and Network Design Spaces. Finally, our approach can be readily combined with more expensive search methods; we examine a simple adaptation of regularised evolutionary search that outperforms its predecessor. Code for reproducing our experiments is available at https://github.com/BayesWatch/nas-without-training.
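The activation-overlap measure described above can be sketched concretely. The version below is a minimal illustration, not the authors' exact implementation: it assumes each datapoint in a minibatch is summarised by the binary on/off pattern of the network's ReLU units, builds a kernel counting per-pair agreements, and scores the network by the log-determinant of that kernel (higher when activation patterns are more distinct across inputs). The function name `naswot_score` and the array shapes are illustrative choices.

```python
import numpy as np

def naswot_score(codes: np.ndarray) -> float:
    """Score an untrained network from binary ReLU activation codes.

    codes: (N, M) array of 0/1 values, one row per datapoint in the
    minibatch, one column per ReLU unit (1 = unit active).

    K[i, j] counts the units on which inputs i and j agree, so K measures
    the overlap of activation patterns. The log-determinant of K is large
    when patterns are diverse across inputs and collapses towards -inf
    when two inputs share the same pattern.
    """
    codes = codes.astype(float)
    # Agreements on active units plus agreements on inactive units.
    K = codes @ codes.T + (1.0 - codes) @ (1.0 - codes).T
    _, logdet = np.linalg.slogdet(K)
    return float(logdet)

# Usage: distinct random activation patterns yield a finite, high score.
rng = np.random.default_rng(0)
diverse_codes = rng.integers(0, 2, size=(8, 64))
score = naswot_score(diverse_codes)
```

A search loop would then evaluate this score for a batch of randomly sampled architectures at initialisation and keep the highest-scoring one, which is what makes the search run in seconds rather than GPU-days.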