In this paper we introduce two algorithms for neural architecture search, NASGD and NASAGD, building on the theoretical work of two of the authors [5], which used the geometric structure of optimal transport to provide the conceptual basis for new notions of traditional and accelerated gradient descent for optimizing a function over a semi-discrete space. Our algorithms, which use the network morphism framework introduced in [2] as a baseline, can analyze forty times as many architectures as the hill-climbing methods of [2, 14] with the same computational resources and time, while achieving comparable accuracy. For example, using NASGD on CIFAR-10, our method designs and trains networks with an error rate of 4.06% in only twelve hours on a single GPU.