BaLeNAS: Differentiable Architecture Search via the Bayesian Learning Rule

Differentiable Architecture Search (DARTS) has received massive attention in recent years, mainly because it significantly reduces computational cost through weight sharing and continuous relaxation. However, more recent works find that existing differentiable NAS techniques struggle to outperform naive baselines and yield deteriorating architectures as the search proceeds. Rather than directly optimizing the architecture parameters, this paper formulates neural architecture search as a distribution learning problem by relaxing the architecture weights into Gaussian distributions. By leveraging natural-gradient variational inference (NGVI), the architecture distribution can be easily optimized on top of existing codebases without incurring additional memory or computational cost. We demonstrate how differentiable NAS benefits from Bayesian principles, which enhance exploration and improve stability. Experimental results on the NAS-Bench-201 and NAS-Bench-1shot1 benchmarks confirm the significant improvements the proposed framework can make. In addition, instead of simply applying the argmax to the learned parameters, we further leverage recently proposed training-free proxies in NAS to select the optimal architecture from a group of architectures drawn from the optimized distribution, achieving state-of-the-art results on the NAS-Bench-201 and NAS-Bench-1shot1 benchmarks. Our best architecture in the DARTS search space also obtains competitive test errors of 2.37%, 15.72%, and 24.2% on CIFAR-10, CIFAR-100, and ImageNet, respectively.
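To make the distribution-learning idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a diagonal Gaussian over the architecture weights and an update in the style of variational online Gauss-Newton; the abstract does not specify the exact update rule. The toy cross-entropy loss stands in for the supernet validation loss, and `ngvi_search`, `select_architecture`, `proxy_score`, and all hyperparameters (`lr`, `beta`, `prior_prec`, `n_eff`) are hypothetical names chosen for illustration.

```python
import numpy as np


def softmax(x):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def toy_loss_grad(alpha, target):
    """Gradient of a toy cross-entropy loss w.r.t. alpha.

    Assumption: this replaces the supernet validation loss, which the
    abstract does not describe in detail.
    """
    onehot = np.eye(alpha.shape[1])[np.asarray(target)]
    return softmax(alpha) - onehot


def ngvi_search(target, num_ops, steps=300, lr=0.1, beta=0.1,
                prior_prec=1.0, n_eff=100.0, seed=0):
    """Learn a diagonal Gaussian N(mu, sigma^2) over architecture weights."""
    rng = np.random.default_rng(seed)
    num_edges = len(target)
    mu = np.zeros((num_edges, num_ops))  # mean of the architecture weights
    s = np.ones((num_edges, num_ops))    # running curvature estimate
    lam = prior_prec / n_eff             # scaled prior precision

    for _ in range(steps):
        # Sample architecture weights from the current distribution.
        sigma = 1.0 / np.sqrt(n_eff * (s + lam))
        alpha = mu + sigma * rng.standard_normal(mu.shape)
        g = toy_loss_grad(alpha, target)
        # Squared gradients approximate the curvature (assumed choice).
        s = (1.0 - beta) * s + beta * g ** 2
        # Natural-gradient step on the mean, preconditioned by the curvature.
        mu = mu - lr * (g + lam * mu) / (s + lam)

    sigma = 1.0 / np.sqrt(n_eff * (s + lam))
    return mu, sigma


def select_architecture(mu, sigma, proxy_score, num_samples=8, seed=0):
    """Draw candidate architectures and keep the one the proxy scores highest.

    `proxy_score` is a hypothetical callable mapping a tuple of operation
    indices (one per edge) to a number; it stands in for a training-free proxy.
    """
    rng = np.random.default_rng(seed)
    candidates = []
    for _ in range(num_samples):
        alpha = mu + sigma * rng.standard_normal(mu.shape)
        candidates.append(tuple(int(i) for i in np.argmax(alpha, axis=1)))
    return max(candidates, key=proxy_score)
```

In this sketch the learned variance shrinks where the gradients are consistently large and stays wide where they are small, which is one way the sampling step can keep exploring during the search. Calling `ngvi_search([2, 0, 3, 1], num_ops=4)` returns the mean and standard deviation arrays, which can then be passed to `select_architecture` together with any scoring function.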
