Most existing neural architecture search (NAS) benchmarks and algorithms prioritize performance on well-studied tasks, e.g., image classification on CIFAR and ImageNet. As a result, the applicability of NAS approaches to more diverse areas remains inadequately understood. In this paper, we present NAS-Bench-360, a benchmark suite for evaluating state-of-the-art NAS methods for convolutional neural networks (CNNs). To construct it, we curate a collection of ten tasks spanning a diverse array of application domains, dataset sizes, problem dimensionalities, and learning objectives. By carefully selecting tasks that can interoperate with modern CNN-based search methods but are also far afield from their original development domain, we can use NAS-Bench-360 to investigate the following central question: do existing state-of-the-art NAS methods perform well on diverse tasks? Our experiments show that a modern NAS procedure designed for image classification can indeed find good architectures for tasks with other dimensionalities and learning objectives; however, the same method struggles against more task-specific methods and performs catastrophically poorly on classification in non-vision domains. The case for NAS robustness becomes even more dire in a resource-constrained setting, where a recent NAS method provides little to no benefit over much simpler baselines. These results demonstrate the need for a benchmark such as NAS-Bench-360 to help develop NAS approaches that work well on a variety of tasks, a crucial component of a truly robust and automated pipeline. We conclude with a demonstration of the kind of future research our suite of tasks will enable. All data and code are made publicly available.