Neural architecture search (NAS) aims to find the optimal architecture of a neural network for a given problem or family of problems. Evaluating neural architectures is very time-consuming. One way to mitigate this issue is to use low-fidelity evaluations, such as training on a subset of the dataset, for fewer epochs, or with fewer channels. In this paper, we propose a Bayesian multi-fidelity method for neural architecture search: MF-KD. The method relies on a new approach to low-fidelity evaluation of neural architectures: training for a few epochs with knowledge distillation. Knowledge distillation adds to the loss function a term that forces the network to mimic a teacher network. We carry out experiments on CIFAR-10, CIFAR-100, and ImageNet-16-120. We show that training for a few epochs with this modified loss function leads to a better selection of neural architectures than training for a few epochs with a logistic loss. The proposed method outperforms several state-of-the-art baselines.
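For concreteness, a standard form of such a distillation objective is the classic formulation of Hinton et al., sketched below; the abstract does not specify the exact loss used in MF-KD, so the temperature $T$ and mixing weight $\alpha$ here are illustrative assumptions:
$$
\mathcal{L}_{\mathrm{KD}} \;=\; (1-\alpha)\,\mathcal{L}_{\mathrm{CE}}\big(y,\, \sigma(z_s)\big) \;+\; \alpha\, T^2\, \mathrm{KL}\big(\sigma(z_t/T) \,\big\|\, \sigma(z_s/T)\big),
$$
where $z_s$ and $z_t$ are the student and teacher logits, $\sigma$ is the softmax function, $y$ is the ground-truth label, $T$ is the softening temperature, and $\alpha$ balances the supervised cross-entropy term against the teacher-mimicking term.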