Weight-sharing neural architecture search (NAS) is an effective technique for automating efficient neural architecture design. Weight-sharing NAS builds a supernet that assembles all the architectures as its sub-networks and jointly trains the supernet with the sub-networks. The success of weight-sharing NAS heavily relies on distilling the knowledge of the supernet to the sub-networks. However, we find that the widely used distillation divergence, i.e., KL divergence, may lead to student sub-networks that over-estimate or under-estimate the uncertainty of the teacher supernet, resulting in inferior performance of the sub-networks. In this work, we propose to improve supernet training with a more generalized alpha-divergence. By adaptively selecting the alpha-divergence, we prevent both over-estimation and under-estimation of the uncertainty of the teacher model. We apply the proposed alpha-divergence based supernet training to both slimmable neural networks and weight-sharing NAS, and demonstrate significant improvements. Specifically, our discovered model family, AlphaNet, outperforms prior-art models across a wide range of FLOPs regimes, including BigNAS, Once-for-All networks, FBNetV3, and AttentiveNAS. We achieve ImageNet top-1 accuracy of 80.0% with only 444 MFLOPs.
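As a rough illustration of the idea, the sketch below computes the standard alpha-divergence between teacher and student output distributions and a simple adaptive distillation loss that evaluates it at one negative and one positive alpha and keeps the larger term. The function names, the specific alpha values, and the max-based selection are illustrative assumptions for exposition; they are not the exact formulation or stabilization scheme used in the paper.

```python
import numpy as np

def alpha_divergence(p, q, alpha, eps=1e-12):
    """Alpha-divergence D_alpha(p || q) between two discrete distributions.
    Recovers KL(p || q) in the limit alpha -> 1 and KL(q || p) as alpha -> 0."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    if np.isclose(alpha, 1.0):   # forward KL limit
        return float(np.sum(p * np.log(p / q)))
    if np.isclose(alpha, 0.0):   # reverse KL limit
        return float(np.sum(q * np.log(q / p)))
    return float((np.sum(p**alpha * q**(1.0 - alpha)) - 1.0) / (alpha * (alpha - 1.0)))

def adaptive_alpha_kd_loss(teacher_probs, student_probs,
                           alpha_neg=-1.0, alpha_pos=1.0):
    """Hypothetical adaptive distillation loss (illustrative, not the paper's
    exact recipe): evaluate the divergence at a negative and a positive alpha
    and keep the larger term, so the student is penalized for both
    over-estimating and under-estimating the teacher's uncertainty."""
    d_neg = alpha_divergence(teacher_probs, student_probs, alpha_neg)
    d_pos = alpha_divergence(teacher_probs, student_probs, alpha_pos)
    return max(d_neg, d_pos)

# Example: a student that is over-confident relative to the teacher
teacher = np.array([0.6, 0.3, 0.1])
student = np.array([0.9, 0.08, 0.02])
print(adaptive_alpha_kd_loss(teacher, student))
```

The intuition behind the two alpha values is that divergences with alpha above 1 heavily penalize the student for placing mass where the teacher has little (uncertainty over-estimation), while negative-alpha divergences penalize the student for missing mass where the teacher has some (uncertainty under-estimation); taking the larger of the two terms addresses whichever error currently dominates.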