Differentiable Neural Architecture Search (NAS) requires all layer choices to be held in memory simultaneously, which limits the size of both the search space and the final architecture. In contrast, probabilistic NAS methods such as PARSEC learn a distribution over high-performing architectures and use only as much memory as is needed to train a single model. However, they need to sample many architectures, which makes them computationally expensive when searching an extensive space. To solve these problems, we propose a sampling method that adapts to the entropy of the distribution, drawing more samples to encourage exploration at the beginning and fewer samples as learning proceeds. Furthermore, to search quickly in the multivariate space, we propose a coarse-to-fine strategy that uses a factorized distribution at the beginning, which can reduce the number of architecture parameters by over an order of magnitude. We call this method Fast Probabilistic NAS (FP-NAS). Compared with PARSEC, it samples 64% fewer architectures and searches 2.1x faster. Compared with FBNetV2, FP-NAS is 1.9x to 3.6x faster, and the searched models outperform FBNetV2 models on ImageNet. FP-NAS allows us to expand the giant FBNetV2 space to be wider (i.e., larger channel choices) and deeper (i.e., more blocks), while adding the Split-Attention block and enabling search over the number of splits. When searching for a model of size 0.4G FLOPs, FP-NAS is 132x faster than EfficientNet, and the searched FP-NAS-L0 model outperforms EfficientNet-B0 by 0.6% accuracy. Without using any architecture surrogate or scaling tricks, we directly search large models of up to 1.0G FLOPs. Our FP-NAS-L2 model with simple distillation outperforms BigNAS-XL with advanced in-place distillation by 0.7% accuracy with fewer FLOPs.
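The two ideas above, entropy-adaptive sampling and a factorized distribution, can be illustrated with a minimal Python sketch. It is not the FP-NAS implementation. It assumes that the number of sampled architectures is proportional to the summed per-layer entropy of independent categorical distributions, and the names `adaptive_num_samples`, `scale`, `joint_param_count`, `factorized_param_count`, and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def adaptive_num_samples(layer_probs, scale=1.0, min_samples=1):
    """Hypothetical rule: sample count proportional to total entropy.

    layer_probs holds one categorical distribution per layer; layers are
    assumed independent, so their entropies add up.
    """
    total_entropy = sum(entropy(p) for p in layer_probs)
    return max(min_samples, math.ceil(scale * total_entropy))


def joint_param_count(choice_sizes):
    """Parameters per layer when every combination gets its own entry."""
    return math.prod(choice_sizes)


def factorized_param_count(choice_sizes):
    """Parameters per layer when each search variable has its own distribution."""
    return sum(choice_sizes)


# Early in the search: uniform over 4 choices in each of 10 layers (high entropy).
early = [[0.25] * 4 for _ in range(10)]
# Late in the search: each layer has mostly settled on one choice (low entropy).
late = [[0.97, 0.01, 0.01, 0.01] for _ in range(10)]

n_early = adaptive_num_samples(early, scale=2.0)
n_late = adaptive_num_samples(late, scale=2.0)
print("samples early vs. late:", n_early, n_late)

# Illustrative per-layer search variables with 3, 6, 10, and 2 options each.
sizes = [3, 6, 10, 2]
print("joint parameters:", joint_param_count(sizes))            # 360
print("factorized parameters:", factorized_param_count(sizes))  # 21
```

In this toy setting the sample count shrinks as the distribution sharpens, and the factorized form needs 21 parameters per layer instead of 360, which is the kind of order-of-magnitude reduction the abstract describes.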