FNAS: Uncertainty-Aware Fast Neural Architecture Search

Reinforcement learning (RL)-based neural architecture search (NAS) generally guarantees better convergence than gradient-based approaches, yet it requires far more computational resources because of the rollout bottleneck: the exhaustive training of each sampled generation on proxy tasks. In this paper, we propose a general pipeline to accelerate the convergence of both the rollout process and the RL process in NAS. It is motivated by the observation that both architecture knowledge and parameter knowledge can be transferred between different experiments and even between different tasks. We first introduce an uncertainty-aware critic (value function) in Proximal Policy Optimization (PPO) to exploit the architecture knowledge from previous experiments, which stabilizes the training process and reduces the search time by a factor of 4. Further, we propose an architecture knowledge pool together with a block similarity function to exploit parameter knowledge, which reduces the search time by a factor of 2. This is the first work to introduce block-level weight sharing in RL-based NAS. The block similarity function guarantees a 100% hitting ratio with strict fairness. In addition, we show that a simply designed off-policy correction factor, used with a "replay buffer" in RL optimization, can cut the search time by a further half. Experiments on the Mobile Neural Architecture Search (MNAS) search space show that the proposed Fast Neural Architecture Search (FNAS) accelerates the standard RL-based NAS process by ~10x (e.g. ~256 2x2 TPUv2 x days / 20,000 GPU x hours -> 2,000 GPU x hours for MNAS), and guarantees better performance on various vision tasks.
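
The abstract does not spell out the form of the off-policy correction factor, so the following is only a minimal sketch of one common way such a factor can be combined with PPO when samples are reused from a replay buffer. It assumes a truncated importance weight, which is a swapped-in, generic technique and not necessarily the one used in FNAS. The names `clipped_surrogate`, `replay_weight`, `replay_objective`, the `cap` parameter, and the sample dictionary keys are all hypothetical.

```python
import math

def clipped_surrogate(logp_new, logp_old, advantage, eps=0.2):
    """Standard PPO clipped surrogate objective for one sample."""
    ratio = math.exp(logp_new - logp_old)
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)
    return min(ratio * advantage, clipped * advantage)

def replay_weight(logp_old, logp_behavior, cap=1.0):
    """Hypothetical off-policy correction (an assumption, not the paper's formula).

    Truncated importance weight between the policy at the start of the
    current update (logp_old) and the policy that originally generated
    the replayed sample (logp_behavior).
    """
    return min(math.exp(logp_old - logp_behavior), cap)

def replay_objective(samples, eps=0.2, cap=1.0):
    """Mean of correction-weighted PPO surrogates over replayed samples.

    Each sample is a dict with keys 'logp_new', 'logp_old',
    'logp_behavior', and 'advantage' (hypothetical layout).
    """
    total = 0.0
    for s in samples:
        w = replay_weight(s["logp_old"], s["logp_behavior"], cap)
        total += w * clipped_surrogate(
            s["logp_new"], s["logp_old"], s["advantage"], eps
        )
    return total / len(samples)

if __name__ == "__main__":
    buffer = [
        # Fresh, on-policy sample: the correction weight is 1.
        {"logp_new": math.log(0.30), "logp_old": math.log(0.25),
         "logp_behavior": math.log(0.25), "advantage": 1.0},
        # Stale sample that the current policy now finds less likely:
        # the correction weight shrinks its contribution.
        {"logp_new": math.log(0.10), "logp_old": math.log(0.10),
         "logp_behavior": math.log(0.40), "advantage": 2.0},
    ]
    print(replay_objective(buffer))
```

In this sketch a stale sample is down-weighted in proportion to how unlikely the current policy considers it, so old rollouts can be reused without letting them dominate the update. Capping the weight at `cap` is a design choice that keeps the variance of the estimate bounded.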
