Neural architecture search (NAS) is a promising technique for designing efficient, high-performance deep neural networks (DNNs). As the performance requirements of ML applications continue to grow, hardware accelerators have come to play a central role in DNN design. This trend makes NAS even more complicated and time-consuming for most real applications. This paper proposes FLASH, a very fast NAS methodology that co-optimizes DNN accuracy and performance on a real hardware platform. As the main theoretical contribution, we first propose the NN-Degree, an analytical metric that quantifies the topological characteristics of DNNs with skip connections (e.g., DenseNets, ResNets, Wide-ResNets, and MobileNets). The newly proposed NN-Degree allows us to perform training-free NAS within one second and to build an accuracy predictor by training as few as 25 samples out of a vast search space with more than 63 billion configurations. Second, by performing inference on the target hardware, we fine-tune and validate our analytical models to estimate the latency, area, and energy consumption of various DNN architectures while executing standard ML datasets. Third, we construct a hierarchical algorithm based on simplicial homology global optimization (SHGO) to optimize the model-architecture co-design process while considering the area, latency, and energy consumption of the target hardware. We demonstrate that, compared to state-of-the-art NAS approaches, the proposed hierarchical SHGO-based algorithm achieves more than four orders of magnitude speedup (specifically, its execution time is about 0.1 seconds). Finally, our experimental evaluations show that FLASH is easily transferable to different hardware architectures, enabling NAS on a Raspberry Pi-3B processor in less than 3 seconds.