Neural Architecture Search (NAS) has enabled automated machine learning by streamlining the manual development of deep neural network architectures into the definition of a search space, a search strategy, and a performance estimation strategy. To address the need for multi-platform deployment of Convolutional Neural Network (CNN) models, Once-For-All (OFA) proposed decoupling training and search to deliver a one-shot model of sub-networks spanning various accuracy-latency tradeoffs. We find that the performance estimation strategy used in OFA's search generalizes poorly across hardware deployment platforms because it relies on per-hardware latency lookup tables that require a significant amount of time and manual effort to build beforehand. In this work, we present a framework for building latency predictors for neural network architectures that addresses the need for heterogeneous hardware support and removes the overhead of lookup tables altogether. We introduce two generalizability strategies: fine-tuning, which adapts a base model trained on a specific hardware platform and NAS search space, and GPU generalization, which trains a model on GPU hardware parameters such as number of cores, RAM size, and memory bandwidth. With this, we provide a family of latency prediction models that achieve over 50% lower RMSE loss compared to ProxylessNAS. We also show that using these latency predictors matches, and in certain cases exceeds, the NAS performance of the lookup-table baseline approach.
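As a rough illustration of the GPU-generalization strategy described above, the sketch below shows a minimal latency predictor that conditions on both an architecture encoding and GPU hardware descriptors. The MLP design, feature dimensions, and all names here are illustrative assumptions for exposition, not the paper's exact model.

```python
import torch
import torch.nn as nn

class LatencyPredictor(nn.Module):
    """Minimal sketch: an MLP that maps an architecture encoding,
    concatenated with GPU hardware descriptors (e.g., number of cores,
    RAM size, memory bandwidth), to a predicted latency.
    Hypothetical design; dimensions and depth are assumptions."""

    def __init__(self, arch_feat_dim: int, hw_feat_dim: int = 3, hidden_dim: int = 400):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(arch_feat_dim + hw_feat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),  # scalar latency estimate
        )

    def forward(self, arch_encoding: torch.Tensor, hw_params: torch.Tensor) -> torch.Tensor:
        # Concatenate architecture features with normalized hardware parameters,
        # so one predictor can serve multiple GPUs instead of one lookup table each.
        x = torch.cat([arch_encoding, hw_params], dim=-1)
        return self.net(x).squeeze(-1)

# Example usage: predict latency for a batch of 8 sub-networks on one GPU.
# arch_encoding would come from the OFA sub-network configuration
# (kernel sizes, depths, widths); hw_params holds [cores, ram_gb, bandwidth_gbps],
# normalized to comparable scales. All values below are placeholders.
predictor = LatencyPredictor(arch_feat_dim=128)
arch = torch.rand(8, 128)
hw = torch.tensor([[5120.0, 16.0, 900.0]]).expand(8, -1) / torch.tensor([[10000.0, 64.0, 1000.0]])
pred_latency = predictor(arch, hw)

# Training against measured latencies with MSE (the paper reports RMSE):
loss = nn.MSELoss()(pred_latency, torch.rand(8))
loss.backward()
```

Under this kind of parameterization, fine-tuning a base predictor to a new device amounts to continuing training on a small set of latency measurements from that device, rather than rebuilding a lookup table from scratch.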