Modern deep neural networks must demonstrate state-of-the-art accuracy while exhibiting low latency and energy consumption. As such, neural architecture search (NAS) algorithms take these two constraints into account when generating a new architecture. However, efficiency metrics such as latency are typically hardware dependent, requiring the NAS algorithm to either measure or predict the architecture latency. Measuring the latency of every evaluated architecture adds a significant amount of time to the NAS process. Here we propose Microprocessor A Priori for Latency Estimation (MAPLE), which does not rely on transfer learning or domain adaptation but instead generalizes to new hardware by incorporating prior hardware characteristics during training. MAPLE takes advantage of a novel quantitative strategy to characterize the underlying microprocessor by measuring relevant hardware performance metrics, yielding a fine-grained and expressive hardware descriptor. Moreover, MAPLE benefits from the tightly coupled I/O between the CPU and GPU and their dependency to predict DNN latency on GPUs while measuring microprocessor hardware performance counters from the CPU feeding the GPU hardware. Using this quantitative strategy as the hardware descriptor, MAPLE can generalize to new hardware via a few-shot adaptation strategy: with as few as 3 samples it exhibits a 3% improvement over state-of-the-art methods requiring as many as 10 samples. Experimental results show that increasing the few-shot adaptation samples to 10 improves the accuracy significantly over the state-of-the-art methods, by 12%. Furthermore, MAPLE exhibits 8-10% better accuracy, on average, compared to relevant baselines at any number of adaptation samples.
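
To make the idea of a hardware descriptor plus few-shot adaptation concrete, the following is a minimal sketch and not the authors' implementation. It assumes a plain ridge regressor in place of whatever predictor MAPLE actually uses, and it assumes adaptation works by refitting on the training pool plus a few up-weighted samples from the new device. All names (`build_features`, `fit_ridge`, `adapt_few_shot`, `new_weight`) and all data are hypothetical and synthetic.

```python
import numpy as np

def build_features(arch_enc, hw_desc):
    """Concatenate each architecture encoding with its device's hardware descriptor."""
    arch_enc = np.atleast_2d(np.asarray(arch_enc, dtype=float))
    hw_desc = np.asarray(hw_desc, dtype=float)
    n = arch_enc.shape[0]
    hw = np.broadcast_to(hw_desc, (n, hw_desc.shape[0]))  # same descriptor on every row
    bias = np.ones((n, 1))
    return np.hstack([arch_enc, hw, bias])

def fit_ridge(X, y, weights=None, lam=1e-3):
    """Weighted ridge regression in closed form (assumed stand-in predictor)."""
    w = np.ones(len(y)) if weights is None else np.asarray(weights, dtype=float)
    A = X.T @ (w[:, None] * X) + lam * np.eye(X.shape[1])
    b = X.T @ (w * y)
    return np.linalg.solve(A, b)

def adapt_few_shot(X_pool, y_pool, X_new, y_new, new_weight=10.0):
    """Refit on the pool plus a few up-weighted samples measured on the new device."""
    X = np.vstack([X_pool, X_new])
    y = np.concatenate([y_pool, y_new])
    w = np.concatenate([np.ones(len(y_pool)), np.full(len(y_new), new_weight)])
    return fit_ridge(X, y, weights=w)

# Synthetic demo: two training devices, one unseen device with 3 measured samples.
rng = np.random.default_rng(0)
arch = rng.normal(size=(50, 4))                       # hypothetical architecture encodings
devices = {"dev_a": [1.0, 0.5], "dev_b": [2.0, 0.1]}  # hypothetical counter-based descriptors
X_pool = np.vstack([build_features(arch, d) for d in devices.values()])
y_pool = X_pool @ np.arange(1.0, 8.0)                 # made-up latencies, linear in the features

new_desc = [1.5, 0.3]
X_few = build_features(arch[:3], new_desc)            # 3 adaptation samples
y_few = X_few @ np.arange(1.0, 8.0)
theta = adapt_few_shot(X_pool, y_pool, X_few, y_few)
predicted = build_features(arch[3:], new_desc) @ theta
```

The descriptor enters the model as ordinary input features, so a device never seen in training is represented by its measured counters rather than by a separate model; the few measured samples only nudge the fit toward that device.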


