Modern deep neural networks must demonstrate state-of-the-art accuracy while exhibiting low latency and energy consumption. As such, neural architecture search (NAS) algorithms take these two constraints into account when generating a new architecture. However, efficiency metrics such as latency are typically hardware-dependent, requiring the NAS algorithm to either measure or predict the architecture latency. Measuring the latency of every evaluated architecture adds a significant amount of time to the NAS process. Here we propose Microprocessor A Priori for Latency Estimation (MAPLE), which does not rely on transfer learning or domain adaptation but instead generalizes to new hardware by incorporating prior hardware characteristics during training. MAPLE takes advantage of a novel quantitative strategy to characterize the underlying microprocessor by measuring relevant hardware performance metrics, yielding a fine-grained and expressive hardware descriptor. Moreover, MAPLE benefits from the tightly coupled I/O between the CPU and GPU and their dependency: it predicts DNN latency on GPUs while measuring microprocessor performance hardware counters from the CPU feeding the GPU hardware. Using this quantitative strategy as the hardware descriptor, MAPLE can generalize to new hardware via a few-shot adaptation strategy, where with as few as 3 samples it exhibits a 6% improvement over state-of-the-art methods requiring as many as 10 samples. Experimental results showed that increasing the number of few-shot adaptation samples to 10 improves accuracy significantly over state-of-the-art methods, by 12%. Furthermore, MAPLE exhibits 8-10% better accuracy, on average, compared to relevant baselines at any number of adaptation samples.
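The core idea above, conditioning a latency predictor on a measured hardware descriptor so that one model generalizes across devices, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the descriptor values, device names, the use of plain least squares in place of a learned regressor, and all data are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Architecture features for 50 candidate networks (e.g. depth, width, FLOPs).
# Values are synthetic stand-ins.
arch = rng.random((50, 4))

# Hardware descriptors: hypothetical vectors of CPU performance counters
# (cache misses, instructions per cycle, ...) for 4 training devices.
hw_train = rng.random((4, 3))

# MAPLE-style training set: each row concatenates a network's architecture
# features with the descriptor of the device its latency was measured on.
X = np.vstack([np.hstack([arch, np.tile(d, (50, 1))]) for d in hw_train])

# Synthetic ground-truth latencies; in practice these come from on-device
# measurement during training-set construction.
w_true = rng.random(X.shape[1])
y = X @ w_true + 0.01 * rng.standard_normal(len(X))

# A plain least-squares fit stands in for the learned latency predictor.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# On an unseen device, measure its descriptor once and predict latency for
# every candidate; a handful of (architecture, latency) pairs could then be
# used to fine-tune, analogous to the paper's few-shot adaptation.
hw_new = rng.random(3)
X_new = np.hstack([arch, np.tile(hw_new, (50, 1))])
y_new = X_new @ w_true
mae = float(np.mean(np.abs(X_new @ w - y_new)))
print(f"mean abs. latency error on unseen device: {mae:.4f}")
```

The design point the sketch illustrates is that the hardware descriptor enters as ordinary input features, so no per-device retraining is required; only the descriptor (and optionally a few adaptation samples) changes when moving to new hardware.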