MAPLE: Microprocessor A Priori for Latency Estimation

Modern deep neural networks must achieve state-of-the-art accuracy while maintaining low latency and low energy consumption. As such, neural architecture search (NAS) algorithms take both constraints into account when generating a new architecture. However, efficiency metrics such as latency are typically hardware-dependent, requiring the NAS algorithm to either measure or predict the latency of each architecture. Measuring the latency of every evaluated architecture adds a significant amount of time to the NAS process. Here we propose Microprocessor A Priori for Latency Estimation (MAPLE), which does not rely on transfer learning or domain adaptation but instead generalizes to new hardware by incorporating a priori hardware characteristics during training. MAPLE takes advantage of a novel quantitative strategy that characterizes the underlying microprocessor by measuring relevant hardware performance metrics, yielding a fine-grained and expressive hardware descriptor. Moreover, MAPLE exploits the tightly coupled I/O between the CPU and GPU, and their interdependence, to predict DNN latency on GPUs while measuring microprocessor performance counters on the CPU that feeds the GPU. With this quantitative hardware descriptor, MAPLE generalizes to new hardware through few-shot adaptation: with as few as 3 samples, it achieves a 3% improvement over state-of-the-art methods that require as many as 10 samples. Experimental results show that increasing the number of few-shot adaptation samples to 10 improves accuracy over state-of-the-art methods by 12%. Furthermore, MAPLE exhibits 8-10% better accuracy, on average, than relevant baselines at any number of adaptation samples.
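To make the idea of a counter-based hardware descriptor and few-shot adaptation concrete, here is a minimal sketch under loudly stated assumptions. The abstract does not specify which counters MAPLE reads, how they are normalized, what regressor it uses, or how adaptation is performed. Everything below is hypothetical and illustrative: the counter names, the per-cycle normalization in `hardware_descriptor`, the toy linear predictor `LinearLatencyModel`, and `few_shot_calibrate`, which swaps in a simple least-squares affine recalibration (measured ≈ scale × predicted + offset) fitted on k measured samples from the new device. It shows the general pattern of feeding a hardware descriptor alongside architecture features and then adapting with a handful of measurements. It is not MAPLE's actual method.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# Hypothetical counter names; the abstract does not say which counters MAPLE uses.
COUNTERS = ("instructions", "cache_misses", "branch_misses")


def hardware_descriptor(raw: Dict[str, float]) -> List[float]:
    """Turn raw CPU counter totals into per-cycle rates (an assumed normalization)."""
    cycles = raw["cycles"]
    return [raw[name] / cycles for name in COUNTERS]


@dataclass
class LinearLatencyModel:
    """Toy predictor (assumption): latency is linear in [arch features + hw descriptor]."""
    weights: List[float]
    bias: float

    def predict(self, arch_features: Sequence[float], hw_desc: Sequence[float]) -> float:
        x = list(arch_features) + list(hw_desc)
        if len(x) != len(self.weights):
            raise ValueError("feature length does not match weight length")
        return self.bias + sum(w * xi for w, xi in zip(self.weights, x))


def few_shot_calibrate(
    model: LinearLatencyModel,
    arch_samples: Sequence[Sequence[float]],
    hw_desc: Sequence[float],
    measured: Sequence[float],
) -> Tuple[float, float]:
    """Least-squares fit of measured ~= scale * predicted + offset on k samples
    from the new device (an illustrative adaptation scheme, not MAPLE's)."""
    preds = [model.predict(a, hw_desc) for a in arch_samples]
    k = len(preds)
    if k == 0 or k != len(measured):
        raise ValueError("need the same non-zero number of samples and measurements")
    mean_p = sum(preds) / k
    mean_y = sum(measured) / k
    var_p = sum((p - mean_p) ** 2 for p in preds)
    if var_p == 0.0:
        # All predictions identical: only an offset can be estimated.
        return 1.0, mean_y - mean_p
    cov = sum((p - mean_p) * (y - mean_y) for p, y in zip(preds, measured))
    scale = cov / var_p
    return scale, mean_y - scale * mean_p
```

For example, with 3 measured architectures on a new device (matching the smallest adaptation budget mentioned above), `few_shot_calibrate` returns a `(scale, offset)` pair that corrects the base model's predictions for that device. An affine correction is chosen here only because it needs very few samples to fit; a real predictor would likely be a learned nonlinear model.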
