(e) Being-ahead: Benchmarking and Exploring Accelerators for Hardware-Efficient AI Deployment

From arXiv. Published at the MLSys'21 Workshop on Benchmarking Machine Learning Workloads on Emerging Hardware. arXiv admin note: text overlap with arXiv:2008.12745.

Customized hardware accelerators have been developed to provide improved performance and efficiency for DNN inference and training. However, the existing hardware accelerators may not always be suitable for handling various DNN models as their architecture paradigms and configuration tradeoffs are highly application-specific. It is important to benchmark the accelerator candidates in the earliest stage to gather comprehensive performance metrics and locate the potential bottlenecks. Further demands also emerge after benchmarking, which require adequate solutions to address the bottlenecks and improve the current designs for targeted workloads. To achieve these goals, in this paper, we leverage an automation tool called DNNExplorer for benchmarking customized DNN hardware accelerators and exploring novel accelerator designs with improved performance and efficiency. Key features include (1) direct support to popular machine learning frameworks for DNN workload analysis and accurate analytical models for fast accelerator benchmarking; (2) a novel accelerator design paradigm with high-dimensional design space support and fine-grained adjustability to overcome the existing design drawbacks; and (3) a design space exploration (DSE) engine to generate optimized accelerators by considering targeted AI workloads and available hardware resources. Results show that accelerators adopting the proposed novel paradigm can deliver up to 4.2X higher throughput (GOP/s) than the state-of-the-art pipeline design in DNNBuilder and up to 2.0X improved efficiency than the recently published generic design in HybridDNN given the same DNN model and resource budgets. With DNNExplorer's benchmarking and exploration features, we can be ahead at building and optimizing customized AI accelerators and enable more efficient AI applications.
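
To make the idea of analytical benchmarking followed by design space exploration more concrete, here is a minimal Python sketch. It is not DNNExplorer's model or API: the roofline-style latency estimate, the one-DSP-per-MAC assumption, and the names `Layer`, `layer_latency`, `throughput_gops`, and `explore` are all hypothetical and chosen only for illustration. The sketch estimates throughput in GOP/s for each candidate processing-element array and keeps the best one that fits a resource budget.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Layer:
    ops: float          # total operations in the layer (2 ops per MAC)
    bytes_moved: float  # assumed off-chip traffic in bytes


def layer_latency(layer, macs_per_cycle, freq_hz, bandwidth_bps):
    """Roofline-style estimate: a layer is bound by compute or by memory."""
    compute_s = layer.ops / (2 * macs_per_cycle * freq_hz)
    memory_s = layer.bytes_moved / bandwidth_bps
    return max(compute_s, memory_s)


def throughput_gops(layers, macs_per_cycle, freq_hz, bandwidth_bps):
    """Whole-model throughput in GOP/s, running the layers one after another."""
    total_ops = sum(layer.ops for layer in layers)
    total_s = sum(
        layer_latency(layer, macs_per_cycle, freq_hz, bandwidth_bps)
        for layer in layers
    )
    return total_ops / total_s / 1e9


def explore(layers, dsp_budget, freq_hz, bandwidth_bps,
            pe_rows=(8, 16, 32, 64), pe_cols=(8, 16, 32, 64)):
    """Exhaustive search over PE array shapes that fit the DSP budget.

    Returns (gops, rows, cols) for the best candidate, or None if
    no candidate fits. Assumes one DSP per MAC unit (hypothetical).
    """
    best = None
    for rows, cols in product(pe_rows, pe_cols):
        macs = rows * cols
        if macs > dsp_budget:
            continue  # candidate exceeds the available hardware resources
        gops = throughput_gops(layers, macs, freq_hz, bandwidth_bps)
        if best is None or gops > best[0]:
            best = (gops, rows, cols)
    return best


if __name__ == "__main__":
    model = [Layer(ops=2e9, bytes_moved=1e6), Layer(ops=1e8, bytes_moved=5e7)]
    print(explore(model, dsp_budget=1024, freq_hz=200e6, bandwidth_bps=10e9))
```

A real tool would replace the exhaustive loop with a smarter search and use a much finer-grained cost model, but the structure (workload description, analytical estimate, budget-constrained search) is the same in spirit as what the abstract outlines.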
