With the rapid development of deep learning models and hardware support for dense computing, deep learning (DL) workload characteristics have changed significantly: from a few hot spots on compute-intensive operations to a broad range of operations scattered across the models. Accelerating a few compute-intensive operations using expert-tuned implementations of primitives does not fully exploit the performance potential of AI hardware. Various efforts have been made to compile a full deep neural network (DNN) graph. One of the biggest challenges is to achieve end-to-end compilation: generating expert-level performance code for the dense compute-intensive operations while applying compilation optimizations at the scope of the DNN computation graph, across multiple such operations. We present oneDNN Graph Compiler, a tensor compiler that employs a hybrid approach, combining techniques from both compiler optimization and expert-tuned kernels, for high-performance code generation of deep neural network graphs. oneDNN Graph Compiler addresses unique optimization challenges in the deep learning domain, such as low-precision computation, aggressive fusion, optimization for static tensor shapes and memory layouts, constant weight optimization, and memory buffer reuse. Experimental results demonstrate up to 2x performance gains over primitives-based optimization for performance-critical DNN computation graph patterns on Intel Xeon Scalable Processors.
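To make the "aggressive fusion" optimization named above concrete, the sketch below contrasts an unfused MatMul → bias-add → ReLU sequence, which materializes an intermediate tensor after every op, with a fused loop nest that applies the post-ops on the accumulator before the store. This is a minimal conceptual illustration only, not oneDNN Graph Compiler code; all function names here are hypothetical.

```python
# Illustrative sketch of operator fusion (hypothetical names; not the
# oneDNN Graph Compiler API). Fusion removes intermediate tensor traffic.
import numpy as np

def unfused(x, w, b):
    # Three separate "ops": each step writes a full intermediate buffer.
    t0 = x @ w                   # MatMul result materialized in memory
    t1 = t0 + b                  # bias-add materialized again
    return np.maximum(t1, 0.0)   # ReLU as a third pass over memory

def fused(x, w, b):
    # One pass: bias and ReLU are applied to the MatMul accumulator
    # before the result is stored, so no intermediate tensors exist.
    m, k = x.shape
    _, n = w.shape
    out = np.empty((m, n), dtype=x.dtype)
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += x[i, p] * w[p, j]
            acc += b[j]                  # bias fused into the accumulator
            out[i, j] = max(acc, 0.0)    # ReLU fused before the store
    return out

x = np.random.rand(4, 8).astype(np.float32)
w = np.random.rand(8, 16).astype(np.float32)
b = np.random.rand(16).astype(np.float32)
assert np.allclose(unfused(x, w, b), fused(x, w, b), atol=1e-5)
```

In a real compiler the fused loop nest would additionally be tiled, vectorized, and specialized for static shapes and memory layouts; the sketch only shows why fusing post-ops into the compute-intensive kernel avoids redundant memory traffic.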