With the rapid development of deep neural networks (DNNs), many real-world applications adopt multiple models to perform compound tasks, such as co-running classification, detection, and segmentation models on autonomous vehicles. Such multi-tenant DNN inference greatly increases computational complexity and calls for comprehensive collaboration across graph-level operator scheduling, runtime-level resource awareness, and hardware scheduler support. However, existing scheduling support for such multi-tenant inference remains limited. In this work, we propose a resource-aware scheduling framework for efficient multi-tenant DNN inference on GPUs, which automatically coordinates DNN computation across different execution levels. Leveraging a unified scheduling intermediate representation and an automated ML-based search algorithm, optimal schedules can be generated that judiciously adjust model concurrency and interleave DNN operators, maintaining balanced resource utilization throughout the inference process and ultimately improving runtime efficiency. Experiments show that we consistently achieve a 1.3-1.7x speedup over standard DNN runtime libraries (e.g., cuDNN, TVM) and dedicated concurrent scheduling methods (e.g., NVIDIA Multi-Stream).
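To make the notion of multi-tenant concurrency concrete, the following is a minimal PyTorch sketch (an illustration only, not the paper's framework or IR): it runs two tenant models on separate CUDA streams so their operators can interleave on one GPU, which is the kind of baseline concurrency (akin to NVIDIA Multi-Stream) that the proposed resource-aware scheduler refines. The model choices and batch sizes are hypothetical.

```python
import torch
import torchvision.models as models

# Two hypothetical tenant models sharing one GPU.
device = torch.device("cuda")
classifier = models.resnet50().eval().to(device)
detector = models.mobilenet_v2().eval().to(device)

x1 = torch.randn(8, 3, 224, 224, device=device)
x2 = torch.randn(8, 3, 224, 224, device=device)

# Separate CUDA streams allow the two models' kernels to overlap on the GPU,
# analogous to the concurrent scheduling baselines discussed in the abstract.
s1, s2 = torch.cuda.Stream(), torch.cuda.Stream()

with torch.no_grad():
    with torch.cuda.stream(s1):
        y1 = classifier(x1)
    with torch.cuda.stream(s2):
        y2 = detector(x2)

# Wait for both streams before consuming the results.
torch.cuda.synchronize()
print(y1.shape, y2.shape)
```

Plain multi-streaming like this leaves concurrency decisions to the hardware scheduler; the framework described above instead searches for operator-level interleavings that keep resource utilization balanced end to end.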