Deep neural networks (DNNs) have become important workloads on both cloud servers and edge devices. Meanwhile, the growing number of DNNs deployed on these platforms raises the need to execute multiple DNNs on the same device. This paper proposes a dynamic partitioning algorithm for concurrent processing of multiple DNNs on a systolic-array-based accelerator. Sharing an accelerator's storage and processing resources across multiple DNNs increases resource utilization and reduces computation time and energy consumption. To this end, we propose a partitioned weight-stationary dataflow that requires only a minor modification to the processing-element logic. We evaluate energy consumption and computation time under both heavy and light workloads. Simulation results show improvements of 35% and 62% in energy consumption and 56% and 44% in computation time under heavy and light workloads, respectively, compared with single tenancy.
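To make the partitioning idea concrete, the sketch below illustrates one simple way a systolic array's columns could be split between two co-resident DNNs in proportion to their remaining work. This is only an illustrative assumption for exposition; the function, the MAC-count heuristic, and the workload names are hypothetical and are not the algorithm proposed in the paper.

```python
# Hypothetical sketch: column-wise partitioning of a systolic array between
# two concurrently executing DNNs, proportional to each network's pending
# multiply-accumulate (MAC) count. Illustrative only, not the paper's method.

from dataclasses import dataclass


@dataclass
class DnnWorkload:
    name: str
    pending_macs: int  # MAC operations still to be executed for this DNN


def partition_columns(array_cols: int, a: DnnWorkload, b: DnnWorkload) -> dict:
    """Split the array's columns between two tenants in proportion to load."""
    total = a.pending_macs + b.pending_macs
    cols_a = max(1, round(array_cols * a.pending_macs / total))
    cols_a = min(cols_a, array_cols - 1)  # keep at least one column for b
    return {a.name: cols_a, b.name: array_cols - cols_a}


if __name__ == "__main__":
    heavy = DnnWorkload("resnet50", pending_macs=4_100_000_000)
    light = DnnWorkload("mobilenet_v2", pending_macs=300_000_000)
    # On a 128-column array, the heavier network receives most of the columns.
    print(partition_columns(128, heavy, light))
```

In an actual accelerator, such a split would be re-evaluated at runtime (for example, per layer or per tile) so that the partition tracks the changing demands of each tenant.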