Neuromorphic Systems-on-Chip (NSoCs) are becoming heterogeneous by integrating general-purpose processors (GPPs) and neural processing units (NPUs) on the same SoC. In embedded systems, an NSoC may need to execute user applications built from a variety of machine learning models. We propose a real-time scheduler, called PRISM, which can schedule machine learning models on a heterogeneous NSoC either individually or concurrently to improve system performance. PRISM consists of four key steps. First, it constructs an interprocessor communication (IPC) graph of a machine learning model from a mapping and a self-timed schedule. Second, it creates a transaction order for the communication actors and embeds this order into the IPC graph. Third, it schedules the graph on an NSoC by overlapping communication with computation. Finally, it uses a Hill Climbing heuristic to explore the design space of mapping operations to GPPs and NPUs to improve performance. Unlike existing schedulers, which use only the NPUs of an NSoC, PRISM improves performance by exploiting the platform's heterogeneity to enable batch, pipeline, and operation parallelism. For use-cases with concurrent applications, PRISM uses a heuristic resource-sharing strategy and non-preemptive scheduling to reduce the expected wait time before concurrent operations can be scheduled on contending resources. Our extensive evaluations with 20 machine learning workloads show that PRISM significantly improves the performance per watt for both individual applications and use-cases when compared to state-of-the-art schedulers.
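The mapping exploration in the final step can be illustrated with a minimal sketch. This is not PRISM's implementation: the cost model below is a toy non-preemptive list schedule with a fixed inter-processor communication delay, standing in for the IPC-graph analysis the abstract describes, and every name in it (`makespan`, `hill_climb`, the `cost` table, the `comm` delay) is hypothetical.

```python
def makespan(mapping, ops, deps, cost, comm):
    """Toy cost model (an assumption, not PRISM's IPC-graph analysis).

    Operations run non-preemptively in the given (topological) order.
    A dependency that crosses processors adds a fixed delay `comm`.
    """
    free_at = {}   # processor -> time at which it becomes idle
    finish = {}    # operation -> finish time
    for op in ops:
        proc = mapping[op]
        ready = 0.0
        for pred in deps.get(op, ()):
            delay = comm if mapping[pred] != proc else 0.0
            ready = max(ready, finish[pred] + delay)
        start = max(ready, free_at.get(proc, 0.0))
        finish[op] = start + cost[op][proc]
        free_at[proc] = finish[op]
    return max(finish.values())


def hill_climb(ops, deps, cost, procs, comm=1.0, max_rounds=100):
    """First-improvement hill climbing over operation-to-processor mappings.

    Starts with every operation on procs[0] and repeatedly moves a single
    operation to another processor whenever that strictly lowers the makespan.
    Stops at a local optimum (a round with no improving move).
    """
    mapping = {op: procs[0] for op in ops}
    best = makespan(mapping, ops, deps, cost, comm)
    for _ in range(max_rounds):
        improved = False
        for op in ops:
            for proc in procs:
                if proc == mapping[op]:
                    continue
                previous = mapping[op]
                mapping[op] = proc
                candidate = makespan(mapping, ops, deps, cost, comm)
                if candidate < best:
                    best = candidate          # keep the improving move
                    improved = True
                else:
                    mapping[op] = previous    # undo a non-improving move
        if not improved:
            break
    return mapping, best


# Hypothetical three-operation model with made-up execution times.
ops = ["conv", "pool", "fc"]                  # already in topological order
deps = {"pool": ["conv"], "fc": ["pool"]}
cost = {
    "conv": {"gpp": 8, "npu": 2},
    "pool": {"gpp": 1, "npu": 3},
    "fc":   {"gpp": 6, "npu": 2},
}
mapping, best = hill_climb(ops, deps, cost, procs=["gpp", "npu"])
print(mapping, best)
```

Hill climbing only guarantees a local optimum, so the result depends on the starting mapping and the order in which moves are tried; the sketch starts from an all-GPP mapping purely to keep the example deterministic.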