The complex nature of real-world problems calls for heterogeneity in both machine learning (ML) models and hardware systems. The heterogeneity in ML models comes from multi-sensor perception and multi-task learning, i.e., multi-modality multi-task (MMMT) learning, resulting in diverse deep neural network (DNN) layers and computation patterns. The heterogeneity in systems comes from diverse processing components, as integrating multiple dedicated accelerators into one system has become the prevailing approach. Therefore, a new problem emerges: heterogeneous model to heterogeneous system mapping (H2H). While previous mapping algorithms mostly focus on efficient computation, in this work we argue that considering computation and communication simultaneously is indispensable for better system efficiency. We propose a novel H2H mapping algorithm with both computation and communication awareness; by slightly trading computation for communication, the overall system latency and energy consumption can be largely reduced. We evaluate our approach using MAESTRO-based modeling, demonstrating 15%-74% latency reduction and 23%-64% energy reduction compared with existing computation-prioritized mapping algorithms.
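To make the computation-communication tradeoff concrete, below is a minimal, illustrative sketch (not the paper's H2H algorithm): it compares a computation-prioritized greedy mapping against a communication-aware one on a toy layer-to-accelerator assignment. The accelerator names, latency numbers, and transfer cost are all hypothetical assumptions for illustration only.

```python
# Toy sketch: mapping each DNN layer to its fastest accelerator
# (computation-prioritized) vs. a mapping that also accounts for
# inter-accelerator data transfers. All numbers are hypothetical.

# compute_lat[layer][accelerator]: assumed per-layer compute latency (ms)
compute_lat = [
    {"conv_acc": 1.0, "gemm_acc": 3.0},   # conv-heavy layer
    {"conv_acc": 2.8, "gemm_acc": 2.5},   # slightly faster on gemm_acc
    {"conv_acc": 1.2, "gemm_acc": 3.5},   # conv-heavy layer
]
TRANSFER_LAT = 2.0  # assumed cost of moving activations between accelerators (ms)


def total_latency(mapping):
    """Sum compute latency, plus a transfer penalty whenever consecutive
    layers run on different accelerators (a simple serialized model)."""
    lat = sum(compute_lat[i][acc] for i, acc in enumerate(mapping))
    lat += sum(TRANSFER_LAT for a, b in zip(mapping, mapping[1:]) if a != b)
    return lat


def compute_only_mapping():
    """Computation-prioritized: pick the fastest accelerator per layer."""
    return [min(costs, key=costs.get) for costs in compute_lat]


def comm_aware_mapping():
    """Greedy computation+communication awareness: keep the previous layer's
    accelerator unless switching saves more compute than the transfer costs."""
    mapping = []
    for costs in compute_lat:
        best = min(costs, key=costs.get)
        if mapping and best != mapping[-1]:
            # Trade a little computation to avoid a data transfer.
            if costs[mapping[-1]] - costs[best] < TRANSFER_LAT:
                best = mapping[-1]
        mapping.append(best)
    return mapping


if __name__ == "__main__":
    for name, mapping in [("compute-only", compute_only_mapping()),
                          ("comm-aware", comm_aware_mapping())]:
        print(f"{name:12s} {mapping}  latency = {total_latency(mapping):.1f} ms")
```

In this toy setting the compute-only mapping incurs two inter-accelerator transfers (8.7 ms total), while the communication-aware mapping accepts a slightly slower compute choice for the middle layer and finishes in 5.0 ms, mirroring the tradeoff the abstract describes.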