A growing number of applications, such as probabilistic machine learning, sparse linear algebra, and robotic navigation, exhibit irregular data-flow computation that can be modeled with directed acyclic graphs (DAGs). The irregularity arises from the seemingly random connections between nodes, which make the DAG structure unsuitable for vectorization on a CPU or GPU. Moreover, each node usually represents only a small number of arithmetic operations, too few to amortize the overhead of launching a task or kernel per node, which poses a further challenge for parallel execution. To enable energy-efficient execution, this work proposes the DAG processing unit (DPU) version 2 (DPU-v2), a specialized processor architecture optimized for irregular DAGs with static connectivity. It consists of a tree-structured datapath for efficient data reuse, a customized banked register file, and interconnects tuned to support irregular register accesses. DPU-v2 is utilized effectively through a targeted compiler that systematically maps operations to the datapath, minimizes register bank conflicts, and avoids pipeline hazards. Finally, a design space exploration identifies the optimal architecture configuration that minimizes the energy-delay product. This hardware-software co-optimization approach results in a speedup of 1.4$\times$, 3.5$\times$, and 14$\times$ over a state-of-the-art DAG processor ASIP, a CPU, and a GPU, respectively, while also achieving a lower energy-delay product. In this way, this work takes an important step toward enabling embedded execution of emerging DAG workloads.