Most prior work on data-flow optimization for machine-learning hardware accelerators searches for algorithmic refactorings such as loop reordering and loop tiling. However, the analyses and information they provide remain at a high level of abstraction, and one must still map them onto instructions the hardware can execute. This paper presents "Dijkstra-Through-Time" (DTT), an ahead-of-time compute and memory scheduling-and-mapping algorithm for deterministic workloads. It admits a simple implementation and supports accelerators with complex NoC configurations, at the cost of a long compilation process. This initial paper describes a proof-of-concept implementation that merges scheduling with data-cache-coherence mechanisms to obtain more optimized data flows.
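To make the core idea concrete, the sketch below shows one plausible reading of a "Dijkstra through time" search: running Dijkstra's algorithm over a time-expanded graph in which states are (NoC node, cycle) pairs, so the shortest path directly yields an earliest-arrival schedule for a data transfer. This is an illustrative sketch only; the function name, the link-table format, and the one-cycle wait edge are assumptions, not the paper's actual implementation.

```python
import heapq

def earliest_arrival(links, start, goal, horizon):
    """Dijkstra over a time-expanded graph: states are (node, time).

    `links` maps a NoC node to a list of (neighbor, link latency in
    cycles). The cost of a state is its arrival time, so the first
    time we pop the goal node we have its earliest arrival. A wait
    edge of one cycle models stalling in place; it matters once link
    slots are reserved per cycle for other transfers (not shown here).
    """
    pq = [(0, start)]          # (arrival time, node)
    seen = set()               # (node, time) states already expanded
    while pq:
        t, node = heapq.heappop(pq)
        if node == goal:
            return t
        if (node, t) in seen or t >= horizon:
            continue
        seen.add((node, t))
        # Either wait one cycle at this node, or traverse a link.
        for nxt, lat in [(node, 1)] + links.get(node, []):
            heapq.heappush(pq, (t + lat, nxt))
    return None  # goal unreachable within the scheduling horizon
```

For example, with links A→B (2 cycles) and B→C (3 cycles), `earliest_arrival(links, "A", "C", 10)` returns 5. An ahead-of-time scheduler could repeat such searches per transfer, reserving the link slots each chosen path occupies, which is consistent with the long compilation time the abstract mentions.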