The next generation HPC and data centers are likely to be reconfigurable and data-centric due to the trend of hardware specialization and the emergence of data-driven applications. In this paper, we propose ARENA -- an asynchronous reconfigurable accelerator ring architecture as a potential scenario on how the future HPC and data centers will be like. Despite using the coarse-grained reconfigurable arrays (CGRAs) as the substrate platform, our key contribution is not only the CGRA-cluster design itself, but also the ensemble of a new architecture and programming model that enables asynchronous tasking across a cluster of reconfigurable nodes, so as to bring specialized computation to the data rather than the reverse. We presume distributed data storage without asserting any prior knowledge on the data distribution. Hardware specialization occurs at runtime when a task finds the majority of data it requires are available at the present node. In other words, we dynamically generate specialized CGRA accelerators where the data reside. The asynchronous tasking for bringing computation to data is achieved by circulating the task token, which describes the data-flow graphs to be executed for a task, among the CGRA cluster connected by a fast ring network. Evaluations on a set of HPC and data-driven applications across different domains show that ARENA can provide better parallel scalability with reduced data movement (53.9%). Compared with contemporary compute-centric parallel models, ARENA can bring on average 4.37x speedup. The synthesized CGRAs and their task-dispatchers only occupy 2.93mm^2 chip area under 45nm process technology and can run at 800MHz with on average 759.8mW power consumption. ARENA also supports the concurrent execution of multi-applications, offering ideal architectural support for future high-performance parallel computing and data analytics systems.
翻译:下一代 HPC 和 数据中心 很可能是可重新配置的和以数据为中心的 。 由于硬件专业化的趋势和数据驱动应用程序的出现, 我们建议 ARENA -- -- 一个非同步的可重新配置加速器环结构, 以作为未来 HPC 和数据中心如何相似的潜在假想 。 尽管使用粗糙的可重新配置阵列( CGRAs) 作为基平台, 我们的关键贡献不仅在于 CGRA- Group设计本身, 而且还在于新架构和编程模型的组合, 使得可以对可重新配置的节点组合进行不同步的任务。 在本节点使用粗略的可重新配置阵列阵列( CGRAs), 不仅包括 CGRA- 集群设计本身, 而且还包括新架构和编程组合模型的组合。 作为同步任务, C- 运行的直径直流数据运行状态, 通过运行的直径直流, 运行的直径直流将数据转换到直径直径的轨道。