On-chip communication infrastructure is a central component of modern systems-on-chip (SoCs), and it continues to gain importance as the number of cores, the heterogeneity of components, and on-chip and off-chip bandwidth grow. Decades of research on on-chip networks have enabled cache-coherent shared-memory multiprocessors. However, communication fabrics that meet the needs of heterogeneous many-cores and accelerator-rich SoCs, which are either non-coherent or only partially coherent, are a much less mature research area. In this work, we present a modular, topology-agnostic, high-performance on-chip communication platform. The platform includes components to build and link subnetworks with customizable bandwidth and concurrency properties, and it adheres to a state-of-the-art, industry-standard protocol. We discuss microarchitectural trade-offs and the timing and area characteristics of our modules, and we show that they can be composed into high-bandwidth (e.g., 2.5 GHz and 1024-bit data width) end-to-end on-chip communication fabrics, comprising not only network switches but also DMA engines and memory controllers, with high degrees of concurrency. We design and implement a state-of-the-art ML training accelerator, in which our communication fabric scales to 1024 cores on a single die, providing 32 TB/s of cross-sectional bandwidth at only 24 ns round-trip latency between any two cores.
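To put the quoted figures in perspective, the peak throughput of a single channel that moves one data beat per clock cycle is simply its data width times its clock frequency. The sketch below is only a unit-conversion sanity check under stated assumptions (one beat per cycle, one direction, no protocol overhead, decimal units with 1 TB = 1000 GB); the function name is hypothetical, and the "links needed" figure is arithmetic, not a description of how the fabric is actually configured.

```python
def peak_link_bandwidth_gbytes(data_width_bits: int, freq_ghz: float) -> float:
    """Hypothetical helper: peak one-direction throughput in GB/s, assuming
    one data beat per clock cycle and no protocol overhead."""
    # bits per cycle * 10^9 cycles per second / 8 bits per byte -> GB/s
    return data_width_bits * freq_ghz / 8


# Example point from the text: 1024-bit data width at 2.5 GHz.
per_link = peak_link_bandwidth_gbytes(1024, 2.5)
print(f"per-link peak: {per_link:.0f} GB/s")

# Illustrative only: how many such links would add up to 32 TB/s (decimal units).
links_for_32tbs = 32_000 / per_link
print(f"links needed for 32 TB/s: {links_for_32tbs:.0f}")
```

Running it prints a per-link peak of 320 GB/s, so a 32 TB/s cross-section corresponds to the aggregate of about 100 such channels; real sustained throughput will be lower once handshake stalls and protocol overhead are accounted for.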