Cross-core communication is increasingly a bottleneck as the number of processing elements (PEs) per system-on-chip grows. Typical hardware solutions to cross-core communication are often inflexible, while software solutions are flexible but have limited performance scalability. A key problem, as we will show, is the shared state in software-based message queue mechanisms. This paper proposes Virtual-Link (VL), a novel lightweight communication mechanism with hardware support that facilitates M:N lock-free data movement. VL reduces the amount of coherent shared state, a bottleneck for many approaches, to zero. VL provides further latency benefits by keeping data on the fast path (i.e., within the on-chip interconnect). VL enables directed cache injection (stashing) between PEs on the coherence bus, reducing core-to-core communication latency. VL is particularly effective for fine-grained tasks on streaming data. Evaluation on a full-system simulator with 7 benchmarks shows that VL achieves a 2.09x speedup over state-of-the-art software-based communication mechanisms while reducing memory traffic by 61%.
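The abstract traces the scaling limit of software queues to coherent shared state. The sketch below is a minimal illustration of where that state sits. It is a generic single-producer/single-consumer ring buffer in C11, not the baseline evaluated in this paper, and the names `spsc_queue`, `spsc_push`, `spsc_pop`, and `QCAP` are hypothetical. The producer writes `tail` and reads `head`, and the consumer does the reverse. As a result, the cache lines holding these indices, along with the payload slots, migrate between the two cores' caches on every exchange. M:N variants typically add more shared state, such as compare-and-swap on the indices.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QCAP 8 /* hypothetical capacity; must be a power of two for masking */
static_assert((QCAP & (QCAP - 1)) == 0, "QCAP must be a power of two");

/* Conventional SPSC ring buffer. head and tail are the coherent shared
 * state: each side writes one index and reads the other, so both cache
 * lines bounce between the producer's and consumer's cores. */
typedef struct {
    _Alignas(64) _Atomic size_t head;  /* next slot to read; written by consumer */
    _Alignas(64) _Atomic size_t tail;  /* next slot to write; written by producer */
    _Alignas(64) uint64_t slots[QCAP]; /* payload; also moves between caches */
} spsc_queue;

static void spsc_init(spsc_queue *q) {
    atomic_init(&q->head, 0);
    atomic_init(&q->tail, 0);
}

/* Producer side: returns false if the queue is full. */
static bool spsc_push(spsc_queue *q, uint64_t v) {
    size_t t = atomic_load_explicit(&q->tail, memory_order_relaxed);
    /* Reading head pulls the consumer-owned cache line to this core. */
    size_t h = atomic_load_explicit(&q->head, memory_order_acquire);
    if (t - h == QCAP) return false; /* full (unsigned wraparound is safe) */
    q->slots[t & (QCAP - 1)] = v;
    /* Release publishes the slot write before the new tail is visible. */
    atomic_store_explicit(&q->tail, t + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false if the queue is empty. */
static bool spsc_pop(spsc_queue *q, uint64_t *out) {
    size_t h = atomic_load_explicit(&q->head, memory_order_relaxed);
    /* Reading tail pulls the producer-owned cache line to this core. */
    size_t t = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (h == t) return false; /* empty */
    *out = q->slots[h & (QCAP - 1)];
    /* Release ensures the slot is read before the producer may reuse it. */
    atomic_store_explicit(&q->head, h + 1, memory_order_release);
    return true;
}
```

Even in this lock-free form, every message crosses the coherence fabric at least three times: the payload line and both index lines. This is the kind of traffic that VL's zero-shared-state design and directed stashing aim to avoid.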