LL-GNN: Low Latency Graph Neural Networks on FPGAs for Particle Detectors

This work proposes a novel reconfigurable architecture for low-latency Graph Neural Network (GNN) designs targeting particle detectors. Accelerating GNNs for particle detectors is challenging because deploying the networks for online event selection in the Level-1 triggers of the CERN Large Hadron Collider experiments requires sub-microsecond latency. We propose a custom code transformation that applies strength reduction to the matrix multiplication operations in interaction-network based GNNs with fully connected graphs, avoiding the costly multiplications. The transformation exploits sparsity patterns and binary adjacency matrices and avoids irregular memory access, which reduces latency and improves hardware efficiency. In addition, we introduce an outer-product based matrix multiplication approach, enhanced by the strength reduction, for low-latency design, and a fusion step that further reduces the design latency. Furthermore, we present a GNN-specific algorithm-hardware co-design approach that not only finds a design with much lower latency but also finds a high-accuracy design under a given latency constraint. Finally, a customizable template for this low-latency GNN hardware architecture has been designed and open-sourced; it enables the generation of low-latency FPGA designs with efficient resource utilization using a high-level synthesis tool. Evaluation results show that our FPGA implementation is up to 24 times faster and consumes up to 45 times less power than a GPU implementation. Compared to our previous FPGA implementations, this work achieves 6.51 to 16.7 times lower latency. Moreover, the latency of our FPGA design is low enough to deploy GNNs in a sub-microsecond, real-time collider trigger system, allowing the trigger to benefit from improved accuracy.
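The strength reduction mentioned in the abstract can be illustrated with a small software sketch. It assumes the usual interaction-network formulation, in which a feature matrix `I` is multiplied by binary receiver and sender matrices (called `R_r` and `R_s` here) whose columns are one-hot. These names, the edge ordering, and the helper functions are illustrative assumptions, not the authors' released template, and the code is plain Python rather than HLS. Because every column of `R_r` or `R_s` holds a single 1, each product reduces to copying columns, and the aggregation `E @ R_r^T` reduces to additions only.

```python
# Sketch only: names and edge ordering are assumptions, not the paper's code.

def build_edges(n_nodes):
    """(receiver, sender) pairs of a fully connected directed graph, no self-loops."""
    return [(r, s) for r in range(n_nodes) for s in range(n_nodes) if s != r]

def one_hot_matrix(n_nodes, indices):
    """Binary n_nodes x n_edges matrix with exactly one 1 per column."""
    return [[1 if idx == node else 0 for idx in indices] for node in range(n_nodes)]

def matmul(a, b):
    """Reference dense matrix product (the multiply-heavy baseline)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    """Transpose a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]

def gather_columns(features, indices):
    """Strength-reduced I @ R: copy the selected node column for every edge."""
    return [[row[idx] for idx in indices] for row in features]

def aggregate_by_receiver(edge_features, receivers, n_nodes):
    """Strength-reduced E @ R_r^T: add each edge column into its receiver's column."""
    out = [[0] * n_nodes for _ in edge_features]
    for f, row in enumerate(edge_features):
        for e, value in enumerate(row):
            out[f][receivers[e]] += value
    return out

n_nodes = 4
edges = build_edges(n_nodes)
receivers = [r for r, _ in edges]
senders = [s for _, s in edges]

# Feature matrix I: 3 features x 4 nodes (arbitrary illustrative values).
I = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12]]

R_r = one_hot_matrix(n_nodes, receivers)
R_s = one_hot_matrix(n_nodes, senders)

# Dense baseline versus multiplication-free versions.
dense_recv = matmul(I, R_r)
fast_recv = gather_columns(I, receivers)
dense_send = matmul(I, R_s)
fast_send = gather_columns(I, senders)
dense_agg = matmul(dense_send, transpose(R_r))
fast_agg = aggregate_by_receiver(fast_send, receivers, n_nodes)

results_match = (dense_recv == fast_recv
                 and dense_send == fast_send
                 and dense_agg == fast_agg)
print(results_match)
```

In this sketch the receiver and sender indices are fixed once the graph size is known, so the access pattern is fully regular. That regularity is what the abstract refers to when it says irregular memory access is avoided; how the actual hardware template realizes it is not shown here.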

