In this paper, we consider the high-level synthesis (HLS) implementation of a three-dimensional systolic array architecture for matrix multiplication that targets specific characteristics of Intel Stratix 10 FPGAs. The goal is to produce designs that achieve high floating-point throughput by using most of the DSPs at high clock frequencies while avoiding congestion of the routing fabric. The investigated architecture produces hardware designs that use 99% of the available DSPs at maximum frequencies that let us achieve performance above 3 TFLOPS.
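To make the relationship between DSP utilization, clock frequency, and throughput concrete, the following minimal sketch estimates the clock frequency needed to reach a given throughput. It rests on assumptions that are not stated in the abstract: each DSP block is taken to perform one single-precision multiply-add per cycle (counted as 2 FLOPs), and the device is assumed to have 5760 DSP blocks, a figure chosen for illustration. The function names are hypothetical.

```python
def peak_flops(num_dsps, clock_hz, flops_per_dsp_per_cycle=2):
    """Peak throughput if every DSP issues one multiply-add per cycle.

    Assumption: one multiply-add counts as 2 floating-point operations.
    """
    return num_dsps * flops_per_dsp_per_cycle * clock_hz


def required_clock_hz(target_flops, num_dsps, utilization,
                      flops_per_dsp_per_cycle=2):
    """Clock frequency needed to hit target_flops with a fraction of the DSPs."""
    used_dsps = num_dsps * utilization
    return target_flops / (used_dsps * flops_per_dsp_per_cycle)


# Assumed DSP count, for illustration only; not taken from the abstract.
ASSUMED_DSP_COUNT = 5760

if __name__ == "__main__":
    f = required_clock_hz(3e12, ASSUMED_DSP_COUNT, utilization=0.99)
    print(f"Clock needed for 3 TFLOPS at 99% DSP use: {f / 1e6:.0f} MHz")
```

Under these assumptions, 3 TFLOPS at 99% DSP utilization corresponds to a clock of roughly 263 MHz, which illustrates why both high DSP usage and a high achieved frequency are needed at the same time.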