Custom hardware accelerators for Deep Neural Networks are increasingly popular: the flexibility and performance offered by FPGAs are well suited to the computational demands and low-latency constraints of many image recognition and natural language processing tasks. However, the gap between high-level Machine Learning frameworks (e.g., TensorFlow, PyTorch) and low-level hardware design in Verilog/VHDL creates a barrier to the widespread adoption of FPGAs, one that High-Level Synthesis can help overcome. hls4ml is a framework that translates Deep Neural Networks into annotated C++ code for High-Level Synthesis, offering a complete and user-friendly design process that has been enthusiastically adopted in physics research. We analyze the strengths and weaknesses of hls4ml and draft a plan to enhance its core library of components in order to enable more advanced optimizations, target a wider selection of FPGAs, and support larger Neural Network models.