In recent years, transformer-based models have shown state-of-the-art results for Natural Language Processing (NLP). In particular, the introduction of the BERT language model brought with it breakthroughs in tasks such as question answering and natural language inference, advancing applications that allow humans to interact naturally with embedded devices. FPGA-based overlay processors have been shown to be effective solutions for edge image and video processing applications, which mostly rely on low precision linear matrix operations. In contrast, transformer-based NLP techniques employ a variety of higher precision nonlinear operations with significantly higher frequency. We present NPE, an FPGA-based overlay processor that can efficiently execute a variety of NLP models. NPE offers software-like programmability to the end user and, unlike FPGA designs that implement specialized accelerators for each nonlinear function, can be upgraded for future NLP models without requiring reconfiguration. We demonstrate that NPE can meet real-time conversational AI latency targets for the BERT language model with $4\times$ lower power than CPUs and $6\times$ lower power than GPUs. We also show that NPE uses $3\times$ fewer FPGA resources than comparable BERT network-specific accelerators in the literature. NPE provides a cost-effective and power-efficient FPGA-based solution for Natural Language Processing at the edge.
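To make the contrast with low-precision linear accelerators concrete, the sketch below (illustrative only, not taken from the paper) shows the kinds of higher precision nonlinear operations that appear in every BERT layer alongside the matrix multiplies: softmax in attention, GELU in the feed-forward blocks, and layer normalization. Function names and the tanh GELU approximation are standard BERT conventions, not part of NPE itself.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over attention scores.
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

def gelu(x):
    # tanh approximation of GELU used in BERT's feed-forward blocks.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def layer_norm(x, eps=1e-12):
    # Per-token layer normalization (learned scale/shift omitted for brevity).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

# One token's hidden vector at BERT-base width (768) and a block of
# attention scores (12 heads, 128 x 128 tokens) as example inputs.
h = np.random.randn(1, 768).astype(np.float32)
scores = np.random.randn(12, 128, 128).astype(np.float32)
print(layer_norm(gelu(h)).shape)   # (1, 768)
print(softmax(scores).shape)       # (12, 128, 128)
```

Unlike the multiply-accumulate kernels of CNN-style image accelerators, these operations involve exponentials, square roots, and divisions, which is what motivates an overlay with general nonlinear support rather than one fixed-function unit per operation.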