Deep Forest is a prominent machine learning algorithm known for its high prediction accuracy. Compared with deep neural networks, Deep Forest requires almost no multiplication operations and performs better on small datasets. However, because of its deep structure and large number of forests, it incurs heavy computation and memory consumption. In this paper, an efficient hardware accelerator is proposed for Deep Forest models, which is also the first work to implement Deep Forest on an FPGA. First, a carefully designed node computing unit (NCU) is introduced to improve inference speed. Second, based on the NCU, an efficient architecture and an adaptive dataflow are proposed to alleviate the problem of node computing imbalance in the classification process. Moreover, an optimized storage scheme improves hardware utilization and power efficiency. The proposed design is implemented on an Intel Stratix V FPGA board and evaluated on two typical datasets, ADULT and Face Mask Detection. The experimental results show that the proposed design achieves a speedup of around 40x over a 40-core high-performance x86 CPU.
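To make the two properties mentioned above concrete (multiplication-free node computation and node computing imbalance), the following is a minimal software sketch of decision-tree inference. It is not the proposed NCU or the accelerator's dataflow; the `Node` class, the `predict` function, and the toy tree are hypothetical names and values chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Node:
    """Hypothetical tree node: internal nodes compare one feature to a threshold."""
    feature: int = -1
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[int] = None  # set only on leaf nodes


def predict(root: Node, x: Sequence[float]) -> Tuple[int, int]:
    """Walk the tree and return (leaf label, number of node comparisons)."""
    node, steps = root, 0
    while node.label is None:
        steps += 1
        # The only arithmetic per node is a single comparison: no multiplication.
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label, steps


# A deliberately unbalanced toy tree (illustrative values, not from the paper).
tree = Node(
    feature=0, threshold=0.5,
    left=Node(label=0),
    right=Node(
        feature=1, threshold=2.0,
        left=Node(label=1),
        right=Node(feature=0, threshold=3.0,
                   left=Node(label=0), right=Node(label=1)),
    ),
)

for sample in ([0.2, 0.0], [1.0, 1.0], [4.0, 5.0]):
    label, steps = predict(tree, sample)
    print(sample, "-> label", label, "after", steps, "comparisons")
```

In this sketch, different samples reach a leaf after different numbers of comparisons. When many trees are evaluated in parallel, that variation in path length is one plausible source of the uneven per-node workload that the abstract calls node computing imbalance.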