A Flexible HLS Hoeffding Tree Implementation for Runtime Learning on FPGA

Decision trees are often preferred when implementing machine learning in embedded systems because of their simplicity and scalability. Hoeffding Trees are a type of decision tree that uses the Hoeffding bound to learn patterns in data without having to store the data samples for later reprocessing. This makes them especially suitable for deployment on embedded devices. In this work we highlight the features of an HLS implementation of the Hoeffding Tree. The implementation parameters include the feature size of the samples (D), the number of output classes (K), and the maximum number of nodes to which the tree is allowed to grow (Nd). We target a Xilinx ZCU102 MPSoC board and evaluate: (1) the design's resource requirements and clock frequency for different numbers of classes and feature sizes; (2) the execution time on several synthetic datasets with varying sample counts (N) and numbers of output classes; and (3) the execution time and accuracy for two datasets from the UCI repository. For a problem size of D = 3, K = 5, and N = 40000, a single decision tree operating at 103 MHz is capable of 8.3x faster inference than the 1.2 GHz ARM Cortex-A53 core. Compared to a reference implementation of the Hoeffding Tree, we achieve comparable classification accuracy on the UCI datasets.
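
To make the role of the Hoeffding bound concrete, the following is a minimal C++ sketch of the split test used in the general Hoeffding Tree algorithm, not the HLS code described in this work. It assumes information gain as the split metric (so the metric's range is log2(K)) and takes a confidence parameter `delta`. The abstract specifies neither, and the function names `hoeffding_bound`, `info_gain_range`, and `should_split` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hoeffding bound: with probability 1 - delta, the true mean of a random
// variable with range `range` lies within epsilon of the mean observed
// over n independent samples, where
//   epsilon = sqrt(range^2 * ln(1 / delta) / (2 * n)).
inline double hoeffding_bound(double range, double delta, unsigned long n) {
    assert(n > 0 && delta > 0.0 && delta < 1.0);
    return std::sqrt(range * range * std::log(1.0 / delta) /
                     (2.0 * static_cast<double>(n)));
}

// Assumption: the split metric is information gain, whose range for a
// K-class problem is log2(K). K is a compile-time parameter, mirroring
// how the number of output classes is a parameter of the design.
template <int K>
inline double info_gain_range() {
    static_assert(K >= 2, "at least two output classes are required");
    return std::log2(static_cast<double>(K));
}

// A leaf is split once the gap between the best and second-best attribute
// exceeds the bound. Only per-leaf statistics and the sample count n are
// needed, so the samples themselves do not have to be stored.
template <int K>
inline bool should_split(double best_gain, double second_gain,
                         double delta, unsigned long n) {
    return (best_gain - second_gain) >
           hoeffding_bound(info_gain_range<K>(), delta, n);
}
```

Because epsilon shrinks in proportion to 1/sqrt(n), a fixed gap between the two best attributes eventually exceeds the bound as more samples reach a leaf. For example, with K = 5 and delta = 0.05, a gap of 0.1 is not enough to split after 10 samples but is enough after 40000.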
