In recent years, machine learning methods have become increasingly important for a wide range of applications. However, they often suffer from high computational requirements, which impair their efficient use in real-time systems even when dedicated hardware accelerators are employed. Ensemble learning methods are especially suitable for hardware acceleration, since they can be constructed from individual learners of low complexity and thus offer large parallelization potential. For classification, the outputs of these learners are typically combined by majority voting, which often represents the bottleneck of a hardware accelerator for ensemble inference. In this work, we present a novel architecture that obtains a majority decision in a number of clock cycles that is logarithmic in the number of inputs. We show that, for the example application of handwritten digit recognition, a random forest processing engine employing this majority decision architecture, implemented on an FPGA, classifies more than seven million images per second.
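To illustrate why a majority decision can take a number of steps that is logarithmic in the number of inputs, the following is a minimal software sketch of a pairwise reduction tree. It is not the architecture presented in this work: the function name `majority_vote_tree`, the per-class vote histograms, and the tie-breaking rule (lowest class index wins) are all assumptions made for illustration. Each pass of the loop stands in for one pipeline stage in which neighbouring partial results are merged in parallel.

```python
from typing import List, Tuple


def majority_vote_tree(votes: List[int], num_classes: int) -> Tuple[int, int]:
    """Hypothetical model of a tree-shaped majority vote.

    Returns (winning_class, stages), where `stages` is the number of
    pairwise merge levels needed to reduce all votes to one histogram.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    if any(v < 0 or v >= num_classes for v in votes):
        raise ValueError("vote outside the range of class labels")

    # Level 0: every learner contributes a one-hot histogram of its vote.
    level = [[1 if c == v else 0 for c in range(num_classes)] for v in votes]

    stages = 0
    while len(level) > 1:
        merged = []
        # Merge neighbouring histograms; in hardware these additions
        # would all happen in parallel within a single stage.
        for i in range(0, len(level) - 1, 2):
            merged.append([a + b for a, b in zip(level[i], level[i + 1])])
        # An unpaired histogram is passed through unchanged.
        if len(level) % 2 == 1:
            merged.append(level[-1])
        level = merged
        stages += 1

    counts = level[0]
    # Assumed tie-breaking rule: the lowest class index wins.
    winner = max(range(num_classes), key=lambda c: counts[c])
    return winner, stages


if __name__ == "__main__":
    # Eight learners voting on a digit between 0 and 9.
    print(majority_vote_tree([3, 1, 3, 7, 3, 1, 3, 3], num_classes=10))
```

In this model, n votes are reduced in ceil(log2(n)) merge stages, compared with n - 1 steps for a sequential tally. The final selection of the largest count is written here as a single `max` call; in hardware it would need its own comparison logic, which this sketch does not model.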