We present a novel framework for designing multiplierless kernel machines for resource-constrained platforms such as intelligent edge devices. The framework uses a piecewise linear (PWL) approximation based on a margin propagation (MP) technique and requires only addition/subtraction, shift, comparison, and register underflow/overflow operations. We propose a hardware-friendly MP-based inference and online training algorithm that has been optimized for a Field Programmable Gate Array (FPGA) platform. Our FPGA implementation eliminates the need for DSP units and reduces the number of LUTs. By reusing the same hardware for inference and training, we show that the platform can overcome the classification errors and local minima artifacts that result from the MP approximation. For a 256 × 32 kernel, the FPGA implementation of the proposed multiplierless MP-kernel machine has an estimated energy consumption of 13.4 pJ and a power consumption of 107 mW, and uses approximately 9k LUTs and 9k FFs, making it superior to comparable implementations in power, performance, and area.
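As an illustration of why MP suits multiplierless hardware, the sketch below solves the constraint usually given for margin propagation in the wider MP literature: find z such that the sum of max(0, x_i - z) over all inputs equals a margin parameter gamma. The abstract does not state this definition, so the constraint form, the function name `margin_propagation`, the integer (fixed-point) inputs, and the bisection solver are all assumptions made for illustration, not the authors' hardware algorithm.

```python
def margin_propagation(xs, gamma):
    """Hypothetical sketch: return the smallest integer z with
    sum(max(0, x - z) for x in xs) <= gamma.

    Assumes integer (fixed-point) inputs and gamma > 0. Uses only
    addition, subtraction, comparison, and a right shift.
    """
    # Below every input by more than gamma: the residual sum exceeds gamma.
    lo = min(xs) - gamma - 1
    # At the largest input the residual sum is zero, which is <= gamma.
    hi = max(xs)
    while hi - lo > 1:
        mid = lo + ((hi - lo) >> 1)  # halving by shift, no divider needed
        total = 0
        for x in xs:
            d = x - mid
            if d > 0:                # rectification is a comparison
                total += d
        if total > gamma:
            lo = mid                 # threshold too low, raise it
        else:
            hi = mid                 # constraint met, try a lower threshold
    return hi
```

For example, `margin_propagation([10, 4, 1], 3)` returns 7, since only the first input exceeds 7 and 10 - 7 = 3. Bisection is used here only because it keeps the sketch short; it relies on the residual sum decreasing monotonically as z increases.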