Low-latency, resource-efficient neural network inference on FPGAs is essential for applications demanding real-time capability and low power. Lookup table (LUT)-based neural networks are a common solution, combining strong representational power with efficient FPGA implementation. In this work, we introduce KANELÉ, a framework that exploits the unique properties of Kolmogorov-Arnold Networks (KANs) for FPGA deployment. Unlike traditional multilayer perceptrons (MLPs), KANs employ learnable one-dimensional splines with fixed domains as edge activations, a structure naturally suited to discretization and efficient LUT mapping. We present the first systematic design flow for implementing KANs on FPGAs, co-optimizing training with quantization and pruning to enable compact, high-throughput, and low-latency KAN architectures. Our results demonstrate up to a 2700x speedup and orders of magnitude resource savings compared to prior KAN-on-FPGA approaches. Moreover, KANELÉ matches or surpasses other LUT-based architectures on widely used benchmarks, particularly for tasks involving symbolic or physical formulas, while balancing resource usage across FPGA hardware. Finally, we showcase the versatility of the framework by extending it to real-time, power-efficient control systems.
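To make the "discretization and efficient LUT mapping" claim concrete, here is a minimal sketch (not the paper's actual design flow) of the underlying idea: because each KAN edge activation is a one-dimensional function over a fixed domain, it can be sampled into a quantized lookup table, and inference reduces to table indexing. The function names (`build_lut`, `lut_eval`), the table size, and the bit width below are illustrative assumptions.

```python
import numpy as np

def build_lut(f, domain=(-1.0, 1.0), n_entries=64, n_bits=8):
    """Illustrative: sample f over its fixed domain and quantize
    the samples to signed n_bit integers, as an FPGA LUT would store."""
    xs = np.linspace(domain[0], domain[1], n_entries)
    ys = f(xs)
    # Scale so the largest sample fills the signed integer range.
    scale = (2 ** (n_bits - 1) - 1) / max(np.max(np.abs(ys)), 1e-12)
    return np.round(ys * scale).astype(np.int32), scale

def lut_eval(lut, x, domain=(-1.0, 1.0)):
    """Inference-time lookup: map x to the nearest table index."""
    lo, hi = domain
    idx = int(round((x - lo) / (hi - lo) * (len(lut) - 1)))
    idx = min(max(idx, 0), len(lut) - 1)  # clamp to the fixed domain
    return lut[idx]

# Example: a tanh-shaped edge activation standing in for a learned spline.
lut, scale = build_lut(np.tanh)
approx = lut_eval(lut, 0.5) / scale  # close to tanh(0.5)
```

Because the domain is fixed by construction in a KAN, the index computation needs no runtime range analysis, which is what makes the mapping to hardware lookup tables direct.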