This paper introduces NeuroBlend, a novel neural network architecture built from a new building block called the Blend module. The module uses binary convolutions in its main path and fixed-point convolutions in its skip path. Batch normalization is applied selectively on both the main and skip paths inside each Blend module and between consecutive Blend modules. We also present a compiler and hardware architecture that map NeuroBlend models onto FPGA devices, aiming to minimize inference latency while maintaining high accuracy. Our NeuroBlend-20 (NeuroBlend-18) model, derived from ResNet-20 (ResNet-18) and trained on CIFAR-10 (CIFAR-100), achieves 88.0\% (73.73\%) classification accuracy, outperforming state-of-the-art binary neural networks by 0.8\% (1.33\%), with an inference time of 0.38 ms per image, 1.4x faster than previous FPGA implementations of BNNs. Similarly, our BlendMixer model for CIFAR-10 attains 90.6\% accuracy (1.59\% below full-precision MLPMixer) with a 3.5x reduction in model size relative to full-precision MLPMixer. Furthermore, leveraging DSP blocks for 48-bit bitwise logic operations enables a low-power FPGA implementation, yielding a 2.5x reduction in power consumption.
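To illustrate why 48-bit bitwise logic is useful for binary convolutions, the following minimal sketch computes a dot product of two +1/-1 vectors with XNOR and popcount over 48-bit words. It assumes the XNOR-popcount formulation that is common in binary neural networks; the abstract does not spell out the exact arithmetic used by NeuroBlend, and the names `pack_bits` and `binary_dot` are hypothetical helpers introduced only for this example.

```python
WORD_BITS = 48  # word width of the bitwise logic operations named in the abstract


def pack_bits(values):
    """Pack a sequence of +1/-1 values into 48-bit integers (bit 1 encodes +1)."""
    words = []
    for start in range(0, len(values), WORD_BITS):
        word = 0
        for offset, v in enumerate(values[start:start + WORD_BITS]):
            if v == 1:
                word |= 1 << offset
        words.append(word)
    return words


def binary_dot(a, b):
    """Dot product of two equal-length +1/-1 vectors via XNOR and popcount.

    Assumption: this is the generic BNN formulation, not necessarily the
    exact datapath of the NeuroBlend hardware.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    n = len(a)
    matches = 0
    remaining = n
    for wa, wb in zip(pack_bits(a), pack_bits(b)):
        valid = min(remaining, WORD_BITS)   # bits actually used in this word
        valid_mask = (1 << valid) - 1       # ignore padding bits in the last word
        xnor = ~(wa ^ wb) & valid_mask      # 1 wherever the two signs agree
        matches += bin(xnor).count("1")     # popcount
        remaining -= valid
    # Each agreement contributes +1 and each disagreement -1.
    return 2 * matches - n
```

Each 48-bit XNOR replaces 48 multiplications, so a single wide logic operation followed by a popcount covers 48 weight-activation pairs of a binary convolution at once.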