In the field of low-bit quantization, training Binary Neural Networks (BNNs) is the extreme solution for easing the deployment of deep models on resource-constrained devices, offering the lowest storage cost and significantly cheaper bit-wise operations compared with 32-bit floating-point counterparts. In this paper, we introduce Sub-bit Neural Networks (SNNs), a new type of binary quantization design tailored to compress and accelerate BNNs. SNNs are inspired by an empirical observation: the binary kernels learned at the convolutional layers of a BNN model tend to be distributed over subsets of the kernel space. As a result, unlike existing methods that binarize weights one by one, SNNs are trained with a kernel-aware optimization framework that exploits binary quantization in the fine-grained convolutional kernel space. Specifically, our method consists of a random sampling step, which generates layer-specific subsets of the kernel space, and a refinement step, which learns to adjust these subsets of binary kernels via optimization. Experiments on visual recognition benchmarks and hardware deployment on an FPGA validate the great potential of SNNs. For instance, on ImageNet, SNNs of ResNet-18/ResNet-34 with 0.56-bit weights achieve 3.13/3.33 times runtime speed-up and 1.8 times compression over conventional BNNs, with moderate drops in recognition accuracy. Promising results are also obtained when applying SNNs to binarize both weights and activations. Our code is available at https://github.com/yikaiw/SNN.
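To make the kernel-space idea concrete, the sketch below illustrates the random-sampling half of the scheme in NumPy: the full binary 3x3 kernel space has 2^9 = 512 kernels, and if a layer is restricted to a randomly sampled subset of 2^5 = 32 of them, each kernel can be stored as a 5-bit index, i.e. 5/9 ≈ 0.56 bits per weight, matching the figure quoted above. This is a minimal illustration under assumed names (`sample_kernel_subset`, `project_to_subset` are hypothetical), not the authors' implementation; in particular, the paper's learned refinement step that adjusts the subset during optimization is omitted, and real-valued kernels are simply projected to their nearest subset member.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_kernel_subset(k=3, subset_bits=5):
    """Randomly sample 2^subset_bits binary kernels from the full
    {-1, +1}^(k*k) kernel space (2^(k*k) kernels in total)."""
    n = 2 ** subset_bits
    idx = rng.choice(2 ** (k * k), size=n, replace=False)
    # Decode each integer index into its k*k bit pattern, mapped to {-1, +1}.
    bits = ((idx[:, None] >> np.arange(k * k)) & 1).astype(np.float32)
    return (2.0 * bits - 1.0).reshape(n, k, k)

def project_to_subset(w, subset):
    """Assign each real-valued kernel in w to its nearest (L2) binary
    kernel from the sampled subset."""
    flat_w = w.reshape(len(w), -1)            # (out_channels, k*k)
    flat_s = subset.reshape(len(subset), -1)  # (n, k*k)
    d = ((flat_w[:, None, :] - flat_s[None, :, :]) ** 2).sum(-1)
    return subset[d.argmin(axis=1)]

subset = sample_kernel_subset()               # 32 kernels, 5-bit indices
w = rng.standard_normal((16, 3, 3))           # toy full-precision kernels
q = project_to_subset(w, subset)              # quantized binary kernels
```

Each quantized kernel is then representable by a 5-bit index into the layer's subset rather than 9 independent 1-bit weights, which is the source of the sub-bit storage cost.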