Batch normalization (BN) is a key facilitator and considered essential for state-of-the-art binary neural networks (BNNs). However, the BN layer is costly to compute and is typically implemented with non-binary parameters, leaving a hurdle for the efficient implementation of BNN training. It also introduces undesirable dependence between samples within each batch. Inspired by the latest advances in Batch Normalization Free (BN-Free) training, we extend that framework to training BNNs, and for the first time demonstrate that BNs can be completely removed from both BNN training and inference. By plugging in and customizing techniques including adaptive gradient clipping, scaled weight standardization, and a specialized bottleneck block, a BN-free BNN is capable of maintaining competitive accuracy compared to its BN-based counterpart. Extensive experiments validate the effectiveness of our proposal across diverse BNN backbones and datasets. For example, after removing BNs from the state-of-the-art ReActNets, the resulting models can still be trained with our proposed methodology to achieve 92.08%, 68.34%, and 68.0% accuracy on CIFAR-10, CIFAR-100, and ImageNet respectively, with only a marginal performance drop (0.23%~0.44% on CIFAR and 1.40% on ImageNet). Code and pre-trained models are available at: https://github.com/VITA-Group/BNN_NoBN.
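One of the ingredients named above, adaptive gradient clipping (AGC), clips each gradient relative to the norm of its own parameter rather than by a single global threshold. The snippet below is a minimal PyTorch sketch of that idea, not the repository's exact implementation; the function name, default `clip_factor`, and `eps` values are illustrative assumptions.

```python
import torch

def adaptive_gradient_clip(parameters, clip_factor=0.02, eps=1e-3):
    """Illustrative sketch of adaptive gradient clipping (AGC) from BN-Free training.

    Gradients are rescaled unit-wise whenever their norm exceeds a fixed fraction
    (clip_factor) of the corresponding parameter norm. Names/defaults are assumptions,
    not the exact API of the BNN_NoBN repository.
    """
    for p in parameters:
        if p.grad is None:
            continue
        if p.ndim > 1:
            # Unit-wise norms: one norm per output unit (first dimension) for weight tensors.
            dims = tuple(range(1, p.ndim))
            param_norm = p.detach().norm(dim=dims, keepdim=True)
            grad_norm = p.grad.detach().norm(dim=dims, keepdim=True)
        else:
            # Biases and other 1-D parameters use the whole-tensor norm.
            param_norm = p.detach().norm()
            grad_norm = p.grad.detach().norm()
        # Allowed gradient norm is a fraction of the (floored) parameter norm.
        max_norm = param_norm.clamp(min=eps) * clip_factor
        # Scale down only those gradients whose norm exceeds the allowed value.
        scale = (max_norm / grad_norm.clamp(min=1e-6)).clamp(max=1.0)
        p.grad.detach().mul_(scale)
```

In a training loop, such a clipping step would sit between `loss.backward()` and `optimizer.step()`, applied to the model's parameters before the update.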